Friday, July 27, 2018

awesome scalability


Contents

  • Principles
  • Scalability
  • Availability
  • Stability
  • Performance
  • Intelligence
  • Architectures
  • Ad-hoc
  • Interview
  • Talks
  • Books

Principles

  • Designs, Lessons and Advice from Building Large Distributed Systems - Jeff Dean
  • On Efficiency, Reliability, Scaling - James Hamilton, VP at AWS
  • Principles of Chaos Engineering
  • Finding the Order in Chaos
  • The Twelve-Factor App
  • Clean Architecture
  • High Cohesion and Low Coupling
  • CAP Theorem and Trade-offs
  • CP Databases and AP Databases
  • Scale Up vs Scale Out
  • Scale Up vs Scale Out: Hidden Costs
  • Best Practices for Scaling Out
  • ACID and BASE
  • Blocking/Non-Blocking and Sync/Async
  • Performance and Scalability of Databases
  • Database Isolation Levels and Effects on Performance and Scalability
  • SQL vs NoSQL
  • SQL vs NoSQL - Lesson Learned from Salesforce
  • How Sharding Works
  • Consistent Hashing
  • Uniform Consistent Hashing (used at Netflix)
  • Eventually Consistent - Werner Vogels, CTO at Amazon
  • Cache is King
  • Anti-Caching
  • Understand Latency
  • Latency Numbers Every Programmer Should Know
  • Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO
  • Common Bottlenecks
  • Life Beyond Distributed Transactions
  • Relying on Software to Redirect Traffic Reliably at Various Layers
  • Breaking Things on Purpose
  • Avoid Over Engineering
  • Scalability Worst Practices
  • Use Solid Technologies - Don't Re-invent the Wheel - Keep It Simple!
  • Why Over-Reusing is Bad
  • Performance is a Feature
  • Make Performance Part of Your Workflow
  • The Benefits of Server Side Rendering Over Client Side Rendering
  • Writing Code that Scales
  • Automate and Abstract: Lessons from Facebook on Engineering for Scale
  • AWS Do's and Don'ts
  • (UI) Design Doesn't Scale - Stanley Wood, Design Director at Spotify
  • Linux Performance
  • How To Design A Good API and Why it Matters - Joshua Bloch
  • Building Fast & Resilient Web Applications - Ilya Grigorik
  • Design for Loose-coupling
  • Design for Resiliency
  • Design for Self-healing
  • Design for Scaling Out
  • Design for Evolution
  • Learn from Mistakes
  • Code Review Best Practices at Palantir

Scalability

  • Microservices and Orchestration
    • Microservices Resource Guide - Martin Fowler, Chief Scientist at ThoughtWorks
    • Microservices Patterns
    • Advantages and Drawbacks of Microservices
    • Microservices Scale Cube
    • Thinking Inside the Container (8 parts) at Riot Games
    • Containerization at Pinterest
    • Techniques for Splitting Up a Codebase into Microservices and Artifacts at LinkedIn
    • The Evolution of Container Usage at Netflix
    • Dockerizing MySQL at Uber
    • Testing of Microservices at Spotify
    • Organize Monolith Before Breaking it into Services at Weebly
    • Lessons learned running Docker in production at Treehouse
    • Inside a SoundCloud Microservice
    • Microservices at BlaBlaCar
    • Operate Kubernetes Reliably at Stripe
    • Kubernetes Traffic Routing (2 parts) at Rakuten
    • Agrarian-Scale Kubernetes (3 parts) at New York Times
    • Mesos, Docker and Ochopod in Localization Services at Autodesk
    • Nanoservices at BBC Online
    • PowerfulSeal: Testing Tool for Kubernetes Clusters at Bloomberg
    • Conductor: Microservices Orchestrator at Netflix
    • Making 10x Improvement in Release Times with Docker and Amazon ECS at Nextdoor
    • K8Guard: Auditing System for Kubernetes Clusters at Target.com
  • Distributed Caching
    • Read-Through, Write-Through, Write-Behind, and Refresh-Ahead Caching
    • Eviction Policy and Expiration Policy
    • EVCache: Caching for a Global Netflix
    • Memsniff: Robust Memcache Traffic Analyzer at Box
    • Caching with Consistent Hashing and Cache Smearing at Etsy
    • Analysis of Photo Caching at Facebook
    • Cache Efficiency Exercise at Facebook
    • tCache: Scalable Data-aware Java Caching at Trivago
    • Reduce Memcached Memory Usage by 50% at Trivago
    • Caching Internal Service Calls at Yelp
  • Distributed Tracking and Tracing
    • Tracking Service Infrastructure at Scale at Shopify
    • Distributed Tracing with Pintrace at Pinterest
    • Distributed Tracing at HelloFresh
    • Analyzing Distributed Trace Data at Pinterest
    • Distributed Tracing at Uber
    • Data Checking at Dropbox
    • Tracing Distributed Systems at Showmax
    • Real-time Distributed Tracing at LinkedIn
    • Zipkin: Distributed Systems Tracing at Twitter
    • osquery Across the Enterprise at Palantir
  • Distributed Logging
    • The Problem with Logging - Jeff Atwood
    • The Log: What Every Software Engineer Should Know
    • Using Logs to Build a Solid Data Infrastructure - Martin Kleppmann
    • Scalable and Reliable Log Ingestion at Pinterest
    • Building DistributedLog at Twitter: High-performance replicated log service
    • Logging Service with Spark at CERN Accelerator
    • Logging and Aggregation at Quora
    • BookKeeper: Distributed Log Storage at Yahoo
    • LogDevice: Distributed Data Store for Logs at Facebook
    • LogFeeder: Log Collection System at Yelp
  • Distributed Security
    • Approach to Security at Scale at Dropbox
    • Aardvark and Repokid: AWS Least Privilege for Distributed, High-Velocity Development at Netflix
    • LISA: Distributed Firewall at LinkedIn
    • Distributed Security Alerting at Slack
    • Secure Infrastructure To Store Bitcoin In The Cloud at Coinbase
  • Distributed Messaging and Event Streaming
    • When to use RabbitMQ or Kafka
    • Should You Put Several Event Types in the Same Kafka Topic? - Martin Kleppmann
    • Kafka at Scale at LinkedIn
    • Delaying Asynchronous Message Processing with RabbitMQ at Indeed
    • Real-time Data Pipeline with Kafka at Yelp
    • Building Reliable Reprocessing and Dead Letter Queues with Kafka at Uber
    • Audit Kafka End-to-End at Uber (count each message exactly once, audit a message across tiers)
    • Kafka for PaaS at Rakuten
    • Publishing with Kafka at The New York Times
    • Kafka Streams on Heroku
    • Kafka in Platform Events Architecture at Salesforce
    • Bullet: Forward-Looking Query Engine for Streaming Data at Yahoo
    • Benchmarking Streaming Computation Engines at Yahoo
    • Messaging Service at Riot Games
    • Event Stream Analytics with Druid (Search Engine meet Column DB) at Walmart
    • Deduplication Techniques
      • Exactly-once Semantics are Possible: Here's How Kafka Does it
      • Real-time Deduping at Scale with Kafka-based Pipeline at Tapjoy
      • Delivering Billions of Messages Exactly Once: Deduping at Segment
      • Deduplication For Efficient Storage (From 50 PB To 32 PB) At Mail.Ru
  • Distributed Searching
    • Search Architecture of Instagram
    • Search Architecture of eBay
    • Improving Search Engine Efficiency by over 25% at eBay
    • Search Federation Architecture at LinkedIn (2018)
    • Search at Slack
    • Search and Recommendations at DoorDash
    • Search Service at Twitter (2014)
    • Nautilus: Travel Search Engine of Expedia
    • Galene: Search Architecture of LinkedIn
    • Manas: High Performing Customized Search System at Pinterest
    • Sherlock: Near Real Time Search Indexing at Flipkart
    • Nebula: Storage Platform to Build Search Backends at Airbnb
    • ELK (Elasticsearch, Logstash, Kibana) Stack
      • Elasticsearch Performance Tuning Practice at eBay
      • Elasticsearch at Kickstarter
      • Distributed Troubleshooting Platform with ELK Stack at Target.com
      • ELK at Robinhood
  • Distributed Storage
    • In-memory Storage
      • Introduction to In-memory Data - Viktor Gamov, Solutions Architect at Hazelcast
      • MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) And Familiar (SQL)
      • Optimizing Memcached Efficiency at Quora
      • Real-Time Data Warehouse with MemSQL on Cisco UCS
      • Moving to MemSQL (with Horizontally Scalable, ACID Compliant, MySQL Compatibility) at Tapjoy
    • Durable Storage (Amazon S3)
      • Reasons for Choosing S3 over HDFS at Databricks
      • S3 in the Data Infrastructure at Airbnb
      • Quantcast File System on Amazon S3
      • Using S3 in Netflix Chukwa
      • Yahoo Cloud Object Store - Object Storage at Exabyte Scale
      • Ambry: Distributed Immutable Object Store at LinkedIn
      • Hammerspace: Persistent, Concurrent, Off-heap Storage at Airbnb
  • Relational Databases (MySQL, MSSQL, PostgreSQL)
    • Microsoft SQL versus MySQL
    • SQL Database Performance Tuning
    • Scaling PostgreSQL Using CUDA
    • Scaling Distributed Joins
    • MySQL System Design at Booking.com
    • MySQL Parallel Replication (4 parts) at Booking.com
    • Partitioning Main MySQL Database at Airbnb
    • PostgreSQL at Twitch
    • Scaling MySQL-based Financial Reporting System at Airbnb
    • Scaling MySQL at Wix
    • Switching from Postgres to MySQL at Uber
    • Handling Growth with Postgres at Instagram
    • Scaling the Analytics Database (Postgres) at TransferWise
    • Updating a 50 Terabyte PostgreSQL Database at Adyen
    • Sharding (Horizontal Partitioning)
      • Sharding MySQL at Pinterest
      • Sharding MySQL at MailChimp
      • Sharding MySQL (3 parts) at Evernote
  • NoSQL Databases
    • Key-Value Databases (DynamoDB, Voldemort, Manhattan)
      • Scaling Mapbox infrastructure with DynamoDB Streams
      • Manhattan: Twitter's distributed key-value database
      • Sherpa: Yahoo's distributed NoSQL key-value store
      • Riak inside Chat Service Architecture at Riot Games
      • MPH: Fast and Compact Immutable Key-Value Stores at Indeed
      • zBase: High Performance, Elastic, Distributed Key-Value Store at Zynga
    • Column Databases (Cassandra, HBase)
      • Consistent Hashing in Cassandra
      • Understanding Gossip (Cassandra Internals)
      • When NOT to use Cassandra?
      • Avoid Pitfalls in Scaling Cassandra Cluster at Walmart
      • Storing Images in Cassandra at Walmart
      • Cassandra at Instagram
      • Scale Ad Analytics with Cassandra at Yelp
      • Store Billions of Messages with Cassandra at Discord
      • Scale to 100+ Million Reads/Writes using Spark and Cassandra at Dream11
      • Moving Food Feed from Redis to Cassandra at Zomato
      • Benchmarking Cassandra Scalability on AWS at Netflix
      • Imgur Notification: From MySQL to HBase at Imgur
      • Improving HBase Backup Efficiency at Pinterest
      • ClickHouse - Open Source Distributed Column Database at Yandex
    • Document Databases (MongoDB, SimpleDB, CouchDB)
      • eBay: Building Mission-Critical Multi-Data Center Applications with MongoDB
      • MongoDB at Baidu: Multi-Tenant Cluster Storing 200+ Billion Documents across 160 Shards
      • The AWS and MongoDB Infrastructure of Parse (acquired by Facebook)
      • Migrating Mountains of Mongo Data at Addepar
      • Couchbase Ecosystem at LinkedIn
      • SimpleDB at Zendesk
    • Graph Databases
      • Handling Billions of Edges in a Graph Database
      • Neo4j case studies with Walmart, eBay, Airbnb, NASA, etc.
      • FlockDB: Distributed Graph Database for Storing Adjacency Lists at Twitter
      • JanusGraph: Scalable Graph Database backed by Google, IBM and Hortonworks
      • Amazon Neptune
    • Datastructure Databases (Redis, Hazelcast)
      • Using Redis To Scale at Twitter
      • Scaling Job Queue with Redis at Slack
      • Moving persistent data out of Redis at GitHub
      • Storing Hundreds of Millions of Simple Key-Value Pairs in Redis at Instagram
      • Redis in Chat Architecture of Twitch (from 27:22)
      • Learn Redis the hard way (in production) at Trivago
      • Optimizing Session Key Storage in Redis at Deliveroo
      • Optimizing Redis Storage at Deliveroo
  • Time Series Database (TSDB)
    • What is Time-Series Data & Why We Need a Time-Series Database
    • Time Series Data: Why and How to Use a Relational Database instead of NoSQL
    • Beringei: High-performance Time Series Storage Engine at Facebook
    • Atlas: In-memory Dimensional Time Series Database at Netflix
    • Heroic: Time Series Database at Spotify
    • Roshi: Distributed Storage System for Time-Series Event at SoundCloud
    • Building a Scalable Time Series Database on PostgreSQL
    • Scaling Time Series Data Storage at Netflix
  • HTTP Caching (Reverse Proxy, CDN)
    • Reverse Proxy (Nginx, Varnish, Squid, rack-cache)
    • Stop Worrying and Love the Proxy
    • Playing HTTP Tricks with Nginx
    • Using CDN to Improve Site Performance at Coursera
    • Strategy: Caching 404s Saved 66% On Server Time at The Onion
    • Increasing Application Performance with HTTP Cache Headers
    • Zynga Geo Proxy: Reducing Mobile Game Latency at Zynga
    • Google AMP at Condé Nast
    • Running A/B Tests on Hosting Infrastructure (CDNs) at Deliveroo
    • HAProxy with Kubernetes for User-facing Traffic at SoundCloud
    • Bandaid: Service Proxy at Dropbox
  • Load Balancing and Other Network Matters
