System design is the practice of making decisions about how components of a software system fit together to meet requirements for scale, reliability, and performance. Every architectural decision involves trade-offs, and understanding those trade-offs is more valuable than memorizing specific solutions.
Why trade-offs dominate everything
There is no universally optimal system architecture. A design that handles 100 million users per day poorly serves a team of five people working on a startup. A system optimized for consistency in financial transactions is wrong for a social media feed.
Good system design starts with understanding the actual requirements: what are the read/write patterns, what are the latency requirements, how much data is involved, what is the acceptable failure mode?
Scalability
A system is scalable if it can handle increased load without proportional increases in cost or complexity. Two main approaches:
Vertical scaling (scale up): Use a more powerful machine — more CPU, more RAM. Simple but has hard limits and a single point of failure.
Horizontal scaling (scale out): Add more machines. More complex (requires distributed coordination) but can scale arbitrarily.
For most high-load services, the goal is stateless application servers that can be horizontally scaled behind a load balancer, with state (user sessions, data) stored in a separate tier that can be scaled independently.
The CAP theorem
The CAP theorem states that a distributed system can guarantee at most two of three properties simultaneously:
- Consistency: Every read returns the most recent write
- Availability: Every request receives a response (not necessarily the most recent)
- Partition tolerance: The system continues to operate despite network partitions
Since network partitions are inevitable in distributed systems, the real choice is between consistency and availability during a partition. Most systems choose one as the priority and accept degraded behavior in the other during failures.
Databases: choosing the right tool
Relational databases (PostgreSQL, MySQL) provide ACID guarantees, support complex queries with SQL, and are the right default for most applications. Do not use a NoSQL database just because it sounds modern.
Document databases (MongoDB, Firestore) work well when your data is naturally document-shaped and you do not need complex joins.
Key-value stores (Redis, DynamoDB) provide fast reads and writes when your access pattern is simple: look up a value by key. Redis is also widely used for caching.
Time-series databases (InfluxDB, TimescaleDB) are optimized for data indexed by timestamp — metrics, sensor data, financial data.
Column-family stores (Cassandra, HBase) work well for write-heavy workloads with high availability requirements and acceptable eventual consistency.
Caching
Caching stores frequently accessed data in a fast-access layer (typically in memory) to reduce latency and load on slower storage.
Common caching strategies:
Cache-aside: The application checks the cache first; on a miss, fetches from the database and writes to cache. Simple and works well for read-heavy workloads.
Write-through: Write to the cache and database simultaneously. Ensures consistency but adds write latency.
Write-behind: Write to the cache immediately and to the database asynchronously. Lower write latency but risk of data loss if the cache fails before the write completes.
Cache invalidation — keeping the cache consistent with the source of truth — is one of the genuinely hard problems in system design. Simple TTL-based expiration is often good enough.
Load balancing
Load balancers distribute incoming requests across multiple backend servers. Strategies include:
- Round robin: Distribute requests in order
- Least connections: Send to the server with fewest active connections
- IP hash: Route requests from the same client to the same server (useful for session affinity)
Modern load balancers (AWS ALB, Nginx, HAProxy) provide health checks, SSL termination, and request routing.
Asynchronous processing with queues
Not everything needs to happen synchronously in the request/response cycle. Tasks like sending emails, processing uploaded files, or running complex calculations can be pushed to a queue and processed by workers asynchronously.
Message queues (RabbitMQ, SQS, Kafka) decouple producers from consumers, absorb traffic spikes, and allow independent scaling of the processing layer. The trade-off is added complexity and eventual consistency — the result is not immediately available.
Replication and partitioning
Replication: Maintain copies of data on multiple nodes. Improves availability and read throughput. Creates consistency challenges when writes need to propagate.
Partitioning (sharding): Split data across multiple nodes, each responsible for a subset. Enables horizontal scaling of storage and write throughput. Requires careful choice of partition key and makes cross-partition queries expensive.
Most large-scale databases support both: data is partitioned across shards, and each shard is replicated.
Observability
A system you cannot observe is a system you cannot operate. The three pillars of observability:
Logs: Structured, timestamped records of events. Essential for debugging but need to be queryable.
Metrics: Numerical measurements over time. Latency percentiles, error rates, throughput. Drive dashboards and alerts.
Traces: End-to-end records of a request's path through the system. Essential for debugging latency in distributed systems.
Summary
System design is about trade-offs, not solutions. Understand your requirements before choosing tools. Scale horizontally where possible. Choose consistency or availability based on your failure tolerance. Use caches to reduce latency and database load. Queue asynchronous work. Instrument everything. The right architecture is the simplest one that meets your actual requirements.