20. Back-of-the-Envelope Estimation
Back-of-the-envelope calculations are rough estimates used to evaluate system design feasibility. They help you quickly determine whether a proposed architecture can handle the expected load, how much storage is needed, and what resources are required.
Why Estimation Matters
- Validate feasibility: "Can a single server handle this load?"
- Capacity planning: "How many servers / how much storage do we need?"
- Design decisions: "Do we need sharding, caching, CDN?"
- Interview signal: Demonstrates ability to think quantitatively about systems.
Powers of 2 Reference
| Power | Exact Value | Approximate |
|---|---|---|
| 2^10 | 1,024 | ~1 thousand (1 KB) |
| 2^20 | 1,048,576 | ~1 million (1 MB) |
| 2^30 | 1,073,741,824 | ~1 billion (1 GB) |
| 2^40 | 1,099,511,627,776 | ~1 trillion (1 TB) |
| 2^50 | 1,125,899,906,842,624 | ~1 quadrillion (1 PB) |
Latency Numbers Every Engineer Should Know
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 0.5 ns | |
| Branch mispredict | 5 ns | |
| L2 cache reference | 7 ns | |
| Mutex lock/unlock | 25 ns | |
| Main memory reference | 100 ns | |
| Compress 1KB with Zippy | 3,000 ns (3 μs) | |
| Send 1KB over 1 Gbps network | 10,000 ns (10 μs) | |
| Read 4KB randomly from SSD | 150,000 ns (150 μs) | |
| Read 1MB sequentially from memory | 250,000 ns (250 μs) | |
| Round trip within same datacenter | 500,000 ns (0.5 ms) | |
| Read 1MB sequentially from SSD | 1,000,000 ns (1 ms) | |
| HDD disk seek | 10,000,000 ns (10 ms) | |
| Read 1MB sequentially from HDD | 20,000,000 ns (20 ms) | |
| Send packet CA → Netherlands → CA | 150,000,000 ns (150 ms) | |
Key takeaways:
- Memory is much faster than SSD: ~4x for sequential 1 MB reads (250 μs vs. 1 ms) and ~1,000x for random access (100 ns vs. 150 μs).
- SSD is ~10-100x faster than HDD.
- Network within datacenter: ~0.5ms.
- Cross-continent round trip: ~150ms.
- Avoid disk seeks; use sequential reads and caching.
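To see what these numbers imply at a larger scale, the sketch below extrapolates the table's 1 MB sequential-read latencies to a 1 GB read. It assumes read time scales linearly with size and ignores seeks, caching, and parallelism; the dictionary and function names are illustrative only.

```python
# Sequential read latency per 1 MB, in seconds, taken from the latency table above.
SEQ_READ_1MB_SEC = {
    "memory": 250e-6,  # 250 μs
    "ssd": 1e-3,       # 1 ms
    "hdd": 20e-3,      # 20 ms
}


def sequential_read_seconds(medium: str, megabytes: float) -> float:
    """Estimate sequential read time, assuming latency scales linearly with size."""
    return SEQ_READ_1MB_SEC[medium] * megabytes


for medium in SEQ_READ_1MB_SEC:
    # 1 GB is treated as 1,000 MB to keep the arithmetic round.
    print(f"{medium}: {sequential_read_seconds(medium, 1000):.2f} s per GB")
```

Under these assumptions a 1 GB scan takes roughly a quarter of a second from memory, one second from SSD, and twenty seconds from HDD.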
Availability Numbers
| Availability | Downtime/Year | Downtime/Month | Downtime/Week |
|---|---|---|---|
| 99% (two 9s) | 3.65 days | 7.31 hours | 1.68 hours |
| 99.9% (three 9s) | 8.76 hours | 43.8 min | 10.1 min |
| 99.99% (four 9s) | 52.6 min | 4.38 min | 1.01 min |
| 99.999% (five 9s) | 5.26 min | 26.3 sec | 6.05 sec |
| 99.9999% (six 9s) | 31.5 sec | 2.63 sec | 0.605 sec |
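The downtime figures follow directly from the availability percentage. A minimal sketch, assuming a 365-day year (the function name is illustrative):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for pct in (99, 99.9, 99.99, 99.999):
    minutes = downtime_seconds_per_year(pct) / 60
    print(f"{pct}% -> {minutes:.1f} min/year")
```

Dividing the yearly figure by 12 or 52 gives the approximate monthly and weekly columns.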
Common Data Size Estimates
| Data Type | Size |
|---|---|
| A character (ASCII) | 1 byte |
| A character (UTF-8, common) | 1-4 bytes |
| An integer (32-bit) | 4 bytes |
| A long integer (64-bit) | 8 bytes |
| A UUID | 16 bytes |
| A short URL (like tinyurl hash) | 7-8 bytes |
| An email address | ~50 bytes |
| A tweet (280 chars, UTF-8) | ~560 bytes |
| A metadata record (JSON) | 1-10 KB |
| A web page (HTML) | 50-100 KB |
| A thumbnail image | 10-50 KB |
| A full-size photo | 200 KB - 5 MB |
| A minute of MP3 audio | ~1 MB |
| A minute of 720p video | ~50 MB |
| A minute of 1080p video | ~150 MB |
| A minute of 4K video | ~350 MB |
Throughput Estimates
| Component | Throughput |
|---|---|
| A single server (simple web app) | 1,000-10,000 RPS |
| A single server (CPU-intensive) | 100-500 RPS |
| Redis (single instance, reads) | 100,000+ RPS |
| Memcached (single instance) | 100,000+ RPS |
| MySQL (reads, indexed) | 10,000-50,000 QPS |
| MySQL (writes) | 1,000-10,000 QPS |
| PostgreSQL (reads, indexed) | 10,000-50,000 QPS |
| Cassandra (single node) | 10,000-50,000 RPS |
| Kafka (single broker) | 100,000+ messages/sec |
| Nginx (reverse proxy) | 50,000-100,000 RPS |
| Single HDD | 100-200 IOPS |
| Single SSD | 10,000-100,000 IOPS |
| Network (1 Gbps) | ~125 MB/sec |
| Network (10 Gbps) | ~1.25 GB/sec |
Estimation Framework
Step 1: Define Requirements
- DAU (Daily Active Users): How many users per day?
- Peak-to-average ratio: Typically 2-5x average.
- Read/write ratio: Most systems are 10:1 to 100:1 read-heavy.
Step 2: Estimate Traffic
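The worked examples later in this section all estimate traffic the same way: divide daily actions by ~100,000 seconds, then multiply by a peak factor. The sketch below captures that pattern. The function names are illustrative, and the default peak factor of 3 is simply the value the examples use (within the 2-5x range from Step 1).

```python
SECONDS_PER_DAY_ROUNDED = 100_000  # rounding shortcut; the exact value is 86,400


def average_qps(actions_per_day: float) -> float:
    """Average queries per second using the rounded seconds-per-day shortcut."""
    return actions_per_day / SECONDS_PER_DAY_ROUNDED


def peak_qps(avg_qps: float, peak_factor: float = 3) -> float:
    """Scale the average by an assumed peak-to-average ratio."""
    return avg_qps * peak_factor


dau = 300_000_000
actions_per_user = 5
avg = average_qps(dau * actions_per_user)
print(f"average: {avg:,.0f} QPS, peak: {peak_qps(avg):,.0f} QPS")
```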
Step 3: Estimate Storage
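Storage is typically estimated as items per day × bytes per item × retention period. The sketch below is one way to write that down, using the URL shortener figures from the examples as input. The helper names are hypothetical, and decimal units (1 KB = 1,000 bytes) are assumed to keep the math round.

```python
def storage_bytes(items_per_day: float, bytes_per_item: float, retention_days: float) -> float:
    """Total bytes accumulated over the retention period."""
    return items_per_day * bytes_per_item * retention_days


def human(num_bytes: float) -> str:
    """Format a byte count using decimal units (1 KB = 1,000 bytes)."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if num_bytes < 1000 or unit == "PB":
            return f"{num_bytes:,.2f} {unit}"
        num_bytes /= 1000


daily = storage_bytes(100_000_000, 500, 1)
five_years = storage_bytes(100_000_000, 500, 365 * 5)
print(human(daily), "per day;", human(five_years), "over 5 years")
```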
Step 4: Estimate Bandwidth
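Bandwidth is usually estimated as QPS × payload size, then converted to bits per second for comparison with link speeds. A minimal sketch with illustrative function names, using two rows from the bandwidth table later in this section as inputs:

```python
def bandwidth_bytes_per_sec(qps: float, bytes_per_request: float) -> float:
    """Bytes transferred per second at a given request rate and payload size."""
    return qps * bytes_per_request


def to_gbps(bytes_per_sec: float) -> float:
    """Convert bytes/sec to gigabits/sec (8 bits per byte, decimal giga)."""
    return bytes_per_sec * 8 / 1e9


api = bandwidth_bytes_per_sec(10_000, 1_000)      # 1 KB responses at 10K QPS
images = bandwidth_bytes_per_sec(1_000, 500_000)  # 500 KB images at 1K QPS
print(f"API: {api / 1e6:.0f} MB/sec = {to_gbps(api):.2f} Gbps")
print(f"Images: {images / 1e6:.0f} MB/sec = {to_gbps(images):.2f} Gbps")
```

The conversion to Gbps matters because network links are rated in bits, while payload sizes are quoted in bytes.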
Step 5: Estimate Servers Needed
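Server count is commonly estimated as peak QPS divided by per-server capacity, scaled by a safety factor. The sketch below assumes a default factor of 2, matching the "How Many Servers?" example in this section; the function and parameter names are illustrative.

```python
import math


def servers_needed(peak_qps: float, qps_per_server: float, safety_factor: float = 2) -> int:
    """Scale the raw server count by a safety factor, then round up to whole servers."""
    return math.ceil(peak_qps / qps_per_server * safety_factor)


print(servers_needed(50_000, 5_000))  # 50,000 peak QPS at 5,000 QPS per server
```

Rounding up with `math.ceil` keeps the estimate conservative when the division does not come out even.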
Example Estimations
Example 1: URL Shortener (like bit.ly)
Requirements:
- 100M new URLs/day
- 10:1 read-to-write ratio
- Store for 5 years
- Each URL: ~500 bytes (original URL + short URL + metadata)
Traffic:
Write QPS = 100M / 100,000 sec = 1,000 QPS
Read QPS = 1,000 × 10 = 10,000 QPS
Peak QPS = 10,000 × 3 = 30,000 QPS
Storage:
Daily = 100M × 500 bytes = 50 GB/day
Yearly = 50 GB × 365 = 18.25 TB/year
5 years = 18.25 TB × 5 ≈ 91 TB
Cache (80/20 rule — 20% of URLs get 80% of traffic):
Daily reads = 100M × 10 = 1B reads
Unique URLs read per day ≈ 200M (assumption: many URLs are read more than once)
Cache 20% = 40M × 500 bytes = 20 GB cache (fits in memory!)
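The cache calculation above can be sketched as a small function. The 200M unique-URL figure and the 20% hot fraction are the example's assumptions, and the 32 GB threshold is the low end of the single-server RAM range given later in this section; the names are illustrative.

```python
def cache_size_bytes(unique_items_read_per_day: int, hot_fraction: float, bytes_per_item: int) -> float:
    """Memory needed to cache the hottest fraction of the items read each day."""
    return unique_items_read_per_day * hot_fraction * bytes_per_item


size = cache_size_bytes(200_000_000, 0.20, 500)
fits_on_small_server = size <= 32e9  # compare against a 32 GB machine
print(f"{size / 1e9:.0f} GB cache, fits in 32 GB of RAM: {fits_on_small_server}")
```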
Example 2: Twitter-like Timeline
Requirements:
- 300M DAU
- Each user views timeline 5 times/day
- Each timeline fetch returns 20 tweets
- Average tweet: 140 chars + metadata ≈ 1 KB
- 500K new tweets/day per celebrity (fanout)
Traffic:
Timeline reads = 300M × 5 = 1.5B reads/day
Timeline QPS = 1.5B / 100K sec = 15,000 QPS
Peak QPS = 15,000 × 3 = 45,000 QPS
Bandwidth:
Per timeline = 20 tweets × 1 KB = 20 KB
Read bandwidth = 15,000 QPS × 20 KB = 300 MB/sec
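As a sanity check, the read bandwidth can be compared against the link capacities from the throughput table (~125 MB/sec for 1 Gbps, ~1.25 GB/sec for 10 Gbps). The sketch below computes the minimum number of fully utilized links at average and peak load. It is a rough lower bound under those assumptions, not a network design, and the names are illustrative.

```python
import math

# Usable bytes/sec per link, from the throughput table.
LINK_BYTES_PER_SEC = {"1 Gbps": 125e6, "10 Gbps": 1.25e9}


def links_needed(bytes_per_sec: float, link: str) -> int:
    """Minimum number of fully utilized links; real deployments need headroom."""
    return math.ceil(bytes_per_sec / LINK_BYTES_PER_SEC[link])


avg_bw = 15_000 * 20_000   # average QPS × 20 KB per timeline = 300 MB/sec
peak_bw = 45_000 * 20_000  # peak QPS × 20 KB per timeline = 900 MB/sec
for label, bw in (("average", avg_bw), ("peak", peak_bw)):
    print(label, {link: links_needed(bw, link) for link in LINK_BYTES_PER_SEC})
```

Even the average load exceeds a single 1 Gbps link, which is the kind of conclusion these estimates are meant to surface early.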
Example 3: Chat Application (like WhatsApp)
Requirements:
- 500M DAU
- Each user sends 40 messages/day
- Average message: 100 bytes
- Messages stored for 30 days
Traffic:
Total messages/day = 500M × 40 = 20B messages/day
Write QPS = 20B / 100K sec = 200,000 QPS
Peak QPS = 200,000 × 3 = 600,000 QPS
Storage:
Daily = 20B × 100 bytes = 2 TB/day
30 days = 2 TB × 30 = 60 TB
Bandwidth:
Write bandwidth = 200,000 QPS × 100 bytes = 20 MB/sec
Quick Mental Math Tips
| If You Have... | Then... |
|---|---|
| 1M users, 10 actions/day | ~100 QPS |
| 100M users, 1 action/day | ~1,000 QPS |
| 1B users, 1 action/day | ~10,000 QPS |
| 10 QPS per server | 1,000 QPS needs 100 servers |
| 1 KB per request | 10,000 QPS = 10 MB/sec |
| 1 MB per request | 1,000 QPS = 1 GB/sec |
Rounding shortcuts:
- 1 day ≈ 100,000 seconds (actually 86,400)
- 1 year ≈ 30 million seconds (actually 31,536,000)
- 10^6 (1 million) seconds ≈ 11.6 days
- 10^9 (1 billion) seconds ≈ 31.7 years
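Rounding a day to 100,000 seconds makes the division easy but biases QPS estimates low. The sketch below quantifies the gap using the chat example's 20B messages/day; the variable and function names are illustrative.

```python
EXACT_SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
ROUNDED_SECONDS_PER_DAY = 100_000


def qps(actions_per_day: float, seconds_per_day: int) -> float:
    """Average actions per second for a given day length in seconds."""
    return actions_per_day / seconds_per_day


messages_per_day = 20_000_000_000  # chat example: 500M users × 40 messages
rounded = qps(messages_per_day, ROUNDED_SECONDS_PER_DAY)
exact = qps(messages_per_day, EXACT_SECONDS_PER_DAY)
print(f"rounded: {rounded:,.0f} QPS, exact: {exact:,.0f} QPS")
print(f"rounded estimate is {1 - rounded / exact:.1%} low")
```

The shortfall is the same for any input (1 - 86,400/100,000 = 13.6%), which is well inside the 2-3x safety margins applied elsewhere.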
System Resource Estimates
| Resource | Single Server |
|---|---|
| CPU cores | 8-64 cores |
| RAM | 32-256 GB |
| SSD | 1-10 TB |
| Network | 1-10 Gbps |
| Concurrent connections | 10,000-65,000 |
How Many Servers?
Example: 50,000 peak QPS, each server handles 5,000 QPS, with a 2x safety margin:
servers = (50,000 / 5,000) × 2 = 20 servers
Database Size Estimation
Row Count to Storage
Table: users
Columns: id (8B) + name (50B) + email (50B) + bio (200B) + timestamps (16B) + indexes (100B)
= 424 bytes per row, round up to ~500 bytes
100M users × 500 bytes = 50 GB
Index Size
Rule of thumb: Indexes add 20-30% to table size.
50 GB table + 30% indexes ≈ 65 GB total
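The row-size and index arithmetic above can be reproduced with a short sketch. The column sizes are the ones listed for the `users` table; the function name and the `index_overhead` parameter are illustrative.

```python
# Per-column byte estimates for the users table above.
USER_ROW_BYTES = {
    "id": 8, "name": 50, "email": 50, "bio": 200, "timestamps": 16, "indexes": 100,
}


def table_size_bytes(row_bytes: int, row_count: int, index_overhead: float = 0.0) -> float:
    """Table size, optionally inflated by a fractional index overhead."""
    return row_bytes * row_count * (1 + index_overhead)


raw_row = sum(USER_ROW_BYTES.values())  # sum of the column estimates
rounded_row = 500                       # round up for easy math
print(raw_row, "bytes per row before rounding")
print(f"{table_size_bytes(rounded_row, 100_000_000) / 1e9:.0f} GB table")
print(f"{table_size_bytes(rounded_row, 100_000_000, 0.30) / 1e9:.0f} GB with 30% index overhead")
```

Note that the per-row figure already includes a 100-byte allowance labeled indexes, so applying the 30% rule on top errs on the conservative side.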
Bandwidth Estimation
| Scenario | Calculation | Result |
|---|---|---|
| API (1 KB responses, 10K QPS) | 10,000 × 1 KB | 10 MB/sec |
| Image serving (500 KB, 1K QPS) | 1,000 × 500 KB | 500 MB/sec |
| Video streaming (5 Mbps, 10K users) | 10,000 × 5 Mbps | 50 Gbps |
Estimation Checklist
When doing back-of-the-envelope calculations:
- State your assumptions clearly (DAU, actions/user, data sizes).
- Round aggressively — use powers of 10 for easy math.
- Calculate QPS (average and peak).
- Calculate storage (daily, yearly, multi-year).
- Calculate bandwidth (inbound + outbound).
- Calculate number of servers/instances needed.
- Consider cache size (80/20 rule).
- Sanity check: Does the result make sense?
Summary
| Concept | Key Point |
|---|---|
| 1 day | ≈ 100,000 seconds |
| 80/20 rule | 20% of data accounts for 80% of traffic |
| Read:Write | Most systems are 10:1 to 100:1 |
| Memory vs Disk | Random access in memory (~100 ns) is ~1,000x faster than SSD (~150 μs) and ~100,000x faster than an HDD seek (~10 ms) |
| Single server | 1K-10K simple HTTP requests/sec |
| Redis | 100K+ ops/sec per instance |
| Storage growth | Estimate for 3-5 years and plan for a 2x buffer |
Rule of thumb: Always round up, add safety margins (2-3x), and never let perfect be the enemy of good. The goal is to be within the right order of magnitude, not exact.