20. Back-of-the-Envelope Estimation

Back-of-the-envelope calculations are rough estimates used to evaluate system design feasibility. They help you quickly determine whether a proposed architecture can handle the expected load, how much storage is needed, and what resources are required.


Why Estimation Matters

  • Validate feasibility: "Can a single server handle this load?"
  • Capacity planning: "How many servers / how much storage do we need?"
  • Design decisions: "Do we need sharding, caching, CDN?"
  • Interview signal: Demonstrates ability to think quantitatively about systems.

Powers of 2 Reference

Power | Exact Value | Approximate
2^10 | 1,024 | ~1 thousand (1 KB)
2^20 | 1,048,576 | ~1 million (1 MB)
2^30 | 1,073,741,824 | ~1 billion (1 GB)
2^40 | 1,099,511,627,776 | ~1 trillion (1 TB)
2^50 | 1,125,899,906,842,624 | ~1 quadrillion (1 PB, petabyte)

Latency Numbers Every Engineer Should Know

Operation | Latency
L1 cache reference | 0.5 ns
Branch mispredict | 5 ns
L2 cache reference | 7 ns
Mutex lock/unlock | 25 ns
Main memory reference | 100 ns
Compress 1 KB with Zippy | 3,000 ns (3 μs)
Send 1 KB over 1 Gbps network | 10,000 ns (10 μs)
Read 4 KB randomly from SSD | 150,000 ns (150 μs)
Read 1 MB sequentially from memory | 250,000 ns (250 μs)
Round trip within same datacenter | 500,000 ns (0.5 ms)
Read 1 MB sequentially from SSD | 1,000,000 ns (1 ms)
HDD disk seek | 10,000,000 ns (10 ms)
Read 1 MB sequentially from HDD | 20,000,000 ns (20 ms)
Send packet CA → Netherlands → CA | 150,000,000 ns (150 ms)

Key takeaways:

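  • Memory is ~100x faster than SSD as a rough rule; by the table above the gap ranges from ~4x (sequential 1 MB reads) to ~1,500x (random access).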
  • SSD is ~10-100x faster than HDD.
  • Network within datacenter: ~0.5ms.
  • Cross-continent round trip: ~150ms.
  • Avoid disk seeks; use sequential reads and caching.
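
The ratios behind these takeaways can be checked against the latency table. The sketch below is a minimal illustration: the dictionary name and its keys are invented for this example, and the values are copied from the table. It shows that the size of the gap depends on whether access is sequential or random.

```python
# Latencies in nanoseconds, copied from the latency table.
# The dictionary name and keys are illustrative, not from any library.
LATENCY_NS = {
    "memory_ref": 100,
    "ssd_random_4kb": 150_000,
    "memory_seq_1mb": 250_000,
    "ssd_seq_1mb": 1_000_000,
    "hdd_seek": 10_000_000,
    "hdd_seq_1mb": 20_000_000,
}

def ratio(slow: str, fast: str) -> float:
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

# Sequential 1 MB reads
print("SSD vs memory (sequential):", ratio("ssd_seq_1mb", "memory_seq_1mb"))
print("HDD vs SSD (sequential):   ", ratio("hdd_seq_1mb", "ssd_seq_1mb"))
# Random access
print("SSD vs memory (random):    ", ratio("ssd_random_4kb", "memory_ref"))
print("HDD vs SSD (random):       ", round(ratio("hdd_seek", "ssd_random_4kb"), 1))
```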

Availability Numbers

Availability | Downtime/Year | Downtime/Month | Downtime/Week
99% (two 9s) | 3.65 days | 7.31 hours | 1.68 hours
99.9% (three 9s) | 8.76 hours | 43.8 min | 10.1 min
99.99% (four 9s) | 52.6 min | 4.38 min | 1.01 min
99.999% (five 9s) | 5.26 min | 26.3 sec | 6.05 sec
99.9999% (six 9s) | 31.5 sec | 2.63 sec | 0.605 sec
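
Each downtime figure is the period length multiplied by the unavailable fraction. A minimal sketch, assuming a 365-day year; the function and constant names are hypothetical:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000
SECONDS_PER_WEEK = 7 * 24 * 3600    # 604,800

def downtime_seconds(availability_pct: float,
                     period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Allowed downtime in seconds for a given availability percentage."""
    return period_seconds * (1 - availability_pct / 100)

for pct in (99, 99.9, 99.99, 99.999):
    per_year_min = downtime_seconds(pct) / 60
    per_week_sec = downtime_seconds(pct, SECONDS_PER_WEEK)
    print(f"{pct}%: {per_year_min:.1f} min/year, {per_week_sec:.1f} sec/week")
```

The same function can be used in reverse as a sanity check: if a proposed maintenance window is longer than the yearly budget, the availability target is not achievable.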

Common Data Size Estimates

Data Type | Size
A character (ASCII) | 1 byte
A character (UTF-8, common) | 1-4 bytes
An integer (32-bit) | 4 bytes
A long integer (64-bit) | 8 bytes
A UUID | 16 bytes
A short URL (like tinyurl hash) | 7-8 bytes
An email address | ~50 bytes
A tweet (280 chars, UTF-8) | ~560 bytes
A metadata record (JSON) | 1-10 KB
A web page (HTML) | 50-100 KB
A thumbnail image | 10-50 KB
A full-size photo | 200 KB - 5 MB
A minute of MP3 audio | ~1 MB
A minute of 720p video | ~50 MB
A minute of 1080p video | ~150 MB
A minute of 4K video | ~350 MB

Throughput Estimates

Component | Throughput
A single server (simple web app) | 1,000-10,000 RPS
A single server (CPU-intensive) | 100-500 RPS
Redis (single instance, reads) | 100,000+ RPS
Memcached (single instance) | 100,000+ RPS
MySQL (reads, indexed) | 10,000-50,000 QPS
MySQL (writes) | 1,000-10,000 QPS
PostgreSQL (reads, indexed) | 10,000-50,000 QPS
Cassandra (single node) | 10,000-50,000 RPS
Kafka (single broker) | 100,000+ messages/sec
Nginx (reverse proxy) | 50,000-100,000 RPS
Single HDD | 100-200 IOPS
Single SSD | 10,000-100,000 IOPS
Network (1 Gbps) | ~125 MB/sec
Network (10 Gbps) | ~1.25 GB/sec

Estimation Framework

Step 1: Define Requirements

  • DAU (Daily Active Users): How many users per day?
  • Peak-to-average ratio: Typically 2-5x average.
  • Read/write ratio: Most systems are 10:1 to 100:1 read-heavy.

Step 2: Estimate Traffic
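
One common way to carry out this step, consistent with the worked examples later in this section, is to divide the daily request volume by roughly 100,000 seconds and then apply a peak factor. The sketch below is an illustration only: the function names are hypothetical, and the 3x peak factor is an assumption taken from the 2-5x range in Step 1.

```python
SECONDS_PER_DAY_ROUNDED = 100_000  # rounding shortcut; the exact value is 86,400

def average_qps(daily_active_users: int, actions_per_user_per_day: float) -> float:
    """Average requests per second derived from daily volume."""
    return daily_active_users * actions_per_user_per_day / SECONDS_PER_DAY_ROUNDED

def peak_qps(avg_qps: float, peak_factor: float = 3.0) -> float:
    """Peak load, assuming a peak-to-average factor (3x is an assumption)."""
    return avg_qps * peak_factor

# 300M DAU, each viewing a timeline 5 times per day
avg = average_qps(300_000_000, 5)
print(f"average: {avg:,.0f} QPS, peak: {peak_qps(avg):,.0f} QPS")
```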

Step 3: Estimate Storage
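
Storage is usually estimated as items per day times bytes per item times the retention period. A minimal sketch under that assumption; the function name is hypothetical, and replication and index overhead are deliberately left out:

```python
TB = 10**12  # decimal terabyte, matching the rounding used in this section

def storage_bytes(items_per_day: int, bytes_per_item: int, retention_days: int) -> int:
    """Raw storage needed to retain `retention_days` worth of items."""
    return items_per_day * bytes_per_item * retention_days

# 100M records/day at 500 bytes each, kept for 5 years
total = storage_bytes(100_000_000, 500, 5 * 365)
print(f"{total / TB:.2f} TB")
```

Multiply the result by the replication factor and any safety buffer you plan for; both are design choices rather than part of the raw estimate.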

Step 4: Estimate Bandwidth
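
Bandwidth follows from QPS times payload size. The sketch below, with hypothetical function names, also converts the result to gigabits per second so it can be compared with the link speeds in the throughput table (decimal units are assumed throughout):

```python
def bandwidth_bytes_per_sec(qps: float, bytes_per_request: float) -> float:
    """Bytes transferred per second at a given request rate and payload size."""
    return qps * bytes_per_request

def to_gbps(bytes_per_sec: float) -> float:
    """Convert bytes/sec to gigabits/sec (decimal units)."""
    return bytes_per_sec * 8 / 1e9

# 15,000 QPS with 20 KB responses
bw = bandwidth_bytes_per_sec(15_000, 20_000)
print(f"{bw / 1e6:.0f} MB/sec = {to_gbps(bw):.1f} Gbps")
```

At 2.4 Gbps this load would saturate a single 1 Gbps link, which is the kind of conclusion this step is meant to surface.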

Step 5: Estimate Servers Needed
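
A simple way to size the fleet is to divide peak QPS by per-server capacity and scale by a redundancy factor. In the sketch below the function name is hypothetical and the 2x default is an assumption meant to cover failover and headroom:

```python
import math

def servers_needed(peak_qps: float, qps_per_server: float,
                   redundancy_factor: float = 2.0) -> int:
    """Servers to carry peak load, scaled by an assumed redundancy factor."""
    return math.ceil(peak_qps / qps_per_server * redundancy_factor)

print(servers_needed(50_000, 5_000))   # 50K peak QPS, 5K QPS per server
print(servers_needed(45_000, 10_000))  # rounds up to whole servers
```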


Example Estimations

Example 1: URL Shortener (like bit.ly)

Requirements:

  • 100M new URLs/day
  • 10:1 read-to-write ratio
  • Store for 5 years
  • Each URL: ~500 bytes (original URL + short URL + metadata)

Traffic:

Write QPS = 100M / 100,000 sec = 1,000 QPS
Read QPS  = 1,000 × 10 = 10,000 QPS
Peak QPS  = 10,000 × 3 = 30,000 QPS (reads, assuming a 3x peak-to-average ratio)

Storage:

Daily    = 100M × 500 bytes = 50 GB/day
Yearly   = 50 GB × 365 = 18.25 TB/year
5 years  = 18.25 TB × 5 ≈ 91 TB

Cache (80/20 rule — 20% of URLs get 80% of traffic):

Daily reads = 100M × 10 = 1B reads
Unique URLs read per day ≈ 200M (assumption: each URL read is repeated ~5 times on average)
Cache 20% = 200M × 0.2 = 40M URLs × 500 bytes = 20 GB cache (fits in memory on a single server)
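
The cache estimate can be expressed as a small function. This is a sketch under the same assumptions as the calculation above (200M unique URLs read per day, 500 bytes each, 20% hot set); the function and parameter names are hypothetical:

```python
def cache_size_bytes(unique_items_read_per_day: int, bytes_per_item: int,
                     hot_fraction: float = 0.2) -> float:
    """Memory needed to cache the hot fraction of daily-read items (80/20 rule)."""
    return unique_items_read_per_day * bytes_per_item * hot_fraction

size = cache_size_bytes(200_000_000, 500)
print(f"{size / 1e9:.0f} GB")
```

Comparing the result with the 32-256 GB RAM range typical of a single server tells you whether one cache node is enough or the cache must be partitioned.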

Example 2: Twitter-like Timeline

Requirements:

  • 300M DAU
  • Each user views timeline 5 times/day
  • Each timeline fetch returns 20 tweets
  • Average tweet: 140 chars + metadata ≈ 1 KB
  • 500K new tweets/day per celebrity (fanout)

Traffic:

Timeline reads = 300M × 5 = 1.5B reads/day
Timeline QPS   = 1.5B / 100K sec = 15,000 QPS
Peak QPS       = 15,000 × 3 = 45,000 QPS (assuming a 3x peak-to-average ratio)

Bandwidth:

Per timeline = 20 tweets × 1 KB = 20 KB
Read bandwidth = 15,000 QPS × 20 KB = 300 MB/sec

Example 3: Chat Application (like WhatsApp)

Requirements:

  • 500M DAU
  • Each user sends 40 messages/day
  • Average message: 100 bytes
  • Messages stored for 30 days

Traffic:

Total messages/day = 500M × 40 = 20B messages/day
Write QPS = 20B / 100K sec = 200,000 QPS
Peak QPS  = 200,000 × 3 = 600,000 QPS (assuming a 3x peak-to-average ratio)

Storage:

Daily    = 20B × 100 bytes = 2 TB/day
30 days  = 2 TB × 30 = 60 TB

Bandwidth:

Write bandwidth = 200,000 QPS × 100 bytes = 20 MB/sec

Quick Mental Math Tips

If You Have... | Then...
1M users, 10 actions/day | ~100 QPS
100M users, 1 action/day | ~1,000 QPS
1B users, 1 action/day | ~10,000 QPS
10 QPS per server | 1,000 QPS needs 100 servers
1 KB per request | 10,000 QPS = 10 MB/sec
1 MB per request | 1,000 QPS = 1 GB/sec

Rounding shortcuts:

  • 1 day ≈ 100,000 seconds (actually 86,400)
  • 1 year ≈ 30 million seconds (actually 31,536,000)
  • 10^6 (1 million) seconds ≈ 11.6 days
  • 10^9 (1 billion) seconds ≈ 31.7 years
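
It helps to know how much error the 100,000-seconds shortcut introduces. The sketch below (names are illustrative) compares the rounded and exact QPS for 100M requests per day. Because the rounded divisor is larger than the real one, the shortcut underestimates QPS, by about 14%.

```python
EXACT_SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
ROUNDED_SECONDS_PER_DAY = 100_000

def qps(daily_requests: int, seconds_per_day: int) -> float:
    """Average requests per second for a given daily volume."""
    return daily_requests / seconds_per_day

daily = 100_000_000
rounded = qps(daily, ROUNDED_SECONDS_PER_DAY)
exact = qps(daily, EXACT_SECONDS_PER_DAY)
underestimate_pct = (exact - rounded) / exact * 100

print(f"rounded={rounded:.0f} exact={exact:.0f} underestimate={underestimate_pct:.1f}%")
```

An error of this size stays well within the same order of magnitude and is normally absorbed by the peak factor and safety margins applied later.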

System Resource Estimates

Resource | Single Server
CPU cores | 8-64 cores
RAM | 32-256 GB
SSD | 1-10 TB
Network | 1-10 Gbps
Concurrent connections | 10,000-65,000

How Many Servers?

Example: 50,000 peak QPS, each server handles 5,000 QPS, with a 2x factor for redundancy and headroom:

servers = (50,000 / 5,000) × 2 = 20 servers

Database Size Estimation

Row Count to Storage

Table: users
Columns: id (8B) + name (50B) + email (50B) + bio (200B) + timestamps (16B) + indexes (100B)
≈ 424 bytes per row, round to ~500 bytes

100M users × 500 bytes = 50 GB

Index Size

Rule of thumb: Indexes add 20-30% to table size.

50 GB table + 30% indexes ≈ 65 GB total
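
The row-size and index arithmetic above can be reproduced in a few lines. This sketch uses the column sizes from the `users` example; the dictionary and function names are hypothetical, and the 30% index overhead is the upper end of the rule of thumb:

```python
# Per-column byte estimates from the `users` table example.
ROW_BYTES = {"id": 8, "name": 50, "email": 50, "bio": 200,
             "timestamps": 16, "indexes": 100}

def table_size_gb(row_count: int, bytes_per_row: int) -> float:
    """Table size in decimal gigabytes."""
    return row_count * bytes_per_row / 1e9

raw_row = sum(ROW_BYTES.values())  # exact per-row total
rounded_row = 500                  # rounded up for easier mental math
table_gb = table_size_gb(100_000_000, rounded_row)
total_gb = table_gb * 1.3          # add 30% for indexes (rule of thumb)

print(f"row={raw_row} B, table={table_gb:.0f} GB, with indexes={total_gb:.0f} GB")
```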

Bandwidth Estimation

Scenario | Calculation | Result
API (1 KB responses, 10K QPS) | 10,000 × 1 KB | 10 MB/sec
Image serving (500 KB, 1K QPS) | 1,000 × 500 KB | 500 MB/sec
Video streaming (5 Mbps, 10K users) | 10,000 × 5 Mbps | 50 Gbps

Estimation Checklist

When doing back-of-the-envelope calculations:

  1. State your assumptions clearly (DAU, actions/user, data sizes).
  2. Round aggressively — use powers of 10 for easy math.
  3. Calculate QPS (average and peak).
  4. Calculate storage (daily, yearly, multi-year).
  5. Calculate bandwidth (inbound + outbound).
  6. Calculate number of servers/instances needed.
  7. Consider cache size (80/20 rule).
  8. Sanity check: Does the result make sense?

Summary

Concept | Key Point
1 day | ≈ 100,000 seconds
80/20 rule | 20% of data causes 80% of traffic
Read:Write | Most systems are 10:1 to 100:1
Memory vs Disk | Memory is ~100x faster than SSD, ~1000x faster than HDD
Single server | 1K-10K simple HTTP requests/sec
Redis | 100K+ ops/sec per instance
Storage growth | Estimate for 3-5 years and plan for 2x buffer

Rule of thumb: Always round up, add safety margins (2-3x), and never let perfect be the enemy of good. The goal is to be within the right order of magnitude, not exact.