Queueing Theory
Problem
How many database connections do you need? How long will requests wait when your API server is at 80% load? What happens to latency as utilization approaches 100%? These questions have exact mathematical answers.
Why It Matters
Without queueing theory: "I'll set pool_size=100, that seems like a lot"
With queueing theory: "My DB has 8 cores, avg query is 10ms,
target 500 req/s:
pool_size = ceil(500 × 0.010) × 1.3 = 7"Little's Law
L = λW — the most useful equation in systems engineering.
- L = average number of items in the system (queue + service)
- λ (lambda) = arrival rate (items/second)
- W = average time each item spends in the system (seconds)
This holds for any stable queuing system regardless of arrival or service distributions.
Applications
Connection pool sizing:
Given: 200 req/s hitting the DB, avg query = 15ms
L = λ × W = 200 × 0.015 = 3 connections needed on average
Add headroom for bursts: pool_size = L × 2 = 6 connectionsRequest queue depth:
Given: 1000 req/s, avg latency = 50ms
L = 1000 × 0.050 = 50 requests in flight at any time
If your server handles only 40 concurrent → queue backs up → latency explodesCache expiry and refresh:
Given: 10,000 cache lookups/minute, avg item lives 60 seconds
L = (10,000/60) × 60 = 10,000 items in cache at any time
Cache sizing: 10,000 × avg_item_sizeThe Utilization Saturation Curve
The M/M/1 queue model (Poisson arrivals, exponential service time, 1 server):
Average wait time W_q = (ρ / μ) × (1 / (1 - ρ))
Where:
ρ (rho) = λ/μ = utilization (0 to 1)
μ = service rate (requests/second)
λ = arrival rate (requests/second)The hockey stick:
Utilization (ρ) | Wait multiplier (W_q / service_time)
─────────────────────────────────────────────────────────
10% | 0.11× (nearly no wait)
50% | 1.00× (wait = 1 service time)
70% | 2.33×
80% | 4.00×
90% | 9.00×
95% | 19.0×
99% | 99.0×This is why "always keep utilization < 70%" is practical wisdom — above 70%, latency grows faster than load.
M/M/c: Multi-Server Queue (Connection Pool)
For c parallel servers (connection pool of size c):
The Erlang C formula gives P(wait) — probability a request must wait:
For practical purposes, use the approximation:
avg_wait ≈ (P_wait × service_time) / (c × (1 - ρ/c))
Where ρ/c = per-server utilizationIn practice: use the formula pool_size = ceil(λ × avg_service_time × safety_factor) where safety_factor = 1.2–1.5.
Benchmark: Queueing vs Non-Queueing
Pool size=5, query time=10ms, arrival rate=100 req/s:
Utilization ρ = 100 × 0.010 / 5 = 0.20 (20% per server)
Expected wait ≈ 0.05ms → p99 ≈ 10.5ms
Pool size=5, arrival rate=400 req/s (ρ = 0.80):
Expected wait ≈ 4 × 10ms = 40ms → p99 ≈ 65ms
Pool size=5, arrival rate=450 req/s (ρ = 0.90):
Expected wait ≈ 9 × 10ms = 90ms → p99 ≈ 130ms
(system approaching instability)Key Takeaways
- Little's Law: L = λW. Memorize this. Use it to size every resource.
- The hockey stick: at >80% utilization, latency grows exponentially. Design for 70% max.
- Pool sizing:
pool_size = ceil(peak_rps × avg_query_seconds × 1.3). - Stability requires: arrival rate < service rate. If λ > μ, the queue grows without bound.
- M/M/c > M/M/1: more servers reduces wait time super-linearly. Doubling pool size more than halves wait time when heavily loaded.
in this section