Problem
How many database connections do you need? How long will requests wait when your API server is at 80% load? What happens to latency as utilization approaches 100%? These questions have exact mathematical answers.
Why It Matters
Without queueing theory: "I'll set pool_size=100, that seems like a lot"
With queueing theory: "My DB has 8 cores, avg query is 10ms,
target 500 req/s:
pool_size = ceil(500 × 0.010 × 1.3) = ceil(6.5) = 7"
Little's Law
L = λW — the most useful equation in systems engineering.
- L = average number of items in the system (queue + service)
- λ (lambda) = arrival rate (items/second)
- W = average time each item spends in the system (seconds)
This holds for any stable queueing system, regardless of the arrival or service distributions.
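Little's Law is easy to check empirically. The sketch below (illustrative, not from the original) simulates an M/M/1 queue, measures the average time in system W, and confirms that λ × W matches the analytic L = λ/(μ − λ):

```python
import random

def simulate_mm1(lam, mu, n=200_000, seed=42):
    """Return the average sojourn time W (seconds) of n jobs in an M/M/1 queue."""
    rng = random.Random(seed)
    t = 0.0      # arrival time of the current job
    free = 0.0   # time at which the single server next becomes idle
    total = 0.0
    for _ in range(n):
        t += rng.expovariate(lam)           # Poisson arrivals at rate lam
        start = max(t, free)                # queue if the server is busy
        free = start + rng.expovariate(mu)  # exponential service at rate mu
        total += free - t                   # sojourn = wait + service
    return total / n

lam, mu = 50.0, 100.0                       # 50 req/s, 10 ms average service
W = simulate_mm1(lam, mu)
print(f"W = {W * 1000:.2f} ms (theory: {1000 / (mu - lam):.2f} ms)")
print(f"L = lam * W = {lam * W:.3f} (theory: {lam / (mu - lam):.3f})")
```

With ρ = 0.5 the theoretical values are W = 20ms and L = 1 request in the system; the simulation lands within a few percent of both.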
Applications
Connection pool sizing:
Given: 200 req/s hitting the DB, avg query = 15ms
L = λ × W = 200 × 0.015 = 3 connections needed on average
Add headroom for bursts: pool_size = L × 2 = 6 connections
Request queue depth:
Given: 1000 req/s, avg latency = 50ms
L = 1000 × 0.050 = 50 requests in flight at any time
If your server handles only 40 concurrent → queue backs up → latency explodes
Cache expiry and refresh:
Given: 10,000 cache lookups/minute, avg item lives 60 seconds
L = (10,000/60) × 60 = 10,000 items in cache at any time
Cache sizing: 10,000 × avg_item_size
The Utilization Saturation Curve
The M/M/1 queue model (Poisson arrivals, exponential service time, 1 server):
Average wait time W_q = (ρ / μ) × (1 / (1 - ρ))
Where:
ρ (rho) = λ/μ = utilization (0 to 1)
μ = service rate (requests/second)
λ = arrival rate (requests/second)
The hockey stick:
Utilization (ρ) | Wait multiplier (W_q / service_time)
─────────────────────────────────────────────────────
10%             | 0.11× (nearly no wait)
50%             | 1.00× (wait = 1 service time)
70%             | 2.33×
80%             | 4.00×
90%             | 9.00×
95%             | 19.0×
99%             | 99.0×
This is why "always keep utilization < 70%" is practical wisdom — above 70%, latency grows faster than load.
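The wait-multiplier column is just ρ/(1 − ρ): dividing W_q = (ρ/μ) × (1/(1 − ρ)) by one service time (1/μ) cancels the μ. A few lines of Python (illustrative) reproduce the table:

```python
def wait_multiplier(rho: float) -> float:
    """M/M/1 mean wait in units of one service time: W_q * mu = rho / (1 - rho)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("a stable queue needs utilization in [0, 1)")
    return rho / (1.0 - rho)

for rho in (0.10, 0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"{rho:4.0%} -> {wait_multiplier(rho):6.2f}x")
```

Note the asymmetry: going from 50% to 70% costs 1.3 extra service times of wait; going from 90% to 95% costs 10.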
M/M/c: Multi-Server Queue (Connection Pool)
For c parallel servers (connection pool of size c):
The Erlang C formula gives P(wait) — the probability that a request must wait:
For practical purposes, use the approximation:
avg_wait ≈ (P_wait × service_time) / (c × (1 - ρ/c))
Where ρ/c = per-server utilization.
In practice: use the formula pool_size = ceil(λ × avg_service_time × safety_factor) where safety_factor = 1.2–1.5.
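These formulas fit in a few lines of Python. This is a sketch (the function names are mine, not from any particular library): Erlang C computed via the numerically stable Erlang B recurrence, the average-wait formula, and the rule-of-thumb pool sizer.

```python
from math import ceil

def erlang_c(c: int, a: float) -> float:
    """P(wait) for an M/M/c queue with offered load a = lam * service_time Erlangs."""
    b = 1.0                      # Erlang B, via its stable recurrence
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    rho = a / c                  # per-server utilization
    return b / (1.0 - rho * (1.0 - b))

def avg_wait(c: int, lam: float, service_time: float) -> float:
    """Mean time a request queues for one of c servers (seconds)."""
    a = lam * service_time
    if a >= c:
        raise ValueError("unstable: offered load >= number of servers")
    return erlang_c(c, a) * service_time / (c - a)

def pool_size(lam: float, service_time: float, safety_factor: float = 1.3) -> int:
    return ceil(lam * service_time * safety_factor)

print(pool_size(500, 0.010))                      # the worked example above -> 7
print(f"{avg_wait(5, 400, 0.010) * 1000:.2f} ms") # wait at 80% per-server load
```

The recurrence avoids the factorials in the textbook Erlang C expression, which overflow for large pools.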
Worked Examples: Wait Time vs Utilization
Pool size=5, query time=10ms, arrival rate=100 req/s:
Utilization ρ = 100 × 0.010 / 5 = 0.20 (20% per server)
Expected wait ≈ 0.05ms → p99 ≈ 10.5ms
Pool size=5, arrival rate=400 req/s (ρ = 0.80 per server):
Expected wait ≈ 4 × 10ms = 40ms → p99 ≈ 65ms
Pool size=5, arrival rate=450 req/s (ρ = 0.90 per server):
Expected wait ≈ 9 × 10ms = 90ms → p99 ≈ 130ms
(system approaching instability)
These estimates apply the single-server (M/M/1) multipliers as a pessimistic bound; the exact M/M/c waits are lower, because any idle server in the pool can absorb a burst.
Key Takeaways
- Little's Law: L = λW. Memorize this. Use it to size every resource.
- The hockey stick: wait grows as 1/(1 - ρ), so above 80% utilization latency explodes. Design for 70% max.
- Pool sizing: pool_size = ceil(peak_rps × avg_query_seconds × 1.3).
- Stability requires arrival rate < total service rate: if λ > c × μ, the queue grows without bound.
- M/M/c beats M/M/1: adding servers reduces wait time super-linearly. Doubling the pool size more than halves the wait time when heavily loaded.
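That last point can be checked with the Erlang C recurrence (an illustrative sketch under the same assumptions as above, not a measured benchmark): at 400 req/s with 10ms queries, doubling the pool from 5 to 10 cuts the average wait by far more than half.

```python
def erlang_c(c, a):
    """P(wait) for an M/M/c queue with offered load a = lam * service_time Erlangs."""
    b = 1.0                      # Erlang B via its stable recurrence
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b / (1.0 - (a / c) * (1.0 - b))

def avg_wait_ms(c, lam, service_time):
    a = lam * service_time
    return erlang_c(c, a) * service_time / (c - a) * 1000.0

# 400 req/s, 10 ms queries: per-server utilization drops from 80% to 40%.
w5 = avg_wait_ms(5, 400, 0.010)      # roughly 5.5 ms
w10 = avg_wait_ms(10, 400, 0.010)    # well under 0.1 ms
print(f"c=5: {w5:.2f} ms, c=10: {w10:.4f} ms, ratio {w5 / w10:.0f}x")
```

Twice the servers buys two compounding effects: per-server utilization halves, and the chance that all servers are simultaneously busy collapses.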