Real-world failure analyses and performance investigations. Each case study follows this structure:
- Incident: What happened, impact, duration
- Investigation: How it was diagnosed (tools, traces, metrics)
- Root Cause: The underlying technical cause
- Fix: What was changed and why
- Prevention: What monitoring/code change prevents recurrence
- Lessons: Generalizable takeaways
Planned Case Studies
| Case Study | Domain | Root Cause |
|---|---|---|
| 01-n-plus-one-at-scale.md | Databases | |
| 02-pool-exhaustion-cascade.md | Connection Pooling | |
| 03-cache-cold-start.md | Caching | |
| 04-goroutine-leak.md | Concurrency | |
| 05-index-regression.md | Databases | |
Contributing Case Studies
Case studies must be anonymized (no client/company identifiers) and technically accurate. The root cause must be verifiable and the fix demonstrably effective.