Backend Audit Checklist

Use this checklist to assess the health of a backend system. Each item has a measurable threshold. Items marked ✗ are immediate action items. Items marked ⚠ are improvement opportunities.

Scoring: 1 point per ✓. 0 points per ✗ or ⚠.

  • 90–100%: Production-ready
  • 70–89%: Some gaps, addressable within one sprint
  • 50–69%: Significant technical debt, prioritize this quarter
  • <50%: High risk, stop and fix before further feature development
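
The scoring rule and bands above can be sketched as a small helper. This is a minimal illustration, not part of the checklist itself: the function name `score_audit` and the result labels `"pass"`, `"fail"`, and `"warn"` are assumptions chosen for the example.

```python
def score_audit(results):
    """Return (percentage, band) for a list of per-item results.

    Each result is "pass", "fail", or "warn" (hypothetical labels for the
    three checklist marks). Only "pass" earns a point.
    """
    if not results:
        raise ValueError("no checklist items scored")
    passed = sum(1 for r in results if r == "pass")
    pct = 100.0 * passed / len(results)
    # Bands from the checklist; a fractional score falls into the band whose
    # lower bound it meets (for example, 89.5% is still "some gaps").
    if pct >= 90:
        band = "Production-ready"
    elif pct >= 70:
        band = "Some gaps, addressable within one sprint"
    elif pct >= 50:
        band = "Significant technical debt, prioritize this quarter"
    else:
        band = "High risk, stop and fix before further feature development"
    return pct, band


if __name__ == "__main__":
    # 64 of 71 items pass; the rest are failures or warnings.
    pct, band = score_audit(["pass"] * 64 + ["fail"] * 4 + ["warn"] * 3)
    print(f"{pct:.1f}% -> {band}")
```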

---

1. API Performance

| # | Check | Threshold | Method |
|---|---|---|---|
| 1.1 | Read endpoints p99 latency | < 200ms | |
| 1.2 | Write endpoints p99 latency | < 500ms | |
| 1.3 | No endpoint with p99 > 1s under normal load | 0 violations | |
| 1.4 | p99/p50 ratio (tail latency amplification) | < 4× | |
| 1.5 | Timeouts configured on all outbound HTTP calls | 100% coverage | |
| 1.6 | Request size limits enforced | Max body size configured | |

How to measure:

# k6 load test — generates p50/p99 breakdown
k6 run --vus 50 --duration 60s script.js

# wrk quick benchmark
wrk -t 4 -c 100 -d 30s --latency http://localhost:8080/api/users
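
If the load tool exports raw per-request latencies, checks 1.1 to 1.4 can be evaluated offline. The sketch below assumes a plain list of latencies in milliseconds and uses the nearest-rank percentile definition; the function names are illustrative, and the tools above may compute percentiles slightly differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def tail_amplification(samples_ms):
    """p99/p50 ratio used by check 1.4 (threshold: < 4)."""
    return percentile(samples_ms, 99) / percentile(samples_ms, 50)


if __name__ == "__main__":
    # Synthetic latencies for illustration only: 1 ms, 2 ms, ..., 100 ms.
    latencies_ms = list(range(1, 101))
    print("p50:", percentile(latencies_ms, 50))
    print("p99:", percentile(latencies_ms, 99))
    print("ratio ok:", tail_amplification(latencies_ms) < 4)
```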

---

2. Database

| # | Check | Threshold | Method |
|---|---|---|---|
| 2.1 | No query with p99 > 100ms under normal load | 0 violations | |
| 2.2 | Foreign key columns have indexes | 100% of FKs | |
| 2.3 | No sequential scans on tables > 10,000 rows in hot paths | 0 violations | |
| 2.4 | N+1 query patterns absent | 0 hot endpoints with queries/req > 10 | |
| 2.5 | Query timeouts configured | statement_timeout set | |
| 2.6 | Slow query log enabled | log_min_duration_statement ≤ 100ms | |
| 2.7 | EXPLAIN ANALYZE reviewed for all queries used > 1000×/day | Documented | |
| 2.8 | Index bloat | < 20% | pgstattuple |

Quick FK index audit (PostgreSQL):

-- Lists FK columns that appear in no index definition on their table.
-- The LIKE match is a heuristic: it can miss a gap when another indexed
-- column's name contains the FK column name.
SELECT *
FROM (
    SELECT
        tc.table_name,
        kcu.column_name,
        ccu.table_name AS foreign_table,
        (SELECT COUNT(*) FROM pg_indexes
         WHERE tablename = tc.table_name
         AND indexdef LIKE '%' || kcu.column_name || '%') AS index_count
    FROM information_schema.table_constraints tc
    JOIN information_schema.key_column_usage kcu
        ON tc.constraint_name = kcu.constraint_name
    JOIN information_schema.referential_constraints rc
        ON tc.constraint_name = rc.constraint_name
    JOIN information_schema.constraint_column_usage ccu
        ON ccu.constraint_name = rc.unique_constraint_name
    WHERE tc.constraint_type = 'FOREIGN KEY'
) fk
-- A column alias cannot be used in HAVING, so filter in an outer query.
WHERE index_count = 0;
-- Results: FK columns with no index → add these indexes

---

3. Caching

| # | Check | Threshold | Method |
|---|---|---|---|
| 3.1 | Cache hit rate for hot read paths | > 80% | |
| 3.2 | TTL configured on all cache keys | 100% have explicit TTL | |
| 3.3 | Eviction policy configured | allkeys-lru or volatile-lru set | |
| 3.4 | maxmemory configured | Not unlimited | |
| 3.5 | Cache key namespace collision check | No two entities share prefix | |
| 3.6 | Thundering herd protection on popular keys | PER or mutex in place | |
| 3.7 | Negative caching for non-existent lookups | Bloom filter or null-TTL caching | |
| 3.8 | Cache eviction rate acceptable | < 5% eviction/hour | |

Redis health check:

redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses|evicted_keys"
redis-cli INFO memory | grep -E "used_memory_human|maxmemory_human"
# hit_rate = keyspace_hits / (keyspace_hits + keyspace_misses)
# Target: > 0.80
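
The hit-rate formula in the comments above can be applied to captured `redis-cli INFO stats` output with a few lines of Python. This is a sketch: the helper names are illustrative, and the sample counter values are made up for the example rather than taken from a real server.

```python
def parse_info(text):
    """Parse `redis-cli INFO` output ("key:value" lines) into a dict of strings.
    Section headers (lines starting with "#") and blank lines are skipped."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def hit_rate(fields):
    """keyspace_hits / (keyspace_hits + keyspace_misses), or None if no lookups yet."""
    hits = int(fields["keyspace_hits"])
    misses = int(fields["keyspace_misses"])
    total = hits + misses
    return hits / total if total else None


if __name__ == "__main__":
    # Illustrative values only.
    sample = "# Stats\r\nkeyspace_hits:900\r\nkeyspace_misses:100\r\nevicted_keys:12\r\n"
    rate = hit_rate(parse_info(sample))
    print("hit rate:", rate, "meets 3.1:", rate is not None and rate > 0.80)
```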

---

4. Connection Management

| # | Check | Threshold | Method |
|---|---|---|---|
| 4.1 | Connection pool configured (not new-connection-per-request) | Pool exists | |
| 4.2 | Pool size follows sizing formula | pool_size ≈ DB_CPU_cores × 2 | |
| 4.3 | Pool acquire timeout configured | ≤ 30 seconds | |
| 4.4 | max_conn_lifetime configured | < 30 minutes | |
| 4.5 | max_conn_idle_time configured | < 10 minutes | |
| 4.6 | Connection leak detection enabled | Leak detection implemented | |
| 4.7 | Pool metrics exposed (active, idle, waiting) | Metrics endpoint exists | |
| 4.8 | Connections released in finally blocks | 100% coverage | |
| 4.9 | No connections held across external HTTP calls | 0 violations | |
| 4.10 | For serverless: connection proxy (RDS Proxy/PgBouncer) in use | Proxy configured | |

Pool sizing calculator:

target_pool_size = ceil(peak_rps × avg_query_duration_seconds × 1.3)

Example:
  Peak load: 500 req/s
  Avg query: 15ms = 0.015s
  Safety factor: 1.3
  pool_size = ceil(500 × 0.015 × 1.3) = ceil(9.75) = 10
  Cross-check: DB has 4 cores → 4 × 2 = 8 → take max(10, 8) = 10
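
The calculation above translates directly into code. The sketch below combines the load-based estimate with the core-based cross-check from item 4.2; the function and parameter names are illustrative, and it assumes each request holds one connection for roughly the average query duration.

```python
import math


def target_pool_size(peak_rps, avg_query_seconds, db_cpu_cores, safety_factor=1.3):
    """Pool size from the checklist formula, cross-checked against cores x 2."""
    # Concurrent connections needed at peak, padded by the safety factor.
    by_load = math.ceil(peak_rps * avg_query_seconds * safety_factor)
    # Rule of thumb from item 4.2.
    by_cores = db_cpu_cores * 2
    return max(by_load, by_cores)


if __name__ == "__main__":
    # Same inputs as the worked example: 500 req/s, 15 ms queries, 4 DB cores.
    print(target_pool_size(peak_rps=500, avg_query_seconds=0.015, db_cpu_cores=4))
```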

---

5. Observability

| # | Check | Threshold | Method |
|---|---|---|---|
| 5.1 | Distributed tracing implemented | 100% of service boundaries | |
| 5.2 | RED metrics exported (Rate, Errors, Duration) | Per endpoint | |
| 5.3 | Structured logging (JSON, not freeform text) | 100% of log lines | |
| 5.4 | Request IDs propagated across service calls | 100% of requests | |
| 5.5 | Error rates alerted | Alert at > 1% error rate | |
| 5.6 | Latency p99 alerted | Alert at > 500ms p99 | |
| 5.7 | DB query count per request tracked | Metric exists | |
| 5.8 | Cache hit rate tracked | Metric exists | |

Minimum viable metric set (Prometheus):

# Must have these metrics for every HTTP endpoint:
http_requests_total{method, path, status_code}
http_request_duration_seconds{method, path}  # histogram; derive p50/p99 with histogram_quantile()

# Database:
db_queries_total{endpoint}
db_query_duration_seconds  # histogram

# Cache:
cache_hits_total
cache_misses_total
cache_evictions_total

# Connection pool:
db_pool_connections_total{state}  # state: idle, active
db_pool_wait_duration_seconds     # histogram

---

6. Security

| # | Check | Threshold | Method |
|---|---|---|---|
| 6.1 | Authentication on all non-public endpoints | 100% coverage | |
| 6.2 | Authorization checked at data layer (not just route layer) | 100% coverage | |
| 6.3 | Rate limiting on all public endpoints | Limits configured | |
| 6.4 | Rate limiting on auth endpoints | Stricter limits (e.g., 10 req/min) | |
| 6.5 | SQL injection prevention (parameterized queries) | 0 string-concatenated queries | |
| 6.6 | Input validation on all user-provided data | Validation library in use | |
| 6.7 | Secrets not in source code | 0 secrets in git history | |
| 6.8 | CORS configured (not * on API) | Domain allowlist | |
| 6.9 | Security headers set (HSTS, X-Frame-Options, etc.) | Headers present | |
| 6.10 | Dependencies scanned for CVEs | Scan in CI pipeline | |

---

7. Reliability

| # | Check | Threshold | Method |
|---|---|---|---|
| 7.1 | Circuit breakers on all external service calls | 100% of external calls | |
| 7.2 | Retry with exponential backoff implemented | Retries capped, jittered | |
| 7.3 | Timeouts on ALL outbound calls (DB, HTTP, cache) | 100% have explicit timeout | |
| 7.4 | Graceful shutdown implemented | SIGTERM handled, in-flight requests complete | |
| 7.5 | Health check endpoints exist (liveness + readiness) | /health/live and /health/ready | |
| 7.6 | Deployment is zero-downtime | Rolling update or blue/green | |
| 7.7 | Database migrations are backward-compatible | Non-breaking schema changes | |
| 7.8 | Feature flags available | Flag system in use | |
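
Item 7.2 asks for retries that are both capped and jittered. One common way to satisfy it is exponential backoff with "full jitter", sketched below. The defaults (5 retries, 0.1 s base, 5 s cap), the set of retriable exceptions, and the function names are assumptions for illustration, not values prescribed by this checklist.

```python
import random
import time


def backoff_delays(max_retries=5, base=0.1, cap=5.0, rng=random.random):
    """Yield one sleep duration (seconds) per retry.

    The ceiling doubles each attempt up to `cap`; the actual delay is a
    random fraction of that ceiling ("full jitter") so that many clients
    do not retry in lockstep.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling


def call_with_retries(fn, retriable=(ConnectionError, TimeoutError),
                      sleep=time.sleep, **backoff_kwargs):
    """Call fn(); on a retriable error, sleep and retry until delays run out."""
    delays = backoff_delays(**backoff_kwargs)
    while True:
        try:
            return fn()
        except retriable:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the last error
            sleep(delay)
```

The `sleep` and `rng` parameters exist so that tests can run without real waiting or randomness.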

Circuit breaker threshold guidance:

Open circuit when:
  error_rate > 50% in last 10 seconds AND at least 20 requests
Half-open: allow 1 request through every 30 seconds to test recovery
Close: if 3 consecutive successes in half-open state
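
The thresholds above can be sketched as a small state machine. This is a minimal single-threaded illustration under stated assumptions: the class and method names are hypothetical, the clock is injectable for testing, and a production breaker would also need locking and per-dependency instances.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, window=10.0, min_requests=20, error_threshold=0.5,
                 probe_interval=30.0, successes_to_close=3, clock=time.monotonic):
        self.window = window
        self.min_requests = min_requests
        self.error_threshold = error_threshold
        self.probe_interval = probe_interval
        self.successes_to_close = successes_to_close
        self._clock = clock
        self.state = "closed"
        self._events = deque()      # (timestamp, ok) pairs inside the window
        self._last_gate = 0.0       # when the circuit opened or last admitted a probe
        self._probe_successes = 0

    def allow(self):
        """Return True if the caller may send a request now."""
        if self.state == "closed":
            return True
        now = self._clock()
        # Open or half-open: admit one probe per probe_interval.
        if now - self._last_gate >= self.probe_interval:
            self._last_gate = now
            self.state = "half_open"
            return True
        return False

    def record(self, ok):
        """Report the outcome of a request that allow() admitted."""
        now = self._clock()
        if self.state == "closed":
            self._events.append((now, ok))
            while self._events and now - self._events[0][0] > self.window:
                self._events.popleft()
            total = len(self._events)
            errors = sum(1 for _, was_ok in self._events if not was_ok)
            if total >= self.min_requests and errors / total > self.error_threshold:
                self._trip(now)
        elif self.state == "half_open":
            if ok:
                self._probe_successes += 1
                if self._probe_successes >= self.successes_to_close:
                    self.state = "closed"
                    self._probe_successes = 0
                    self._events.clear()
            else:
                self._trip(now)
        # Outcomes that arrive while fully open are ignored.

    def _trip(self, now):
        self.state = "open"
        self._last_gate = now
        self._probe_successes = 0
        self._events.clear()
```

Callers wrap each outbound request as `if breaker.allow(): ...; breaker.record(ok)` and fail fast when `allow()` returns False.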

---

8. Scalability

| # | Check | Threshold | Method |
|---|---|---|---|
| 8.1 | Horizontal scaling tested | Service runs with N>1 instances without issues | |
| 8.2 | Stateless service (no in-process session state) | State in Redis/DB only | |
| 8.3 | No shared mutable in-memory state across requests | Confirmed stateless | |
| 8.4 | Database read replicas used for read-heavy queries | Read/write split configured | |
| 8.5 | Long-running jobs in async queue (not synchronous HTTP) | Job queue in use | |
| 8.6 | File uploads/downloads proxied (not through app server) | Presigned URLs or CDN | |
| 8.7 | Pagination on all list endpoints | Cursor or offset pagination | |
| 8.8 | Max response size bounded | No unbounded result sets | |

---

9. Development Practices

| # | Check | Threshold | Method |
|---|---|---|---|
| 9.1 | EXPLAIN ANALYZE for every new query | PR review requirement | |
| 9.2 | Load tests for every new endpoint > 100 req/day | Load test in CI | |
| 9.3 | Query count asserted in tests | Test fails if N+1 introduced | |
| 9.4 | Database migrations reviewed for lock risk | Advisory lock audit | |
| 9.5 | Performance budget defined and enforced | p99 SLA per endpoint documented | |
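
Item 9.3 can be illustrated with a query counter that a test asserts against. The sketch below uses Python's built-in `sqlite3` module and its `set_trace_callback` hook purely for demonstration; the schema and function names are hypothetical, and a real service would hook into its own driver or ORM instead.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn):
    """Count SQL statements executed on `conn` inside the with-block."""
    counter = {"n": 0}

    def on_statement(_sql):
        counter["n"] += 1

    conn.set_trace_callback(on_statement)
    try:
        yield counter
    finally:
        conn.set_trace_callback(None)


def make_db(n_authors=5):
    """Build a throwaway in-memory database with one book per author."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
    conn.executemany("INSERT INTO authors VALUES (?, ?)",
                     [(i, f"author{i}") for i in range(n_authors)])
    conn.executemany("INSERT INTO books (author_id, title) VALUES (?, ?)",
                     [(i, f"book{i}") for i in range(n_authors)])
    conn.commit()
    return conn


def load_n_plus_one(conn):
    """Anti-pattern: one query for the authors, then one more per author."""
    authors = conn.execute("SELECT id, name FROM authors").fetchall()
    return [
        (name, conn.execute("SELECT title FROM books WHERE author_id = ?", (aid,)).fetchall())
        for aid, name in authors
    ]


def load_joined(conn):
    """Same data in a single joined query."""
    return conn.execute(
        "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
    ).fetchall()
```

A test then wraps the code under test in `with count_queries(conn) as c:` and asserts an upper bound on `c["n"]`, so an accidental N+1 fails the build.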

---

Score Summary Template

Team: _______________   Date: _______________   Reviewer: _______________

Section                    Score    Max    %
─────────────────────────────────────────────
1. API Performance         ___      6      ___
2. Database                ___      8      ___
3. Caching                 ___      8      ___
4. Connection Management   ___      10     ___
5. Observability           ___      8      ___
6. Security                ___      10     ___
7. Reliability             ___      8      ___
8. Scalability             ___      8      ___
9. Development Practices   ___      5      ___
─────────────────────────────────────────────
TOTAL                      ___      71     ___  %

Top 3 action items:
1. _______________________________________________
2. _______________________________________________
3. _______________________________________________
