Problem
Arrays are the foundation of most data structures. Understanding how they are laid out in memory explains why some operations are fast and others are not, and why your database's buffer pool, Redis strings, and CPU caches all fundamentally rely on contiguous memory.
Why It Matters (Latency, Throughput, Cost)
CPU cache line effects:
Array access (sequential): ~4 cycles per element (L1 cache hit)
Array access (random): ~40 cycles per element (L3 cache miss)
Linked list traversal: ~200 cycles per node (RAM access per node)
Sequential array: 1,000,000 elements summed → ~4 ms
Linked list: 1,000,000 elements summed → ~200 ms (50× slower)

This is why PostgreSQL stores heap pages as fixed-size byte arrays, not linked structures.
Mental Model
An array is a contiguous block of memory. Element i is at address: base_address + i × element_size.
Address: 0x1000 0x1008 0x1010 0x1018 0x1020
Value: [ 42 ][ 17 ][ 99 ][ 3 ][ 58 ]
Index:     0       1       2       3       4

The CPU's L1 cache fetches 64-byte cache lines. A 64-byte cache line holds 8 int64 values, so accessing element 0 loads elements 0-7 into cache "for free". Sequential access = 1 cache miss per 8 elements. Random access = potentially 1 cache miss per element.
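The address formula above can be checked directly. This is a small sketch using NumPy's `ctypes` interface (any names here are illustrative, not part of the benchmark); for a contiguous int64 array, each step of index i should advance the address by exactly 8 bytes.

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)   # contiguous int64 array
base = arr.ctypes.data               # base address of the underlying buffer
size = arr.itemsize                  # 8 bytes per int64

for i in range(len(arr)):
    # NumPy's stride machinery computes the same address as base + i * size
    addr = arr.ctypes.data + i * arr.strides[0]
    assert addr == base + i * size
```

Because the array is contiguous, `arr.strides` is simply `(8,)`: one element, one 8-byte step.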
Underlying Theory
Virtual memory: Arrays occupy contiguous virtual addresses, but physical frames may be scattered. The MMU + TLB translates virtual to physical. Accessing a new page (4 KB) on first touch triggers a page fault (OS intervention, ~1 µs). Large arrays touch many pages.
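To make "many pages" concrete, here is the back-of-envelope arithmetic for the 1,000,000-element int64 array used throughout this section (assuming a typical 4 KB page size; huge pages would shrink the count):

```python
element_size = 8          # bytes per int64
n = 1_000_000
page_size = 4096          # typical small-page size on x86-64 / ARM64

total_bytes = n * element_size        # 8,000,000 bytes (~7.6 MiB)
pages = -(-total_bytes // page_size)  # ceiling division
print(pages)                          # → 1954
```

First-touching all ~2,000 pages costs roughly 2 ms of page-fault overhead at ~1 µs each, which is why pre-faulting (or reusing) large buffers matters in latency-sensitive code.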
Row-major vs column-major storage:
C / Go / NumPy (default): Row-major (row elements contiguous)
matrix[i][j] → base + i * col_count * sizeof(T) + j * sizeof(T)
Fortran/MATLAB: Column-major (column elements contiguous)
matrix[i][j] → base + j * row_count * sizeof(T) + i * sizeof(T)

NumPy operations iterate in the memory-contiguous direction for speed.
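The two layouts are visible in NumPy's strides (the byte step per index). A sketch: for a 3×4 float64 matrix, row-major steps 8 bytes along a row and 32 bytes between rows; a column-major copy inverts that.

```python
import numpy as np

m = np.zeros((3, 4), dtype=np.float64)   # C order (row-major) is the default
f = np.asfortranarray(m)                 # column-major (Fortran-order) copy

# Row-major: j advances 8 bytes; i advances col_count * 8 = 32 bytes
assert m.strides == (32, 8)
# Column-major: i advances 8 bytes; j advances row_count * 8 = 24 bytes
assert f.strides == (8, 24)
```

This is exactly the address formula above, with the stride tuple holding the precomputed multipliers.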
Naive Approach
# Cache-unfriendly: column-major access pattern over a row-major array
import random

matrix = [[random.random() for _ in range(1000)] for _ in range(1000)]

# Column-by-column access (cache miss per element in row-major layout)
total = sum(matrix[i][j] for j in range(1000) for i in range(1000))

Optimized Approach
# Cache-friendly: row-by-row access
total = sum(matrix[i][j] for i in range(1000) for j in range(1000))
# Even better: use NumPy's vectorized operations (SIMD + cache-optimized)
import numpy as np
matrix = np.random.random((1000, 1000))
total = matrix.sum()  # ~100× faster than the pure-Python loop

Complexity Analysis
| Operation | Time | Space | Notes |
|---|---|---|---|
| Random access by index | O(1) | O(1) | |
| Sequential scan | O(N) | O(1) | |
| Insert at end (amortized) | O(1) | O(1) | |
| Insert at front | O(N) | O(1) | |
| Search (unsorted) | O(N) | O(1) | |
| Search (sorted, binary) | O(log N) | O(1) | |
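The last two table rows can be contrasted directly with the standard library. A sketch using `bisect` (the values are randomly generated here purely for illustration): on a sorted million-element array, binary search needs ~20 comparisons, while a linear scan may examine every element.

```python
import bisect
import random

random.seed(0)
data = sorted(random.randrange(10**9) for _ in range(1_000_000))
target = data[123_456]

# O(log N): binary search, ~20 comparisons for N = 1,000,000
i = bisect.bisect_left(data, target)
assert data[i] == target

# O(N): linear scan finds the same (first) occurrence, far more slowly
assert data.index(target) == i
```

`bisect_left` returns the first occurrence, matching `list.index`, so the assertion holds even when random values collide.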
Benchmark
Operation on N=1,000,000 int64 array:
Sequential sum: 2ms (L1/L2 cache hits)
Random access sum: 100ms (50× slower: L3/RAM misses)
Python list sum: 50ms (object overhead, indirect pointers)
NumPy array sum: 1ms (SIMD + contiguous memory)

Key Takeaways
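A minimal harness to reproduce the comparison on your own machine (absolute times will differ by CPU and clock speed; only the relative ordering from the table is expected to hold):

```python
import time
import numpy as np

n = 1_000_000
arr = np.arange(n, dtype=np.int64)
idx = np.random.permutation(n)       # a random visit order

def timed_ms(fn):
    t0 = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - t0) * 1000

seq_sum, seq_ms = timed_ms(lambda: int(arr.sum()))        # sequential, SIMD
rand_sum, rand_ms = timed_ms(lambda: int(arr[idx].sum())) # random gather first
list_sum, list_ms = timed_ms(lambda: sum(arr.tolist()))   # boxed Python ints

assert seq_sum == rand_sum == list_sum
print(f"numpy sequential: {seq_ms:.2f} ms")
print(f"numpy random:     {rand_ms:.2f} ms")
print(f"python list:      {list_ms:.2f} ms")
```

Note the random case here pays for a gather (`arr[idx]`) rather than per-element random reads, so it understates the penalty a pointer-chasing workload would see.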
- Contiguous memory = cache-friendly = fast sequential access.
- Random access into large arrays → cache misses → up to 50× slower.
- Python lists store pointers to objects, not the objects themselves → extra indirection.
- NumPy/C arrays store values directly → fully contiguous, SIMD-friendly.
- Database pages are fixed-size contiguous byte arrays for this reason.