Problem
Hash tables provide O(1) average-case get, put, and delete. They underpin Redis, Python dicts, Go maps, JavaScript objects, and database hash indexes. Understanding their internals explains why they can degrade to O(N) and how to prevent it.
Why It Matters
Linear search in list: O(N) → 10ms for N=100,000
Hash table lookup: O(1) → 0.001ms regardless of N
Python dict, Go map, Redis hash, HTTP header lookup: all O(1). Your application's performance often depends on how many O(N) list lookups you are accidentally doing instead.
Mental Model
key → hash_function(key) → bucket_index → linked list of entries
┌────────────────────────────────────────────────────┐
│ Hash Table (capacity = 8 buckets)                  │
│                                                    │
│ [0]: "apple" → 42                                  │
│ [1]: (empty)                                       │
│ [2]: "banana" → 7 → "cherry" → 15  (collision)     │
│ [3]: (empty)                                       │
│ [4]: "date" → 99                                   │
│ ...                                                │
└────────────────────────────────────────────────────┘
get("banana"):
1. bucket = hash("banana") % 8 = 2 O(1)
2. Walk list at bucket[2] for "banana"   O(1) if short list
Hash Functions
A good hash function distributes keys uniformly across buckets. Python uses SipHash-1-3 (secure, resistant to hash flooding attacks). Go uses AES-based hashing on architectures that support AES-NI.
# Python's hash() function (simplified concept, not actual implementation)
hash("hello")  # → some large integer, e.g., -3550055125485641917
bucket_index = hash("hello") % table_capacity
Collision Resolution
Chaining: Each bucket holds a linked list. Worst case: all N keys hash to one bucket → O(N).
Open addressing: On collision, probe to next bucket. Better cache locality than chaining.
Python 3.6+ dict: Compact array layout + hash table. Maintains insertion order. Load factor target: 2/3.
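To make chaining concrete, here is a minimal sketch of a chained hash map. This is illustrative only (the class name and layout are invented for this example, not CPython's actual dict, which uses open addressing over a compact array):

```python
# Minimal chained hash table (illustrative sketch, not CPython's real dict).
class ChainedHashMap:
    def __init__(self, capacity=8):
        # Each bucket is a Python list standing in for a linked list of (key, value).
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def _bucket(self, key):
        # hash(key) % capacity picks the bucket, exactly as in the diagram above.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # a collision simply extends the chain
        self.size += 1

    def get(self, key):
        for k, v in self._bucket(key):    # walk the chain: O(1) while chains stay short
            if k == key:
                return v
        raise KeyError(key)
```

With capacity 8, "banana" and "cherry" may land in the same bucket; get() then walks a two-entry chain, which is still effectively O(1).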
Load Factor and Rehashing
load_factor = entries_count / bucket_count
When load_factor > threshold (Python: 2/3, Go: ~6.5):
→ Allocate new array with 2× capacity
→ Re-hash all entries into new array   O(N) operation
→ Amortized O(1) per insertion
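In CPython you can watch this happen indirectly: sys.getsizeof on a dict jumps each time the backing storage is reallocated. The exact byte counts are implementation details that vary across versions; only the jumps matter:

```python
import sys

# Watch a CPython dict's backing storage grow as entries are added.
# Exact sizes are implementation details; each jump marks a resize/rehash.
d = {}
last = sys.getsizeof(d)
for i in range(20):
    d[i] = i
    now = sys.getsizeof(d)
    if now != last:
        print(f"after {len(d)} entries: {last} -> {now} bytes")
        last = now
```

Note that the size only ever jumps upward during insertion: several cheap inserts, then one expensive reallocation, which is exactly the amortized-O(1) pattern.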
Rehashing example:
dict has 4 buckets, 3 entries (load 0.75 > 0.67)
→ allocate 8 buckets
→ re-insert all 3 entries (rehash each key)
→ future inserts fit without rehashing for a while
Complexity Analysis
| Operation | Average | Worst Case | Worst Case Cause |
|---|---|---|---|
| get(key) | O(1) | O(N) | All keys hash to one bucket |
| put(key) | O(1) amortized | O(N) | All keys collide, or a resize rehashes all N entries |
| delete(key) | O(1) | O(N) | All keys hash to one bucket |
| Iteration | O(N) | O(N) | Always visits every entry |
Benchmark
N=1,000,000 string keys:
dict[key] lookup: ~0.05 µs/lookup
list.index(key): ~5ms (100,000× slower)
set membership test: ~0.04 µs (hash set, no value stored)
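A sketch of how to reproduce this comparison with the stdlib timeit module (N is kept at 100,000 here so it runs quickly; absolute timings vary by machine, but the ratio tells the story):

```python
import timeit

# Compare dict lookup vs. linear list search for a single key.
# Absolute numbers depend on the machine; the ratio is what matters.
N = 100_000
keys = [f"key-{i}" for i in range(N)]
d = {k: i for i, k in enumerate(keys)}
target = keys[-1]  # worst case for the list: a full scan

dict_time = timeit.timeit(lambda: d[target], number=1000)
list_time = timeit.timeit(lambda: keys.index(target), number=1000)
print(f"dict: {dict_time:.6f}s  list: {list_time:.6f}s  "
      f"ratio: {list_time / dict_time:.0f}x")
```

Picking the last key is deliberate: list.index must scan all N entries, while the dict jumps straight to the right bucket regardless of where the key "sits".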
Hash collision attack (Python 2, before SipHash):
Adversarial keys all hash to same bucket → O(N) per lookup
Mitigated in Python 3 with randomized hash seed (PYTHONHASHSEED)
Failure Modes
Memory overhead: Each hash table entry has overhead (hash value, key, value, next pointer). A hash table with 1M entries of 8-byte ints uses significantly more memory than an 8MB array of ints.
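The overhead is easy to measure with the stdlib. This sketch uses N = 100,000 to keep it quick (the gap only widens at 1M entries), and sys.getsizeof numbers are CPython-specific:

```python
import sys
from array import array

# Compare a dict of N int->int entries against a packed array of N ints.
N = 100_000
d = {i: i for i in range(N)}
a = array("q", range(N))  # 'q' = signed 8-byte ints: ~8 bytes per element

dict_bytes = sys.getsizeof(d)   # the hash table itself (excludes the int objects)
array_bytes = sys.getsizeof(a)  # small header + packed 8-byte slots
print(f"dict: {dict_bytes / 1e6:.1f} MB, array: {array_bytes / 1e6:.1f} MB")
```

And getsizeof even understates the dict's cost, since it does not count the boxed int objects the keys and values point to.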
Hash flooding attack: Without a randomized seed, an attacker can craft keys that all collide → O(N) server-side processing per request (DoS). Python 3, Go, and Node.js all use randomized seeds.
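Seed randomization is directly observable: launch a child interpreter with two different PYTHONHASHSEED values and the same string hashes to different integers. A sketch (assumes a CPython interpreter reachable via sys.executable):

```python
import os
import subprocess
import sys

def string_hash_with_seed(seed: str) -> int:
    """Run a child interpreter with a fixed PYTHONHASHSEED and report hash('hello')."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('hello'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)

# Different seeds give different hashes, so an attacker who doesn't know the
# seed can't precompute a set of keys that all land in one bucket.
print(string_hash_with_seed("1"))
print(string_hash_with_seed("2"))
```

By default CPython picks a random seed at startup (PYTHONHASHSEED unset), which is why hash("hello") differs between interpreter runs.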
Map iteration order: Go maps iterate in random order by design (to prevent relying on implementation details). Python dicts maintain insertion order since Python 3.7.
Key Takeaways
- O(1) average, O(N) worst case. Good hash function + bounded load factor ensures O(1) in practice.
- Load factor drives rehashing. At 2/3 capacity, Python rehashes → O(N) cost, O(1) amortized.
- Always use hash sets/maps for membership testing, never `if x in list` on a large list.
- Hash flooding is a real security concern: use PYTHONHASHSEED or equivalent.
- Redis hashes, Python dicts, Go maps, HTTP header maps: all the same underlying data structure.