Cache Strategies in Distributed Systems
The Real Problem: When Many Cache Keys Expire at the Same Time
Imagine you are caching cricket score data for 5 minutes while 1 million users watch the score at the same time, with TTL = 300 seconds for every key.
At the 300th second, all the cache entries expire simultaneously.
All 1 million user requests get a cache MISS and fall through to the database at once, which instantly spikes the DB load.
This is how a system crashes during a traffic spike.
This problem becomes visible during:
IPL streaming surges
Netflix new season release
E-commerce flash sales
Ticket booking launches
The root issue?
Basic TTL caching is not enough in distributed systems.
Why Is Basic TTL Caching Not Enough?
On a cache HIT, the response is fast. On a cache MISS, the request fetches from the DB and updates the cache.
This works fine under normal traffic.
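The hit/miss flow above can be sketched as a small cache-aside read path. This is a minimal in-memory illustration, not a production cache: the names TTLCache, get_score, and load_from_db are hypothetical, and a real deployment would typically use a shared cache rather than a Python dict.

```python
import time


class TTLCache:
    """Tiny in-memory cache with one fixed TTL for every entry (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the behavior can be tested
        self.store = {}         # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)


def get_score(cache, match_id, load_from_db):
    """Cache-aside read: a HIT returns immediately, a MISS goes to the DB."""
    value = cache.get(match_id)
    if value is not None:
        return value
    value = load_from_db(match_id)  # hypothetical DB loader supplied by the caller
    cache.set(match_id, value)
    return value
```

Note that every entry gets exactly the same TTL here, which is the property that causes trouble at scale.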
But in Distributed Systems:
Multiple servers
Shared cache
Massive traffic
Identical TTL values
When TTL expires simultaneously → synchronized burst → database overload.
This leads to:
CPU spikes
Connection pool exhaustion
Increased latency
Cascading failures
Understanding the Core Issue
At time T:
All cache keys expire simultaneously.
All requests get a cache MISS.
DB load increases suddenly.
This is a form of the thundering herd problem, also known as a cache stampede.
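To put a number on the effect, here is a small sketch that counts how many keys expire within the same second when they all share one TTL. The helper name peak_expirations_per_second and the 10,000-key workload are made up for illustration.

```python
from collections import Counter


def peak_expirations_per_second(write_times, ttl_seconds):
    """Return the largest number of keys that expire within any single second."""
    buckets = Counter(int(t + ttl_seconds) for t in write_times)
    return max(buckets.values())


# 10,000 keys written within the same second, all with TTL = 300 seconds.
write_times = [0.0001 * i for i in range(10_000)]
peak = peak_expirations_per_second(write_times, 300)
# Every key lands in the same one-second bucket, so the peak equals the key count.
```

With an identical TTL, the expiry pattern simply mirrors the write pattern: keys that were populated together expire together.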
Now let's solve this issue.
Each solution below exists to solve the same problem:
- Prevent synchronized recomputation and reduce traffic spikes.
TTL Jitter (Randomized Expiration)
Problem: All Cache Keys expire at the same time.
Solution: Add randomness to each key's TTL so that expirations are not synchronized.
Instead of: TTL = 300 seconds for all cache keys.
Use: TTL = 300 ± random(0–30) seconds.
Why does it work?
It spreads recomputation across time, reducing synchronized load.
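A minimal sketch of the jitter calculation, assuming a uniform random offset of up to 30 seconds on either side of the base TTL. The function name jittered_ttl and the example keys are hypothetical.

```python
import random

BASE_TTL = 300    # seconds, the base TTL from the example
MAX_JITTER = 30   # seconds of spread on either side


def jittered_ttl(base=BASE_TTL, max_jitter=MAX_JITTER, rng=random):
    """Return the base TTL shifted by a uniform offset in [-max_jitter, +max_jitter]."""
    return base + rng.uniform(-max_jitter, max_jitter)


# Each key gets its own TTL, so the keys no longer expire in the same instant.
ttls = {key: jittered_ttl() for key in ("match:1", "match:2", "match:3")}
```

The jitter window is a tuning knob: a wider window spreads load more evenly but makes the effective freshness of each key less predictable.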
Probability-Based Early Expiration (Probabilistic Early Re-computation)
When there is a large number of cache keys, instead of waiting for the exact TTL expiry, some requests refresh the cache early, based on probability.
Fundamentally, the idea is to distribute the refresh times to reduce the likelihood of a cache stampede.
On every request we compute a probability of treating the cache entry as expired. The probability function is chosen so that its value increases as the entry's expiration approaches.
As cache gets closer to expiry:
Chance of refresh increases
Only some requests recompute
Not everyone waits until the last second
Think of it like:
If the remaining TTL is near zero, maybe 10% of requests refresh early.
This prevents the “all expire at once” effect.
The diagram shows:
Normal cache hit flow
When expiration approaches → probabilistic check
If passed → refresh early
If failed → serve the existing cached value
Eventually full expiration happens safely
In short:
Probabilistic early expiration spreads recomputation over time, preventing synchronized bursts that cause cache stampede.
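One well-known way to build such a probability function uses an exponentially distributed random offset (this formulation is often called XFetch). The sketch below assumes that approach; the article does not say which formula it has in mind, and the names should_refresh_early, recompute_seconds, and beta are illustrative.

```python
import math
import random
import time


def should_refresh_early(expires_at, recompute_seconds, beta=1.0, now=None, rng=random):
    """Decide whether this request should recompute the value before it expires.

    recompute_seconds: roughly how long a recomputation takes.
    beta: values above 1 favor earlier refreshes, values below 1 favor later ones.
    """
    now = time.time() if now is None else now
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is always defined
    # -log(u) is non-negative and only occasionally large, so the check rarely
    # passes far from expiry and passes more and more often close to it.
    return now - recompute_seconds * beta * math.log(u) >= expires_at
```

A caller would run this check on every cache hit. If it returns True, that request recomputes the value and rewrites the cache entry, and every other request keeps reading the cached value.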
Mutex / Cache Locking
The Problem
When a cache entry expires, 1000 requests try to rebuild it at once.
The Solution
Allow only one request to rebuild. All others wait.
How It Works
Request checks cache → miss
Tries to acquire lock
If lock acquired:
Fetch from DB
Update cache
Release lock
Other requests:
Wait
Or serve stale data
This prevents duplicate recomputation.
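The steps above can be sketched with a per-key lock. This version is an in-process illustration using Python threads, with hypothetical names (SingleFlightCache, load_from_db). Across multiple servers the lock would have to live in a shared store, which this sketch does not attempt.

```python
import threading


class SingleFlightCache:
    """Allow only one caller per key to rebuild a missing entry; the others wait."""

    def __init__(self, load_from_db):
        self.load_from_db = load_from_db  # hypothetical loader supplied by the caller
        self.values = {}
        self.locks = {}
        self.locks_guard = threading.Lock()

    def _lock_for(self, key):
        # Create the per-key lock exactly once, even under concurrent access.
        with self.locks_guard:
            return self.locks.setdefault(key, threading.Lock())

    def get(self, key):
        value = self.values.get(key)
        if value is not None:
            return value  # cache hit, no locking needed
        with self._lock_for(key):
            # Re-check after acquiring the lock: another request may have
            # rebuilt the entry while this one was waiting.
            value = self.values.get(key)
            if value is None:
                value = self.load_from_db(key)
                self.values[key] = value
            return value
```

The re-check inside the lock is what prevents duplicate work: waiting requests find the freshly written value and never touch the database.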
Stale-While-Revalidate (SWR)
This strategy improves availability.
Instead of blocking users when a cache entry expires:
Serve stale data
Recompute in background
This is how many CDNs behave when the stale-while-revalidate Cache-Control directive is in use.
Real Life Example:
When a new episode drops on Netflix, users might briefly see old metadata until the backend refreshes the data.
Users get:
Fast response
High availability
Minimal overload
Tradeoff
You sacrifice perfect freshness for stability.
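A minimal sketch of the serve-stale-then-refresh behavior, assuming an in-memory store and a background thread per refresh. SWRCache and load_from_db are hypothetical names, and a cold miss (no stale value available yet) is loaded synchronously here for simplicity.

```python
import threading
import time


class SWRCache:
    """Serve stale values immediately and refresh them in a background thread."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}          # key -> (value, fresh_until)
        self.refreshing = set()  # keys with a refresh already in flight
        self.guard = threading.Lock()

    def _refresh(self, key):
        try:
            value = self.load_from_db(key)
            with self.guard:
                self.store[key] = (value, self.clock() + self.ttl)
        finally:
            with self.guard:
                self.refreshing.discard(key)

    def get(self, key):
        start_refresh = False
        with self.guard:
            entry = self.store.get(key)
            if entry is not None:
                is_stale = self.clock() >= entry[1]
                if is_stale and key not in self.refreshing:
                    self.refreshing.add(key)  # at most one refresh per key
                    start_refresh = True
        if entry is None:
            # Cold miss: there is nothing stale to serve, so load synchronously.
            self._refresh(key)
            with self.guard:
                return self.store[key][0]
        if start_refresh:
            threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
        return entry[0]  # fresh or stale, returned without waiting
```

A real system would usually also cap how stale a value may get before requests are forced to wait, which this sketch leaves out.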
Cache Warming / Pre-Warming
This is a proactive mechanism: instead of waiting for a traffic spike to occur, we populate the cache entries before users arrive.
Before:
IPL final stream starts
E-commerce flash sale begins
Product launch at 12 PM
At 11:55 AM:
Preload popular keys into cache.
Why does it work?
Users never hit a cold cache, and database load remains stable.
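A pre-warming job can be as simple as a loop over the keys you expect to be hot. The sketch below assumes the list of popular keys is known ahead of time and uses a plain dict as the cache; warm_cache and load_from_db are hypothetical names.

```python
def warm_cache(cache, popular_keys, load_from_db):
    """Preload the given keys into the cache and return how many were loaded."""
    warmed = 0
    for key in popular_keys:
        if key not in cache:  # skip keys that are already cached
            cache[key] = load_from_db(key)
            warmed += 1
    return warmed
```

Because the job runs sequentially before traffic arrives, the database sees a slow, controlled stream of reads instead of a burst of user-driven misses.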
Tradeoffs: Freshness, Latency and Availability
| Strategy | Freshness | Latency | Stability |
|---|---|---|---|
| Basic TTL | Medium | Low | Low |
| TTL Jitter | Medium | Low | Medium |
| Probabilistic | High | Low | High |
| Mutex | High | Medium | High |
| SWR | Medium | Very Low | Very High |
| Cache Warming | High | Very Low | High |
There is no perfect solution.
Every system balances:
How fresh data must be
How much latency is acceptable
How much inconsistency you can tolerate.
When to Use Which Strategy
Use TTL Jitter
When many keys share identical TTL
Simple microservices
Use Probabilistic Expiration
Large-scale distributed systems
High traffic systems
Use Mutex Locking
Expensive DB queries
Critical resources
Use SWR
Read-heavy workloads
User-facing APIs
CDN-like systems
Use Cache Warming
Before launches
Before flash sales
Before predictable spikes
Real-World Behavior During Traffic Spikes
Netflix Release
Pre-warm popular titles
Use SWR for metadata
Add TTL jitter to avoid synchronized expiry
IPL Streaming
Preload match data
Use probabilistic refresh
Scale infrastructure
E-Commerce Sale
Preload product pages
Lock inventory cache updates
Apply rate limiting
The Core System Design Insight
Caching in distributed systems is not about speed.
It is about:
Smoothing load
Preventing synchronized recomputation
Protecting databases
Improving availability
The real enemy is:
Synchronized behavior across distributed nodes.
Final Takeaway
Basic TTL works in small systems.
But at scale:
Use jitter to avoid synchronized expiry
Use mutex to prevent duplicate recomputation
Use probabilistic expiration to smooth refresh
Use SWR to prioritize availability
Use cache warming before predictable spikes
Distributed systems fail not because of traffic.
They fail because of synchronized bursts caused by poor cache design.