Cache Strategies in Distributed Systems

The Real Problem: When Many Cache Keys Expire at the Same Time

Imagine you are caching cricket score data for 5 minutes while 1 million users watch the score at the same time, with TTL = 300 seconds for every key.
At the 300th second, all the cache entries expire simultaneously.
All 1 million user requests get a cache MISS and go straight to the database, which instantly spikes the DB load.

This is how systems crash during a traffic spike.

This problem becomes visible during:

IPL streaming surges
Netflix new season release
E-commerce flash sales
Ticket booking launches

The root issue?
Basic TTL caching is not enough in distributed systems.

Why Is Basic TTL Caching Not Enough?

On a cache HIT the response is fast; on a cache MISS the request fetches from the DB and updates the cache.
This works fine under normal traffic.
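The HIT/MISS flow described above can be sketched as a small cache-aside read path. This is a minimal illustration, not a production design: `TTLCache` is a hypothetical in-memory stand-in for a shared cache, and `get_score` and `load_from_db` are names assumed for the example.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for a shared cache (assumption)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries count as a MISS
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def get_score(cache, key, load_from_db, ttl=300):
    """Cache-aside read: a HIT returns the cached value, a MISS loads and stores."""
    value = cache.get(key)
    if value is not None:
        return value                 # cache HIT: fast response
    value = load_from_db(key)        # cache MISS: go to the database
    cache.set(key, value, ttl)       # every key gets the same fixed TTL
    return value
```

Note that every caller who sees a MISS goes to the database independently, which is exactly what becomes a problem once many callers miss at the same moment.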

But in Distributed Systems:

Multiple servers
Shared cache
Massive traffic
Identical TTL values

When TTL expires simultaneously → synchronized burst → database overload.

This leads to:

CPU spikes
Connection pool exhaustion
Increased latency
Cascading failures

Understanding the Core Issue

At time T:

All cache keys expire simultaneously.
All requests get a cache MISS.
DB load increases suddenly.

This is a form of the thundering herd problem, also known as a cache stampede.

Now Let's Solve This Issue

Every solution below targets the same goal: prevent synchronized recomputation and reduce traffic spikes.

TTL Jitter (Randomized Expiration)

Problem: All cache keys expire at the same time.

Solution: Add randomness to each key's TTL so that synchronized expiration does not take place.

Instead of: TTL = 300 seconds for all cache keys.
Use: TTL = 300 ± random(0–30) seconds.

Now keys expire gradually, not together.

Why does it work?
It spreads recomputation across time, reducing synchronized load.
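As a rough sketch of the 300 ± 30 second example, the jitter can be applied at the moment a key is written. The function name `jittered_ttl` and its parameters are assumptions made for illustration.

```python
import random

BASE_TTL = 300    # seconds, the fixed TTL from the example
MAX_JITTER = 30   # seconds, the largest offset in either direction


def jittered_ttl(base=BASE_TTL, max_jitter=MAX_JITTER, rng=random):
    """Return base plus a uniformly random offset in [-max_jitter, +max_jitter]."""
    return base + rng.uniform(-max_jitter, max_jitter)


# Usage: compute a fresh TTL for every write instead of reusing one constant.
ttls = [jittered_ttl() for _ in range(5)]
```

With these values, keys written at the same instant expire anywhere in a 60-second window rather than in the same second. How wide the window should be depends on how much extra staleness the data can tolerate.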

Probability-Based Early Expiration (Probabilistic Early Re-computation)

When there are a large number of cache keys, instead of waiting for the exact TTL expiry, some requests refresh the cache early, based on probability.
Fundamentally, the idea is to distribute refresh times to reduce the likelihood of a cache stampede.

On every request we compute a probability of treating the cache entry as expired. The probability function is designed so that its value increases as the entry's expiration approaches.
As the cache entry gets closer to expiry:

Chance of refresh increases
Only some requests recompute
Not everyone waits until the last second

Think of it like:

If the remaining TTL is near zero, maybe 10% of requests refresh early.

This prevents the “all expire at once” effect.

The diagram shows:

Normal cache hit flow
When expiration approaches → probabilistic check
If passed → refresh early
If failed → serve the cached value
Eventually full expiration happens safely

In summary:

Probabilistic early expiration spreads recomputation over time, preventing synchronized bursts that cause cache stampede.
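The article does not name a specific probability function. One commonly cited formulation, often called XFetch, gives each request a random "head start" drawn from an exponential distribution and refreshes when the current time plus that head start reaches the expiry time. The sketch below assumes that formulation; `should_refresh_early`, `recompute_seconds`, and `beta` are illustrative names.

```python
import math
import random
import time


def should_refresh_early(expires_at, recompute_seconds, beta=1.0,
                         now=None, rng=random):
    """Return True if this request should refresh the entry before it expires.

    expires_at:        absolute expiry time of the entry, in seconds
    recompute_seconds: how long the last recomputation took, in seconds
    beta:              values above 1 refresh earlier, below 1 refresh later
    """
    if now is None:
        now = time.time()
    # 1 - random() lies in (0, 1], so log() is defined and is <= 0.
    # Negating it yields a non-negative random head start.
    head_start = -recompute_seconds * beta * math.log(1.0 - rng.random())
    return now + head_start >= expires_at
```

Far from expiry the check almost never passes; close to expiry it passes for a growing share of requests, and at or after expiry it always passes. Scaling by the recomputation time means slow-to-rebuild entries start refreshing earlier.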

Mutex / Cache Locking

The Problem
When the cache expires, 1000 requests try to rebuild it at once.

The Solution
Allow only one request to rebuild. All others wait.

How It Works

Request checks cache → miss
Tries to acquire lock
If lock acquired:
- Fetch from DB
- Update cache
- Release lock
Other requests:
- Wait
- Or serve stale data

This prevents duplicate recomputation.
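The steps above can be sketched inside a single process with one lock per key. This is an assumption-laden simplification: `SingleFlightCache` is a hypothetical class, expiry is left out for brevity, and across multiple servers the lock would have to live in the shared cache itself (for example a key written with Redis `SET ... NX EX`) rather than in process memory.

```python
import threading


class SingleFlightCache:
    """Hypothetical in-process cache where only one caller rebuilds a key."""

    def __init__(self):
        self._values = {}
        self._locks = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        # Create the per-key lock atomically so all callers share one lock.
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, load_from_db):
        value = self._values.get(key)
        if value is not None:
            return value                      # cache hit, no locking needed
        with self._lock_for(key):             # miss: only one caller proceeds
            # Re-check: another caller may have rebuilt while we waited.
            value = self._values.get(key)
            if value is None:
                value = load_from_db(key)     # fetch from DB
                self._values[key] = value     # update cache
            return value                      # lock released on exit
```

Waiting callers here block until the rebuild finishes. The alternative mentioned above, serving stale data instead of waiting, trades freshness for lower latency.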

Stale-While-Revalidate (SWR)

This strategy improves availability.

Instead of blocking users when a cache entry expires:

Serve stale data
Recompute in background

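This is how many CDNs behave; HTTP even defines a stale-while-revalidate Cache-Control extension (RFC 5861) for this pattern.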

Real-Life Example:
When a new episode drops on Netflix, users might briefly see some old metadata until the backend refreshes the data.

Users get:

Fast response
High availability
Minimal overload

Tradeoff
You sacrifice perfect freshness for stability.
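A minimal in-process sketch of the serve-stale, refresh-in-background idea follows. `SWRCache`, `fresh_for`, and `load_from_db` are hypothetical names; a real deployment would more likely rely on the cache or CDN layer's own support for this behavior.

```python
import threading
import time


class SWRCache:
    """Hypothetical cache that serves stale values while refreshing them."""

    def __init__(self, fresh_for, clock=time.monotonic):
        self._fresh_for = fresh_for
        self._clock = clock
        self._entries = {}         # key -> (value, fresh_until)
        self._refreshing = set()   # keys with a background refresh in flight
        self._guard = threading.Lock()

    def get(self, key, load_from_db):
        with self._guard:
            entry = self._entries.get(key)
            if entry is not None:
                value, fresh_until = entry
                stale = self._clock() >= fresh_until
                if stale and key not in self._refreshing:
                    # Start at most one background refresh per key.
                    self._refreshing.add(key)
                    threading.Thread(target=self._refresh,
                                     args=(key, load_from_db),
                                     daemon=True).start()
                return value       # fresh or stale, answer immediately
        # Cold miss: nothing to serve yet, so load synchronously.
        return self._store(key, load_from_db(key))

    def _store(self, key, value):
        with self._guard:
            self._entries[key] = (value, self._clock() + self._fresh_for)
        return value

    def _refresh(self, key, load_from_db):
        try:
            self._store(key, load_from_db(key))
        finally:
            with self._guard:
                self._refreshing.discard(key)
```

This sketch never stops serving a stale value. In practice a second, hard limit on staleness is usually added so that very old data is eventually refused.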

Cache Warming / Pre-Warming

This is a proactive mechanism: instead of waiting for a traffic spike to occur, we populate the cache entries before users arrive.

Before:

IPL final stream starts
E-commerce flash sale begins
Product launch at 12 PM

At 11:55 AM:

Preload popular keys into cache.

Why does it work?
Users hit a warm cache from the first request, so database load remains stable.
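A pre-warming job can be as simple as a loop that runs shortly before the event, for example from a scheduler at 11:55 AM. In the sketch below a plain dict stands in for the cache, and `warm_cache`, `popular_keys`, and `load_from_db` are assumed names. Giving each warmed key a slightly different TTL is a design choice added here so that the warmed entries do not all expire together later.

```python
import random


def warm_cache(cache, popular_keys, load_from_db, base_ttl=300, max_jitter=30):
    """Load each popular key into the cache before traffic arrives.

    Returns the number of keys warmed.
    """
    warmed = 0
    for key in popular_keys:
        # Spread expiry times so warmed keys do not expire in the same second.
        ttl = base_ttl + random.uniform(-max_jitter, max_jitter)
        cache[key] = {"value": load_from_db(key), "ttl": ttl}
        warmed += 1
    return warmed


# Usage with a dict standing in for the real cache:
cache = {}
count = warm_cache(cache, ["match:final", "product:42"], lambda k: f"data for {k}")
```

Choosing which keys count as popular is the hard part in practice; a common starting point is the most requested keys from a comparable earlier event.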

Tradeoffs: Freshness, Latency and Availability

Strategy	Freshness	Latency	Stability
Basic TTL	Medium	Low	Low
TTL Jitter	Medium	Low	Medium
Probabilistic	High	Low	High
Mutex	High	Medium	High
SWR	Medium	Very Low	Very High
Cache Warming	High	Very Low	High

There is no perfect solution.

Every system balances:

How fresh data must be
How much latency is acceptable
How much inconsistency you can tolerate

When to Use Which Strategy

Use TTL Jitter
- When many keys share identical TTL
- Simple microservices
Use Probabilistic Expiration
- Large-scale distributed systems
- High traffic systems
Use Mutex Locking
- Expensive DB queries
- Critical resources
Use SWR
- Read-heavy workloads
- User-facing APIs
- CDN-like systems
Use Cache Warming
- Before launches
- Before flash sales
- Before predictable spikes

Real-World Behavior During Traffic Spikes

Netflix Release
- Pre-warm popular titles
- Use SWR for metadata
- Add TTL jitter to avoid synchronized expiry
IPL Streaming
- Preload match data
- Use probabilistic refresh
- Scale infrastructure
E-Commerce Sale
- Preload product pages
- Lock inventory cache updates
- Apply rate limiting

The Core System Design Insight

Caching in distributed systems is not about speed.

It is about:
- Smoothing load
- Preventing synchronized recomputation
- Protecting databases
- Improving availability

The real enemy is:

Synchronized behavior across distributed nodes.

Final Takeaway

Basic TTL works in small systems.

But at scale:
- Use jitter to avoid synchronized expiry
- Use mutex to prevent duplicate recomputation
- Use probabilistic expiration to smooth refresh
- Use SWR to prioritize availability
- Use cache warming before predictable spikes

Distributed systems fail not because of traffic.

They fail because of synchronized bursts caused by poor cache design.
