Cache Strategies in Distributed Systems

  1. The Real Problem: When Many Cache Keys Expire at the Same Time

Imagine you are caching cricket score data for 5 minutes. One million users are watching the score at the same time, and every key has TTL = 300 seconds.
At the 300th second, all the cache entries expire simultaneously.
All one million user requests hit a cache MISS, and all of them suddenly go to the database, which instantly spikes the DB load.

This is how systems crash during traffic spikes.

This problem becomes visible during:

  • IPL streaming surges

  • Netflix new season release

  • E-commerce flash sales

  • Ticket booking launches

The root issue?
Basic TTL caching is not enough in distributed systems.

  2. Why Is Basic TTL Caching Not Enough?

On a cache HIT → fast response; on a cache MISS → fetch from the DB and update the cache.
This works fine under normal traffic.
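The HIT/MISS flow above is the classic cache-aside pattern. Here is a minimal, single-process sketch of it, assuming a plain dict as the cache and a hypothetical `loader` function standing in for the database query. Note that every entry gets exactly the same TTL, which is the root of the problem described in this article.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Minimal cache-aside with one fixed TTL for every entry (in-process sketch)."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float):
        self._loader = loader  # hypothetical DB fetch function
        self._ttl = ttl_seconds
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Any:
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache HIT: fast response
        value = self._loader(key)  # cache MISS: go to the database
        self._data[key] = (value, now + self._ttl)  # ...and repopulate the cache
        return value
```

Entries written at the same moment expire at the same moment, so a burst of writes today becomes a burst of misses 300 seconds later.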

But in distributed systems, you have:

  • Multiple servers

  • Shared cache

  • Massive traffic

  • Identical TTL values

When many keys' TTLs expire simultaneously → synchronized burst of cache misses → database overload.

This leads to:

  • CPU spikes

  • Connection pool exhaustion

  • Increased latency

  • Cascading failures

  3. Understanding the Core Issue

At time T:

  • All cache keys expire simultaneously.

  • All requests hit a cache MISS.

  • DB load increases suddenly.

This is a form of the Thundering Herd Problem, often called a cache stampede.
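To make the burst at time T concrete, here is a small, hypothetical simulation. Threads stand in for concurrent requests, an empty dict stands in for a cache whose keys have all just expired, and `slow_db_fetch` fakes a 50 ms query. A `threading.Barrier` releases every request at the same instant. Nothing coordinates the misses, so every request that arrives before the first query finishes issues its own DB query.

```python
import threading
import time


def simulate_stampede(n_requests: int = 50, db_latency: float = 0.05) -> int:
    """Return how many DB queries n concurrent requests trigger on an expired key."""
    cache = {}  # empty: every key has just expired
    db_calls = 0
    counter_lock = threading.Lock()
    barrier = threading.Barrier(n_requests)

    def slow_db_fetch(key):
        nonlocal db_calls
        with counter_lock:
            db_calls += 1
        time.sleep(db_latency)  # simulated query time
        return f"value-for-{key}"

    def naive_get(key):
        if key in cache:  # HIT
            return cache[key]
        value = slow_db_fetch(key)  # MISS: nothing stops other requests doing the same
        cache[key] = value
        return value

    def request():
        barrier.wait()  # all requests arrive at the same instant T
        naive_get("live-score")

    threads = [threading.Thread(target=request) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_calls


print(simulate_stampede())
```

On a typical run, most or all of the 50 requests reach the database even though they all wanted the same key.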

Now let's solve this issue.

Each of the following solutions targets the same goal:

  • Prevent synchronized recomputation and reduce traffic spikes.

  4. TTL Jitter (Randomized Expiration)

Problem: All cache keys expire at the same time.

Solution: Add randomness to each key's TTL so that synchronized expiration doesn't take place.

Instead of: TTL = 300 seconds for all cache keys.
Use: TTL = 300 ± random(0–30) seconds.

💡
Now keys expire gradually, not together.

Why does it work?
It spreads recomputation across time, reducing synchronized load.
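A minimal sketch of the "300 ± random(0–30)" rule, assuming whole-second TTLs and that all keys are written at t = 0. `jittered_ttl` and `busiest_second` are illustrative helper names, not part of any library. The comparison counts how many keys expire in the single busiest second with and without jitter.

```python
import random
from collections import Counter

BASE_TTL = 300   # seconds, as in the cricket-score example
MAX_JITTER = 30  # seconds, illustrative


def jittered_ttl(base: int = BASE_TTL, max_jitter: int = MAX_JITTER) -> int:
    """TTL = base ± random(0..max_jitter), in whole seconds."""
    return base + random.randint(-max_jitter, max_jitter)


def busiest_second(ttls) -> int:
    """How many keys expire in the single busiest second (all written at t = 0)."""
    return max(Counter(ttls).values())


fixed = [BASE_TTL] * 10_000
jittered = [jittered_ttl() for _ in range(10_000)]
print(busiest_second(fixed))     # 10000: every key expires in the same second
print(busiest_second(jittered))  # spread across 61 possible seconds, so far lower
```

In practice, the jittered value would be passed wherever the TTL is set, for example the `EX` option of a Redis `SET`.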

  5. Probability-Based Early Expiration (Probabilistic Early Recomputation)

When we have a large number of cache keys, instead of waiting for the exact TTL expiry, some requests refresh the cache early based on a probability.
Fundamentally, the idea is to distribute the expiration times to reduce the likelihood of a cache stampede.

On every request, we compute the probability of refreshing the entry early. The probability function is designed so that its value increases as the entry approaches expiration.
As an entry gets closer to expiry:

  • Chance of refresh increases

  • Only some requests recompute

  • Not everyone waits until the last second

Think of it like this:

If the remaining TTL is near zero, maybe 10% of requests refresh early while the rest keep serving the cached value.

This prevents the “all expire at once” effect.

The diagram shows:

  1. Normal cache hit flow

  2. When expiration approaches → probabilistic check

  3. If passed → refresh early

  4. If failed → serve stale

  5. Eventually full expiration happens safely

In short:

Probabilistic early expiration spreads recomputation over time, preventing synchronized bursts that cause cache stampede.
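One well-known way to build such a probability function is the "XFetch" rule from the paper "Optimal Probabilistic Cache Stampede Prevention". A request refreshes early if `now - delta * beta * ln(U) >= expiry`, where `delta` is how long the last recomputation took, `beta` is a tuning knob (1.0 by default), and `U` is uniform in (0, 1]. The sketch below is a single-process illustration under those assumptions, not the author's implementation. The class and method names are hypothetical.

```python
import math
import random
import time
from typing import Any, Callable, Dict, Tuple


class ProbabilisticCache:
    """Cache with probabilistic early recomputation (XFetch-style rule)."""

    def __init__(self, loader: Callable[[str], Any], ttl: float, beta: float = 1.0):
        self._loader = loader  # hypothetical DB fetch function
        self._ttl = ttl
        self._beta = beta  # > 1 favours earlier refreshes, < 1 later ones
        self._data: Dict[str, Tuple[Any, float, float]] = {}  # key -> (value, delta, expiry)

    def _recompute(self, key: str) -> Any:
        start = time.monotonic()
        value = self._loader(key)
        end = time.monotonic()
        delta = end - start  # how long the rebuild took
        self._data[key] = (value, delta, end + self._ttl)
        return value

    def should_refresh_early(self, delta: float, expiry: float, now: float) -> bool:
        # -ln(U) with U in (0, 1] is an Exponential(1) sample: usually small,
        # occasionally large. The closer `now` is to `expiry`, the more likely
        # the random gap reaches it, so the refresh probability rises smoothly.
        gap = -delta * self._beta * math.log(1.0 - random.random())
        return now + gap >= expiry

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return self._recompute(key)
        value, delta, expiry = entry
        now = time.monotonic()
        if now >= expiry or self.should_refresh_early(delta, expiry, now):
            return self._recompute(key)  # this request refreshes early
        return value  # everyone else keeps serving the cached value
```

With `delta = 1` second, a request made one second before expiry refreshes with probability e^-1 (about 37%), while one made 100 seconds before expiry almost never does.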

  6. Mutex / Cache Locking

The Problem
When the cache expires, 1,000 requests try to rebuild it at once.
The Solution
Allow only one request to rebuild the entry. All others wait (or serve stale data).

How It Works

  1. Request checks cache → miss

  2. Tries to acquire lock

  3. If lock acquired:

    • Fetch from DB

    • Update cache

    • Release lock

  4. Other requests:

    • Wait

    • Or serve stale data

This prevents duplicate recomputation.
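The steps above can be sketched with one lock per key and a "double-check" after acquiring it, so requests that waited simply read the freshly rebuilt value. This sketch assumes a single process and uses `threading.Lock`. Across multiple servers the lock would have to live in shared storage instead, for example a Redis `SET lock:<key> <token> NX PX <ms>`, which this example does not implement. `LockingCache` and `loader` are hypothetical names.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LockingCache:
    """Cache where only one caller per key rebuilds an expired entry."""

    def __init__(self, loader: Callable[[str], Any], ttl: float):
        self._loader = loader  # hypothetical DB fetch function
        self._ttl = ttl
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()  # protects the per-key lock table

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key: str) -> Optional[Tuple[Any, float]]:
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key: str) -> Any:
        entry = self._fresh(key)
        if entry:
            return entry[0]  # 1. cache hit
        with self._lock_for(key):  # 2. try to acquire the lock (others block here)
            entry = self._fresh(key)  # double-check: a previous holder may have rebuilt it
            if entry:
                return entry[0]
            value = self._loader(key)  # 3. fetch from DB
            self._data[key] = (value, time.monotonic() + self._ttl)  # update cache
            return value  # lock released on leaving the `with` block
```

This sketch implements the "wait" option. Serving stale data instead is what the next strategy does.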

  7. Stale-While-Revalidate (SWR)

This strategy improves availability.

Instead of blocking users when the cache expires:

  • Serve stale data

  • Recompute in background

This is how many CDNs behave.

Real-Life Example:
When a new episode drops on Netflix, users might briefly see old metadata until the backend refreshes the data.

Users get:

  • Fast response

  • High availability

  • Minimal overload

Tradeoff
You sacrifice perfect freshness for stability.
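Here is a minimal in-process sketch of SWR, loosely mirroring the HTTP `stale-while-revalidate` Cache-Control extension (RFC 5861) that CDNs implement. Each entry assumes two windows: `fresh_ttl`, during which it is served as-is, and `stale_ttl`, during which it is still served but a background thread refreshes it. Past both windows, the request loads synchronously. `SWRCache` and `loader` are hypothetical names. A real deployment would also bound background threads and handle loader errors more carefully.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


class SWRCache:
    """Serve stale values immediately and refresh them in the background."""

    def __init__(self, loader: Callable[[str], Any], fresh_ttl: float, stale_ttl: float):
        self._loader = loader  # hypothetical DB fetch function
        self._fresh_ttl = fresh_ttl
        self._stale_ttl = stale_ttl
        self._data: Dict[str, Tuple[Any, float, float]] = {}  # key -> (value, fresh_until, stale_until)
        self._refreshing: Set[str] = set()  # keys with a refresh already in flight
        self._guard = threading.Lock()

    def _store(self, key: str, value: Any) -> None:
        now = time.monotonic()
        with self._guard:
            fresh_until = now + self._fresh_ttl
            self._data[key] = (value, fresh_until, fresh_until + self._stale_ttl)

    def _refresh(self, key: str) -> None:
        try:
            self._store(key, self._loader(key))
        finally:
            with self._guard:
                self._refreshing.discard(key)

    def get(self, key: str) -> Any:
        now = time.monotonic()
        with self._guard:
            entry = self._data.get(key)
        if entry is None or now >= entry[2]:
            value = self._loader(key)  # missing or too stale: load synchronously
            self._store(key, value)
            return value
        value, fresh_until, _ = entry
        if now >= fresh_until:  # stale but usable: revalidate in the background
            with self._guard:
                start = key not in self._refreshing
                self._refreshing.add(key)
            if start:
                threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
        return value  # the user never waits for the refresh
```

The `_refreshing` set keeps a burst of stale reads from starting many refreshes at once, which combines SWR with the de-duplication idea from the mutex strategy.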

  8. Cache Warming / Pre-Warming

This is a proactive mechanism: instead of waiting for the traffic spike to occur, we populate cache entries before users arrive.

Before:

  • IPL final stream starts

  • E-commerce flash sale begins

  • Product launch at 12 PM

At 11:55 AM:

Preload popular keys into cache.

Why does it work?
Users never hit a cold (empty) cache, so database load remains stable.
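A minimal sketch of a warming job, assuming the list of popular keys is known in advance (for example from past traffic) and that something like a scheduler triggers it at 11:55 AM. `warm_cache`, `hot_keys` and `loader` are hypothetical names, and the store is a plain dict of `(value, expires_at)` pairs. The TTLs are jittered so that keys warmed in the same loop do not all expire together later.

```python
import random
import time
from typing import Any, Callable, Dict, Iterable, Tuple


def warm_cache(
    store: Dict[str, Tuple[Any, float]],
    hot_keys: Iterable[str],
    loader: Callable[[str], Any],
    base_ttl: int = 300,
    max_jitter: int = 30,
) -> int:
    """Preload hot keys before a predictable spike; return how many were loaded."""
    loaded = 0
    for key in hot_keys:
        value = loader(key)  # hit the DB now, while traffic is still low
        # Jitter each TTL so the warmed keys don't expire in one synchronized burst.
        ttl = base_ttl + random.randint(-max_jitter, max_jitter)
        store[key] = (value, time.monotonic() + ttl)
        loaded += 1
    return loaded
```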

  9. Tradeoffs: Freshness, Latency and Availability

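| Strategy      | Freshness | Latency  | Stability |
|---------------|-----------|----------|-----------|
| Basic TTL     | Medium    | Low      | Low       |
| TTL Jitter    | Medium    | Low      | Medium    |
| Probabilistic | High      | Low      | High      |
| Mutex         | High      | Medium   | High      |
| SWR           | Medium    | Very Low | Very High |
| Cache Warming | High      | Very Low | High      |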

There is no perfect solution.

Every system balances:

  • How fresh data must be

  • How much latency is acceptable

  • How much inconsistency you can tolerate.

  10. When to Use Which Strategy

    Use TTL Jitter

    • When many keys share identical TTL

    • Simple microservices

    Use Probabilistic Expiration

    • Large-scale distributed systems

    • High traffic systems

    Use Mutex Locking

    • Expensive DB queries

    • Critical resources

    Use SWR

    • Read-heavy workloads

    • User-facing APIs

    • CDN-like systems

    Use Cache Warming

    • Before launches

    • Before flash sales

    • Before predictable spikes

  11. Real-World Behavior During Traffic Spikes

    Netflix Release

    • Pre-warm popular titles

    • Use SWR for metadata

    • Add TTL jitter to avoid synchronized expiry

    IPL Streaming

    • Preload match data

    • Use probabilistic refresh

    • Scale infrastructure

    E-Commerce Sale

    • Preload product pages

    • Lock inventory cache updates

    • Apply rate limiting

  12. The Core System Design Insight

    Caching in distributed systems is not about speed.

    It is about:

    • Smoothing load

    • Preventing synchronized recomputation

    • Protecting databases

    • Improving availability

    The real enemy is:

    Synchronized behavior across distributed nodes.

  13. Final Takeaway

    Basic TTL works in small systems.

    But at scale:

    • Use jitter to avoid synchronized expiry

    • Use mutex to prevent duplicate recomputation

    • Use probabilistic expiration to smooth refresh

    • Use SWR to prioritize availability

    • Use cache warming before predictable spikes

    Distributed systems fail not because of traffic.

    They fail because of synchronized bursts caused by poor cache design.