In 2022 we shipped four separate Node.js services for four different clients. All four had Redis in the architecture by the end of discovery. It was almost automatic. 'Sessions? Redis. Rate limiting? Redis. Cache in front of the database? Of course, Redis.' Less than a year later we ran an internal retro and found that three of those four Redises never needed to be in production at all.
This isn't a Redis-bashing post. We still use it. It's a note about how adding a piece of infrastructure becomes a habit before anyone notices — and what that habit costs when it doesn't get audited.
How we got there
None of those projects had a cache in the brief. All four had one on the architecture diagram we drew in week one. Nobody questioned it. It was there because 'this is how we do it.' A cost that hides inside a habit is often worse than a bad decision: at least a bad decision was once said out loud.
The audit we ran on ourselves
At the end of 2022 we sat down and looked at what those Redises were actually doing across the four production estates. We used three months of Datadog metrics plus quick profiling against a local Postgres replica.
- Project A (B2B SaaS, ~150 RPS): cache hit rate around 8%. Ninety-two percent of queries went to Postgres anyway. The cache added a network hop and an invalidation problem.
- Project B (internal tool, ~20 RPS): Redis used only for sessions. A Postgres table with an index on session_id did the same job in practically the same time — the table was already in RAM.
- Project C (public marketplace, ~600 RPS): Redis used for rate limiting and product cache. Rate limiting was justified. The product cache wasn't — a Postgres prepared statement against a materialised view was faster and didn't need invalidation.
- Project D (admin dashboard, ~10 RPS): Redis was provisioned but barely used. It cost memory and one extra instance for no benefit.
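To see why an 8% hit rate like Project A's can make things slower rather than faster, here is a minimal sketch of the arithmetic for a look-aside cache. The latency figures are hypothetical placeholders, not numbers from our audit, and the model ignores the extra cost of writing to the cache on a miss.

```typescript
// Expected read latency with a look-aside cache in front of the database.
// Every request pays the cache round trip; misses also pay the database query.
function expectedLatencyMs(hitRate: number, cacheMs: number, dbMs: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be between 0 and 1");
  }
  return cacheMs + (1 - hitRate) * dbMs;
}

// Hypothetical latencies (assumptions for illustration only).
const cacheMs = 1; // assumed Redis round trip
const dbMs = 5; // assumed Postgres query time

const withCache = expectedLatencyMs(0.08, cacheMs, dbMs); // ~8% hit rate, as in Project A
const withoutCache = dbMs;

// In this model the cache only pays off when hitRate > cacheMs / dbMs.
const breakEven = cacheMs / dbMs;

console.log({ withCache, withoutCache, breakEven });
```

Under these assumed numbers the cached path is slower on average than going straight to Postgres, and the hit rate would need to clear roughly 20% just to break even. Plugging in your own measured latencies gives your own break-even point.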
Three of the four Redises came out after the retro. We told each client by email — five lines: why we were doing it, what it meant for their infra bill, and when. None of them had a follow-up question.
When Redis is still the right call
- Genuinely high-frequency read-heavy workloads where latency matters and Postgres is already at IO limits.
- Rate limiting across multiple instances where you need atomic increments with TTL.
- Pub/sub for small real-time events where Postgres LISTEN/NOTIFY isn't enough (typically over ~1,000 events/sec).
- Short-lived queues where you don't need durability and a Postgres-side queue is overkill.
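The rate-limiting case in the list above is the one where Redis earned its place for us, so here is a minimal sketch of a fixed-window limiter built on atomic increments with a TTL. The `CounterStore` interface, `InMemoryStore` and `allowRequest` are hypothetical names for illustration; the two store methods mirror the Redis INCR and EXPIRE commands, and the in-memory stand-in exists only so the sketch runs without a Redis server.

```typescript
// Minimal store interface. The two methods mirror the Redis INCR and EXPIRE commands.
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<void>;
}

type Entry = { count: number; expiresAt: number | null };

// In-memory stand-in (single process only), so the sketch is runnable as-is.
class InMemoryStore implements CounterStore {
  private entries = new Map<string, Entry>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async incr(key: string): Promise<number> {
    const existing = this.entries.get(key);
    const expired =
      existing !== undefined && existing.expiresAt !== null && existing.expiresAt <= this.now();
    // A missing or expired key starts again from zero, as it would in Redis.
    const entry: Entry = existing === undefined || expired ? { count: 0, expiresAt: null } : existing;
    entry.count += 1;
    this.entries.set(key, entry);
    return entry.count;
  }

  async expire(key: string, seconds: number): Promise<void> {
    const entry = this.entries.get(key);
    if (entry) entry.expiresAt = this.now() + seconds * 1000;
  }
}

// Fixed-window limiter: allow up to `limit` requests per `windowSeconds` per client.
async function allowRequest(
  store: CounterStore,
  clientId: string,
  limit: number,
  windowSeconds: number,
): Promise<boolean> {
  const key = `ratelimit:${clientId}`;
  const count = await store.incr(key);
  if (count === 1) {
    // First hit in this window: start the TTL so the counter resets itself.
    await store.expire(key, windowSeconds);
  }
  return count <= limit;
}

// Usage: four requests against a limit of three per minute.
async function demo(): Promise<boolean[]> {
  const store = new InMemoryStore();
  const results: boolean[] = [];
  for (let i = 0; i < 4; i++) {
    results.push(await allowRequest(store, "client-42", 3, 60));
  }
  return results;
}

demo().then((results) => console.log(results)); // [ true, true, true, false ]
```

Against a real Redis, the increment and the expiry are two separate commands, so a crash between them can leave a counter with no TTL. Wrapping both in a transaction or a small Lua script is the usual way to close that gap.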
Redis is a good tool. The problem was never with it. The problem was that we kept adding it before checking whether we needed it.
Three questions we now ask before adding a cache
- What's the expected cache hit rate? If we can't tell, or we estimate under 30%, it's not worth it.
- What's the invalidation strategy, and can we state it before we write a single line of code? If we can't sketch it on a whiteboard in three minutes, the bugs that come out of it will cost more than the slow query.
- Can Postgres just do it? Materialised views, partial indexes, JSONB with GIN, LISTEN/NOTIFY — they cover nine out of ten 'we need a cache' situations.
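The three questions can be written down as a small gate that a design review runs a proposal through. This is a sketch, not tooling we actually ship: `CacheProposal` and `reasonsToSkipCache` are hypothetical names, and the example proposal is made up.

```typescript
interface CacheProposal {
  expectedHitRate: number | null; // fraction between 0 and 1; null means "we can't tell"
  invalidationSketched: boolean; // could we draw the strategy on a whiteboard?
  postgresCanDoIt: boolean; // materialised view, partial index, LISTEN/NOTIFY, etc.
}

// Returns every reason to hold off; an empty list means the cache passes all three questions.
function reasonsToSkipCache(p: CacheProposal): string[] {
  const reasons: string[] = [];
  if (p.expectedHitRate === null) {
    reasons.push("expected hit rate is unknown");
  } else if (p.expectedHitRate < 0.3) {
    reasons.push("expected hit rate is under 30%");
  }
  if (!p.invalidationSketched) {
    reasons.push("no invalidation strategy yet");
  }
  if (p.postgresCanDoIt) {
    reasons.push("Postgres can do it");
  }
  return reasons;
}

// Made-up proposal for illustration.
const proposal: CacheProposal = {
  expectedHitRate: 0.08,
  invalidationSketched: false,
  postgresCanDoIt: true,
};

console.log(reasonsToSkipCache(proposal));
```

Returning the list of reasons rather than a single yes or no keeps the conversation about which question failed, which is the part that tends to get skipped.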
A lot of 'standard' architecture is actually sediment. We add things because we added them before. Adding Redis takes an hour. Pulling it out a year into production takes three days and one tense meeting.