Hello, this is Saurabh…👋
Welcome to the 252 new subscribers who have joined us since last week.
If you aren’t subscribed yet, join 6600+ curious Software Developers looking to expand their system design knowledge by subscribing to this newsletter.
In this edition, I cover the following topics:
🖥 Top questions to consider before caching
So, let’s dive in.
🖥 Top Questions to consider before Caching
I love caching just as much as the next developer.
It’s a wonderful technique that has helped many startups scale to billions of users without pushing their investors into poverty.
It also works spectacularly well to improve the performance of crappy applications without improving those applications.
When you think about it, caching appears like a silver bullet.
Yes, it is - but only in the hands of someone who understands its trade-offs.
In the hands of any other person, caching can turn into a disaster.
So, how do you make sure you use caching in the right place at the right time?
I always ask these 7 questions.
1 - Am I using cache to hide the real issue?
I know it’s tempting but you don’t use the silver bullet at the first sign of trouble.
Many times, I see developers create inefficient data models and write queries that don’t use proper indexes.
When performance issues pop up, they just throw Redis in front of the database to hide the mess, without even exploring database optimizations.
It appears reasonable in the moment.
However, they are trading simplicity for complexity, and paying extra for that complexity.
I’m not saying it doesn’t work. But it’s not a great move.
Caching should be used as the last resort.
2 - Is the data frequently accessed?
What’s the use of caching?
To fetch high-demand data without making an expensive call to the source of truth.
Caching data that is accessed once in a while is hardly worth the trouble.
Therefore, I always try to check the statistical distribution of data access for my application.
Caching is effective when access frequency in your application is heavily skewed: a small fraction of the data accounts for most of the reads.
See below example:
I agree that the curve can be steeper. But you get the general idea. The point is to cache data that has a high ROI.
However, there’s one place where we can bend this rule: data that requires heavy computation. For example, the result of an intensive ML workflow. Even if that data is accessed infrequently, it’s valuable to cache it.
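As a rough illustration of this exception, Python's `functools.lru_cache` can memoize an expensive computation. The `expensive_score` function below is a hypothetical stand-in for a heavy ML call:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_score(user_id: int) -> float:
    # Stand-in for a heavy computation, e.g. an ML inference call.
    time.sleep(0.1)
    return user_id * 0.42

expensive_score(7)   # first call pays the full cost
expensive_score(7)   # repeat call is served from the cache
print(expensive_score.cache_info())  # hits=1, misses=1
```

Even a single cache hit here saves the full cost of the computation, which is why infrequently accessed but expensive results can still be worth caching.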
3 - Is the data dynamic?
Imagine implementing a demand-filled look-aside cache where you fetch an item from the database and store it in the cache.
You think the next time the same item is needed, you will be able to pick it up from the cache. But when the time comes, the item is gone from the cache because it was invalidated due to an update.
There is no advantage to caching an item that’s too dynamic.
Of course, dynamism is a relative property and must be judged within the context of the system. What’s too dynamic for one system may be quite static for another. For example, 100 likes per minute would be a torrent of updates for a niche blog, but barely a trickle for a large social media platform.
I always try to cache data that has a reasonable validity period in the context of the application.
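One simple way to encode a "reasonable validity period" is a look-aside cache whose entries expire after a TTL. A minimal sketch (the key name is made up):

```python
import time

class TTLCache:
    """Minimal look-aside cache where entries expire after `ttl` seconds."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # too old: treat as a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl=0.05)
cache.set("likes:post42", 100)
assert cache.get("likes:post42") == 100   # fresh: served from cache
time.sleep(0.06)
assert cache.get("likes:post42") is None  # expired: refetch from source
```

If the TTL you'd need to keep the data fresh is shorter than the typical gap between reads, the cache is doing nothing for you.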
4 - Which caching strategy should I choose?
There are multiple caching strategies to choose from.
Some of the most popular ones are:
Cache-Aside or Look-Aside
Read-Through
Write-Through
Write-Back
Write-Around
Here’s a diagram that shows how they work.
If interested, you can read about them in detail in an earlier post.
But what’s the point?
Each strategy has some advantages and disadvantages.
Choosing a strategy before you start is important as it will impact other parameters such as consistency, latency and performance.
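As a minimal sketch of the first strategy, cache-aside puts the application in charge of both reading and populating the cache. Here a plain dict stands in for a cache like Redis, and another dict for the database:

```python
db = {"user:1": {"name": "Ada"}}   # stand-in for the source of truth
cache = {}
db_calls = 0

def get_user(key):
    global db_calls
    value = cache.get(key)          # 1. try the cache first
    if value is None:
        db_calls += 1
        value = db[key]             # 2. miss: read the source of truth
        cache[key] = value          # 3. populate the cache for next time
    return value

get_user("user:1")   # miss: hits the database
get_user("user:1")   # hit: served from cache
assert db_calls == 1
```

Note how the write path isn't covered at all here - that's exactly the kind of gap that makes the choice of strategy ripple into consistency and latency.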
5 - How will I maintain cache consistency?
By now, we all agree that Mr Phil Karlton was right.
“There are only two hard things in Computer Science: cache invalidation and naming things.”
But why do you need to invalidate an item in the cache?
To maintain cache consistency.
No one wants the cache to return stale data. But it’s a fact that the data will change over time and you want to keep the cache up-to-date.
It’s a thorny problem that has different levels of impact.
For example, you can code your application to invalidate a cache item whenever it updates the source database. But in a distributed environment with replica regions, you don’t want invalidations to happen before the replication. Otherwise, you risk serving stale data.
Also, in a highly concurrent environment with multiple clients trying to establish cache consistency, which client should get the honor?
If you don’t choose an appropriate strategy, you may have a cache stampede/thundering herd on your hands.
I don’t mean to frighten you.
I just mean to highlight the range of possibilities you may have to consider with regard to cache consistency.
Asking this question up-front can save you a lot of headache later on.
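One common defense against a stampede is to let only one client rebuild a missing entry while the others wait. A simplified single-process sketch using a lock and a double-check (in a distributed setup you'd need a distributed lock or request coalescing instead):

```python
import threading

cache = {}
lock = threading.Lock()
db_calls = 0

def load(key):
    # Stand-in for an expensive database read.
    global db_calls
    db_calls += 1
    return f"value-for-{key}"

def get(key):
    value = cache.get(key)
    if value is not None:
        return value
    with lock:                      # only one thread rebuilds the entry
        value = cache.get(key)      # re-check: another thread may have won
        if value is None:
            value = load(key)
            cache[key] = value
    return value

threads = [threading.Thread(target=get, args=("hot",)) for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()
assert db_calls == 1   # 20 concurrent readers, a single database call
```

Without the lock and the re-check, all 20 threads could miss simultaneously and hammer the database at once - that's the stampede.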
6 - Cold Cache or Warm Cache?
I always consider whether a cold cache is fine or whether the system needs a warm cache.
A cold cache takes time to reach a good performance level. During that time, many requests will hit the database or the disk.
A warm cache, on the other hand, has already been populated with data. This means faster access times from the start.
In a highly concurrent environment, a cold cache can cause cascading failures due to excessive load on the backend and the database. So, it might be better to go for a warm cache.
But for that, you need a strategy to pre-cache data based on expected demand, and that can be a costly effort.
There are trade-offs on both sides. Answering this question will help you choose the right trade-off for your application.
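A warming step can be as simple as pre-loading the keys you expect to be in demand. In this sketch, `hot_keys` is a hypothetical prediction based on past traffic:

```python
cache = {}

def warm_cache(db, hot_keys):
    """Pre-populate the cache with keys we expect to be in demand."""
    for key in hot_keys:
        cache[key] = db[key]

# Stand-in database of product details.
db = {f"product:{i}": f"details-{i}" for i in range(1000)}

# Hypothetical: the top products predicted from yesterday's traffic.
warm_cache(db, ["product:1", "product:7", "product:42"])
assert cache["product:42"] == "details-42"  # served without a miss at launch
```

The cost side of the trade-off is visible even here: you pay for the extra reads and memory up-front, and the prediction of what's "hot" can be wrong.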
7 - How to measure cache effectiveness?
Caching is useless if it’s not effective.
You have to measure the effectiveness of the cache - for example, by tracking metrics such as hit rate and latency.
Now, I understand this is something that needs to be done after you start caching.
But it’s a good idea to decide on the important metrics before you start, along with what you’re going to do about them.
For example, if the hit rates aren’t good enough, you may need to tweak parameters such as TTL.
If the latency is unacceptable, you may need to add more nodes to the cluster.
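To make this concrete, even a toy cache can count hits and misses so you have a hit rate to act on. A minimal sketch:

```python
class InstrumentedCache:
    """A dict-backed cache that tracks hits and misses."""
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self._store[key] = value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

c = InstrumentedCache()
c.get("a")        # miss: nothing cached yet
c.set("a", 1)
c.get("a")        # hit
print(c.hit_rate)  # 0.5
```

In production you'd export these counters to your monitoring system rather than compute them in-process, but the metric itself is the same.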
👉 Over To You
Now, I don’t claim that this is an exhaustive list of questions. But they have always helped me make better decisions about my caching solution.
However, I’m sure there are more questions to consider, and you may have come across them.
If you have any in mind, do mention them in the comments.
Roundup
The Importance of Having Opinions: A very nice article on why you should have opinions and how to develop them.
A detailed look at the essentials of MLOps and its key components.
How to pick technologies for your next project: Choosing technologies for your next project is always exciting, but also tricky. How do you make the right choice?
How Canva Supports Real-Time Collaboration: A quick look at the technologies behind Canva’s real-time collaboration feature.
That’s it for today! ☀️
Enjoyed this issue of the newsletter?
Share with your friends and colleagues.
See you later with another value-packed edition — Saurabh.
That's a great write-up, Saurabh! Caching to speed things up is like optimizing web page speed without measuring first what elements take the most time to load or which code blocks the initial render.
You can even spend money on "silver bullet" solutions that only obscure the real issue, just like applying caching first, without thinking.
Thanks for mentioning my writing, glad you liked it! 🤝
These are excellent questions to ask and answer. They can also serve as a guide to make an informed decision about using caching👍