How LinkedIn Uses Caching to Serve 5M Profile…

Saurabh Dashora

Mar 19, 2024

Couchbase Cache, Espresso and Brooklin

10 Comments

Fran Soto

Mar 21, 2024

Thanks for the shoutout, Saurabh.

Interesting use of CAS (compare-and-swap) so that a client can retry from the very beginning.

Saurabh Dashora

Mar 22, 2024

It's definitely a nice approach. Glad you found it useful 👍

Ashwani Yadav

Apr 14, 2024

Amazing post. Thanks, Saurabh!

For minimizing data divergence during concurrent updates, I really liked the idea of CAS (compare-and-swap). I had read about it in the context of operating systems. It was nice to learn how it's used in distributed systems.

Saurabh Dashora

Apr 16, 2024

Thanks for the great feedback Ashwani!

Sounak Gupta

Apr 2, 2024

Hi Saurabh, great read. One question!

You wrote, "Concurrent modifications can occur when routers and cache updaters try to update a cache entry." But according to the architecture, the Espresso router never updates Couchbase directly, right? It's either via the cache updater or cache bootstrapping. Please correct me if I am missing something.

Saurabh Dashora

Apr 7, 2024

Thanks Sounak!

Actually, the router also tries to update the Couchbase cache when there's a cache miss. Here is the exact point from the Read Path section:

"In case of a cache miss, the request is served by the storage node. The router returns the profile information to the backend.

Lastly, the router upserts the data asynchronously into the cache."
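
The read path quoted above can be sketched roughly as follows. This is a minimal Python sketch, not LinkedIn's code: it assumes plain dicts stand in for the Couchbase cache and the storage node, and a thread pool stands in for the asynchronous upsert. The Router class and its get_profile and wait_for_upserts methods are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class Router:
    """Hypothetical router: serves reads from the cache, falling back to storage on a miss."""

    def __init__(self, cache, storage):
        self.cache = cache        # dict standing in for the Couchbase cache (assumption)
        self.storage = storage    # dict standing in for the storage node (assumption)
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending = []        # futures for background upserts not yet awaited

    def _upsert(self, member_id, profile):
        # Unconditional write: whatever is in the cache for this key is replaced.
        with self._lock:
            self.cache[member_id] = profile

    def get_profile(self, member_id):
        with self._lock:
            cached = self.cache.get(member_id)
        if cached is not None:
            return cached, "hit"
        # Cache miss: serve the request from the storage node.
        profile = self.storage[member_id]
        # Upsert into the cache in the background; the caller does not wait for it.
        self._pending.append(self._pool.submit(self._upsert, member_id, profile))
        return profile, "miss"

    def wait_for_upserts(self):
        """Test helper: block until all background upserts have finished."""
        wait(self._pending)
        self._pending.clear()


router = Router(cache={}, storage={"m1": {"name": "Ada"}})
print(router.get_profile("m1"))  # ({'name': 'Ada'}, 'miss')
router.wait_for_upserts()
print(router.get_profile("m1"))  # ({'name': 'Ada'}, 'hit')
```

Because the upsert here is an unconditional write, a cache updater that writes a newer value between the storage read and the upsert could have that value overwritten. That is the kind of concurrent modification between routers and cache updaters that the quoted sentence is about.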

Anubhav Gupta

Mar 28, 2024 (edited)

Thanks for this, Saurabh. I have a couple of basic questions:

1. Since we are using Couchbase, which (if I am correct) isn't purely in memory and would use the disk, wouldn't that increase latency?

2. Is the cache miss rate lower here because Couchbase is used with a large amount of memory?

Saurabh Dashora

Mar 29, 2024

Hi Anubhav,

1 - Yes, latency will be higher as compared to the in-memory cache (like the OHC). But it will still be much less when compared to the DB.

2 - Cache miss is less because Couchbase is a distributed cache whereas OHC was confined to a particular router instance. Plus, they also make sure to keep the cache updated using CDC and the bootstrap process.
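
To see why a shared cache tends to miss less than a per-instance one, here is a toy Python count. It assumes requests are spread round-robin across routers, models the distributed cache as a single shared set, and assumes caches never evict; all of these are simplifications. count_misses is a hypothetical helper, not part of OHC or Couchbase.

```python
def count_misses(requests, num_routers, shared):
    """Count cache misses for requests spread round-robin across routers.

    shared=True models one distributed cache used by every router;
    shared=False models a separate local cache per router instance.
    Caches are unbounded and never evict (a simplifying assumption).
    """
    shared_cache = set()
    local_caches = [set() for _ in range(num_routers)]
    misses = 0
    for i, key in enumerate(requests):
        cache = shared_cache if shared else local_caches[i % num_routers]
        if key not in cache:
            misses += 1
            cache.add(key)  # populate the cache on a miss
    return misses


requests = ["m1", "m2", "m3"] * 4   # 12 requests for 3 distinct profiles
print(count_misses(requests, num_routers=4, shared=False))  # 12
print(count_misses(requests, num_routers=4, shared=True))   # 3
```

In this toy workload each router's local cache has to warm up separately, so all 12 requests miss, while the shared cache misses only the first time each profile is requested.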

Akos Komuves

Mar 19, 2024

Great stuff, Saurabh!

I love how they made sure that Couchbase is healthy because the alternative is just to fall back to DB reads/writes.

As for Minimizing Data Divergence, is that something you have to implement in these systems, or are they shipped as simple config options as part of these solutions?

This post is also a good reminder to introduce such tech when you actually hit a limit with your current solution, which is – for most SaaS – never.

Saurabh Dashora

Mar 19, 2024

Thanks Akos!

And yes, it was important to avoid fallback as it would make the cache useless when it's needed the most.

With regard to minimizing the divergence, a lot of it has to be done by the team. For example, they implemented periodic bootstrapping of the cache using Brooklin. They also used a Couchbase versioning feature to implement compare-and-swap. It all depends on their service level objective for divergence.
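
A version-based compare-and-swap with a retry-from-the-start loop can be sketched like this. It is a minimal in-memory Python stand-in, assuming each read returns a CAS token and a conditional write fails if the token has changed since that read. VersionedCache, replace, CasConflict, and update_with_retry are hypothetical names, not the Couchbase SDK API, and real CAS values are opaque rather than a simple counter.

```python
import threading


class CasConflict(Exception):
    """Raised when an entry's CAS token changed since the caller read it."""


class VersionedCache:
    """Hypothetical in-memory stand-in for a cache that hands out CAS tokens.

    Not the Couchbase SDK: a simple counter plays the role of the opaque CAS
    value, and token 0 means "key absent" so the first write can use it.
    """

    def __init__(self):
        self._data = {}           # key -> (value, cas)
        self._cas_counter = 0
        self._lock = threading.Lock()

    def get(self, key):
        """Return (value, cas); (None, 0) if the key is absent."""
        with self._lock:
            return self._data.get(key, (None, 0))

    def replace(self, key, value, cas):
        """Write value only if the entry's current CAS equals cas; return the new CAS."""
        with self._lock:
            _, current = self._data.get(key, (None, 0))
            if current != cas:
                raise CasConflict(key)
            self._cas_counter += 1
            self._data[key] = (value, self._cas_counter)
            return self._cas_counter


def update_with_retry(cache, key, mutate, max_attempts=5):
    """Read-modify-write that starts over from a fresh read after each CAS conflict."""
    for _ in range(max_attempts):
        value, cas = cache.get(key)        # 1. read the value and its CAS token
        new_value = mutate(value)          # 2. compute the update from that read
        try:
            return cache.replace(key, new_value, cas)  # 3. conditional write
        except CasConflict:
            continue                       # another writer got there first: retry
    raise CasConflict(f"gave up on {key!r} after {max_attempts} attempts")


cache = VersionedCache()
update_with_retry(cache, "m1", lambda _: {"headline": "Engineer"})
stale_value, stale_cas = cache.get("m1")    # e.g. a router reads the entry...
update_with_retry(cache, "m1", lambda p: {**p, "headline": "Staff Engineer"})  # ...then an updater writes
try:
    cache.replace("m1", stale_value, stale_cas)   # the stale write is rejected
except CasConflict:
    print("stale write rejected")
print(cache.get("m1"))  # ({'headline': 'Staff Engineer'}, 2)
```

A writer holding a stale CAS token is rejected instead of silently overwriting the newer value, and update_with_retry restarts the whole read-modify-write from a fresh read after a conflict rather than reusing the stale value.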

Of course, as you said, very few companies reach LinkedIn's scale and need such solutions. In fact, LinkedIn also implemented them only when they really needed them.

System Design Codex
