LinkedIn runs hundreds of microservices, which communicate at an average rate of tens of millions of calls per second.
Wherever there’s communication, there’s also a chance of security issues creeping in, like those pesky neighbors peeking into your house. Proper authorization controls are critical to limit the damage of a data breach if a service is compromised.
Access Control Lists (ACLs) are the most common approach to enforce such authorization controls.
With ACLs, you can define which users, groups, or processes have access to specific objects such as files, directories, applications, or network resources. An ACL is essentially a table or list specifying the permissions on a particular object.
Here’s an example of an ACL for a particular service:
In this example ACL:
The “client-service” is allowed to perform GET requests on the “greeting” resource, but is denied PUT requests.
The “admin-service” can perform GET and PUT requests on the “greeting” resource.
For every request, the ACL is checked and access is granted or denied based on the defined permission levels.
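To make the check concrete, here is a minimal sketch of the example ACL and a default-deny lookup. The AclEntry class, the EXAMPLE_ACL list, and the is_allowed function are hypothetical names chosen for illustration; they are not LinkedIn's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    """One row of the ACL: who may call which methods on which resource."""
    principal: str
    resource: str
    allowed_methods: frozenset


# The example ACL described above (hypothetical representation).
EXAMPLE_ACL = [
    AclEntry("client-service", "greeting", frozenset({"GET"})),
    AclEntry("admin-service", "greeting", frozenset({"GET", "PUT"})),
]


def is_allowed(acl, principal, resource, method):
    """Default deny: access is granted only if an entry explicitly allows it."""
    return any(
        entry.principal == principal
        and entry.resource == resource
        and method in entry.allowed_methods
        for entry in acl
    )
```

With this sketch, is_allowed(EXAMPLE_ACL, "client-service", "greeting", "PUT") is denied, while the same call for "admin-service" is granted.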
The Challenge at LinkedIn’s scale
While the process sounds simple, scale changes everything.
LinkedIn faces four main challenges:
They need to check authorization quickly
They need to deliver ACL changes promptly across the service stack
They need to manage a large number of ACLs
They need to monitor the ACL checks
The diagram below shows how they handled each of these challenges.
Let’s look at how LinkedIn solves each of these issues.
Fast Authorization Checks
To keep authorization checks fast, an authorization client module runs on every service at LinkedIn.
It keeps relevant ACL data in memory to avoid network calls during checks.
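As a rough sketch of the idea, an in-process client can hold the rules in a dictionary so that each check is a local lookup rather than a network call. The AuthorizationClient class and its rule format are assumptions made for this example, not LinkedIn's actual module.

```python
class AuthorizationClient:
    """Hypothetical in-process client that answers checks from memory."""

    def __init__(self, rules):
        # rules maps (principal, resource) -> iterable of allowed methods.
        # Copy into frozensets so lookups are fast and the data is immutable.
        self._rules = {key: frozenset(methods) for key, methods in rules.items()}

    def check(self, principal, resource, method):
        # A plain dictionary lookup: no network call on the request path.
        # Unknown (principal, resource) pairs fall back to an empty set (deny).
        return method in self._rules.get((principal, resource), frozenset())


client = AuthorizationClient({
    ("client-service", "greeting"): ["GET"],
    ("admin-service", "greeting"): ["GET", "PUT"],
})
```

The trade-off is that the in-memory copy can go stale, which is why the data has to be refreshed.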
Deliver ACL Changes Quickly
ACL data is periodically refreshed in the background.
The refresh interval is chosen to balance how quickly changes take effect against the load that refreshing puts on the system.
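A background refresh could look something like the sketch below. The RefreshingAclStore class, the fetch_acls callable, and the interval value are all assumptions for illustration; the article does not describe the real refresh mechanism or its interval.

```python
import threading


class RefreshingAclStore:
    """Hypothetical store that reloads ACL data on a fixed interval."""

    def __init__(self, fetch_acls, interval_seconds):
        self._fetch_acls = fetch_acls        # assumed callable returning a dict
        self._interval = interval_seconds    # longer interval = less load, slower updates
        self._rules = fetch_acls()           # initial load
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def refresh_once(self):
        try:
            # Build the new snapshot first, then swap the reference in one step.
            self._rules = self._fetch_acls()
        except Exception:
            # On failure, keep serving the last known good data.
            pass

    def _run(self):
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            self.refresh_once()

    def check(self, principal, resource, method):
        rules = self._rules  # read the current snapshot once
        return method in rules.get((principal, resource), ())
```

Checks never wait on a refresh here: they read whichever snapshot is current, and a failed refresh simply leaves the previous one in place.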
Manage ACL Data
ACLs are stored in LinkedIn’s Espresso database, with a look-aside Couchbase cache for improved latency and scalability.
But how is the cache kept consistent with the database?
When an ACL changes, a change data capture system based on Brooklin sends a notification so that the stale cache entry can be cleared.
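The look-aside pattern with change-driven invalidation can be sketched as follows. Plain dictionaries stand in for the database and the cache, and the AclService class and its on_acl_changed handler are hypothetical names; nothing here uses the real Espresso, Couchbase, or Brooklin APIs.

```python
class AclService:
    """Hypothetical look-aside cache in front of a source-of-truth store."""

    def __init__(self, database):
        self._db = database    # stand-in for the database (source of truth)
        self._cache = {}       # stand-in for the look-aside cache
        self.db_reads = 0      # counter to make cache hits visible

    def get_acl(self, resource):
        # 1. Try the cache first.
        if resource in self._cache:
            return self._cache[resource]
        # 2. On a miss, read from the database and populate the cache.
        self.db_reads += 1
        acl = self._db.get(resource)
        if acl is not None:
            self._cache[resource] = acl
        return acl

    def on_acl_changed(self, resource):
        # Called when a change notification arrives: drop the stale entry
        # so that the next read goes back to the database.
        self._cache.pop(resource, None)
```

Clearing the entry instead of writing the new value into the cache keeps the handler simple: the next read repopulates it from the source of truth.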
Lastly, a REST API is exposed through a management interface and a command-line tool. Developers can use these interfaces to manage the ACL data.
Monitoring ACL Data
Every authorization check is logged asynchronously using LinkedIn’s Kafka message queue.
These logs are used for debugging, traffic analysis, auditing, and investigations. Engineers can access insights through the inGraphs monitoring system.
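One way to log without slowing down the check itself is to hand events to a background worker through a queue. The sketch below assumes a generic sink callable in place of a real Kafka producer, and the AsyncAuditLogger class is a hypothetical name for this example.

```python
import queue
import threading


class AsyncAuditLogger:
    """Hypothetical logger that ships events off the request path."""

    def __init__(self, sink, max_pending=10000):
        self._sink = sink                              # assumed callable, e.g. a producer wrapper
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, event):
        # Never block the authorization check: drop the event if the queue is full.
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _drain(self):
        while True:
            event = self._queue.get()
            if event is None:      # sentinel: shut down the worker
                break
            self._sink(event)

    def close(self):
        # Events queued before the sentinel are delivered before the worker exits.
        self._queue.put(None)
        self._worker.join()
```

Dropping events when the queue is full is a design choice made for this sketch; a system that needs complete audit trails might prefer to block or to spill to disk instead.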
References:
https://www.linkedin.com/blog/engineering/scalability/authorization-at-linkedins-scale