Reddit is like the front page of the Internet. It hosts billions of posts, and many of them contain media such as images, videos, and GIFs. While the media itself is often stored in object storage, the metadata is a different matter. For example, if you’ve got a video, you might need to store information such as the thumbnail URL, playback URLs, bitrates, and various resolutions.
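To make that kind of metadata concrete, here is a minimal Python sketch of one video's metadata record. The field names, post ID, and URLs are hypothetical and are not Reddit's actual schema. The record bundles the thumbnail URL with a playback URL, bitrate, and resolution for each encoded rendition, then serializes everything to JSON.

```python
from dataclasses import dataclass, field, asdict
import json


# Hypothetical schema for illustration only; not Reddit's actual data model.
@dataclass
class Rendition:
    # One encoded variant of the same video.
    resolution: str      # e.g. "1280x720"
    bitrate_kbps: int
    playback_url: str


@dataclass
class VideoMetadata:
    post_id: str
    thumbnail_url: str
    renditions: list[Rendition] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested Rendition dataclasses.
        return json.dumps(asdict(self))


meta = VideoMetadata(
    post_id="t3_abc123",  # hypothetical ID
    thumbnail_url="https://example.com/thumbs/abc123.jpg",
    renditions=[
        Rendition("1920x1080", 5000, "https://example.com/v/abc123/1080.m3u8"),
        Rendition("1280x720", 2500, "https://example.com/v/abc123/720.m3u8"),
    ],
)
print(meta.to_json())
```

The video bytes stay in object storage; only this small, frequently read record needs to live in the metadata store.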
It's quite incredible to see what kind of problems occur at such a scale. I have no idea how I would have done it. On the other hand, I'm relieved that the cat videos I upload go through Kafka, pgBouncer, and Aurora. 😃
And of course, thanks for the shoutout! 🙇♂️ Glad you liked this article.
Haha... the cat videos are the most important ones.
Thanks, Akos!
Interesting read. Thanks for sharing.
Thank you
This was a great example of replacing a production service.
Dual writes and backfill were the right move!
Thanks for explaining this and mentioning my article, Saurabh.
Thanks, Raul!
Reddit's solution is pretty neat, I agree.
Interesting article, Saurabh! Migrating data is always a big challenge.
Thanks, Fernando.
Migrations can go either way and it's important to have safeguards in place.
Great article! Thanks for writing!
Thank you, Devarshi!
100K is a hell of a performance. Solid post, my friend!
Thank you, Daniel!
It was indeed a great case study to learn about.
Since the latest posts (let's say the last 1K) would reside on a single partition because of range-based partitioning, won't that create a hot partition for reads, given that Reddit mostly serves the latest posts?
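The concern can be made concrete with a toy model. This is a hypothetical illustration, not how Reddit's metadata store actually partitions data: it assumes post IDs increase over time, that each range partition owns one million consecutive IDs, and that a hash-partitioned alternative has eight partitions. It then counts where the 1,000 most recent IDs land under each scheme.

```python
import hashlib
from collections import Counter

# Hypothetical parameters for illustration only.
RANGE_SIZE = 1_000_000   # each range partition owns 1M consecutive post IDs
NUM_HASH_PARTITIONS = 8


def range_partition(post_id: int) -> int:
    # Consecutive IDs share a partition until the next range boundary.
    return post_id // RANGE_SIZE


def hash_partition(post_id: int) -> int:
    # A stable hash (not Python's randomized hash()) so the mapping is repeatable.
    digest = hashlib.md5(str(post_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_HASH_PARTITIONS


latest_ids = range(7_999_000, 8_000_000)  # the "last 1K" posts in this toy model
range_load = Counter(range_partition(i) for i in latest_ids)
hash_load = Counter(hash_partition(i) for i in latest_ids)

print("range partitions hit:", dict(range_load))
print("hash partitions hit:", len(hash_load))
```

Under these assumptions every recent post lands on one range partition, while hashing spreads them out. The trade-off is that a "latest N posts" query against a hash-partitioned table has to fan out to every partition, so one common mitigation is to keep range partitioning and serve the hot, recent rows from a cache instead.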