How Tinder Migrated From Self-Hosted Redis To AWS ElastiCache
Tinder sees over 2 billion member actions per day and has facilitated more than 30 billion matches.
Needless to say, this activity demands a backend capable of maintaining ultra-low latency and high availability.
For years, Tinder relied on Redis for caching to meet these needs. But as the app's popularity grew, maintaining self-hosted Redis clusters became increasingly difficult.
Eventually, Tinder migrated to Amazon ElastiCache with hopes of significantly improving scalability, stability, and overall operational efficiency.
This is the story of how Tinder moved from self-hosted Redis to Amazon ElastiCache, and how that journey transformed their caching systems.
Self-Hosted Redis Challenges
At first, Tinder used EC2 instances to self-host Redis clusters.
The configuration relied on a cache-aside pattern, where Tinder's services queried Redis for data before falling back to a source-of-truth database such as DynamoDB (and, in some cases, PostgreSQL or MongoDB).
Simply put, cache hits were served from the self-hosted Redis caches, while cache misses were served from DynamoDB and then written back to the cache.
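To make the pattern concrete, here's a minimal cache-aside sketch in Python using redis-py and boto3. The endpoint, table name, key format, and TTL are hypothetical placeholders, not Tinder's actual configuration.

```python
import json

import boto3
import redis

# Hypothetical endpoint and table names, for illustration only.
cache = redis.Redis(host="redis-shard-0.internal.example.com", port=6379)
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("user-profiles")

CACHE_TTL_SECONDS = 3600  # assumed TTL: expire cached entries after an hour


def get_user_profile(user_id: str) -> dict | None:
    """Cache-aside read: try Redis first, fall back to DynamoDB on a miss."""
    cache_key = f"profile:{user_id}"

    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: serve straight from Redis

    # Cache miss: read the source of truth, then populate the cache.
    item = table.get_item(Key={"user_id": user_id}).get("Item")
    if item is not None:
        # default=str handles DynamoDB's Decimal values during serialization.
        cache.set(cache_key, json.dumps(item, default=str), ex=CACHE_TTL_SECONDS)
    return item
```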
To scale this system, Tinder used sharded Redis clusters on EC2 using static partitioning.
This worked in Tinder's early days but became unsustainable as traffic grew: resharding statically partitioned clusters required manual, disruptive work, and routine maintenance such as patching and failover fell entirely on Tinder's engineers.
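To see why, consider a bare-bones sketch of static partitioning: the client hashes each key against a fixed, hard-coded list of shard hosts (the host names below are made up). Because the shard count is baked into every client, adding capacity remaps most keys and means redeploying clients and manually moving data.

```python
import hashlib

import redis

# Hypothetical static shard map: every application instance ships with this
# exact list, and changing it requires a client redeploy plus manual data moves.
SHARD_HOSTS = [
    "redis-shard-0.internal.example.com",
    "redis-shard-1.internal.example.com",
    "redis-shard-2.internal.example.com",
]

shards = [redis.Redis(host=host, port=6379) for host in SHARD_HOSTS]


def shard_for(key: str) -> redis.Redis:
    """Pick a shard by hashing the key modulo the fixed shard count."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return shards[int(digest, 16) % len(SHARD_HOSTS)]


# Adding a fourth shard changes the modulus, so most keys suddenly map to a
# different host, which is what makes growing this setup so disruptive.
shard_for("profile:12345").set("profile:12345", "{...}")
```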
How ElastiCache Changed The Game
Faced with these issues, the team at Tinder searched for an alternative caching solution that could support their scale.
They initially considered DAX (DynamoDB Accelerator) but ultimately chose ElastiCache: DAX only accelerates DynamoDB, while ElastiCache is Redis-compatible, letting Tinder carry over its existing Redis-based caching layer and the data it cached from PostgreSQL and MongoDB as well.
The Migration Process
Migrating the self-hosted Redis clusters to ElastiCache was a multi-step process. Tinder designed the migration carefully to keep downtime for their app's users to a minimum.
Here’s how they did it.
Simplified Configuration
Tinder started by updating their application clients to connect to ElastiCache clusters using a primary cluster endpoint instead of the static topology maps that they used in their old setup.
This significantly reduced configuration complexity and improved caching maintainability.
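As a rough sketch of what that looks like in client code (with a made-up endpoint, and assuming a cluster-mode-enabled deployment), redis-py's cluster client can discover the shard topology from a single endpoint on its own:

```python
from redis.cluster import RedisCluster

# Hypothetical endpoint; an ElastiCache (cluster mode enabled) deployment
# exposes one configuration endpoint that resolves the current shard topology.
ELASTICACHE_ENDPOINT = "tinder-cache.xxxxxx.clustercfg.use1.cache.amazonaws.com"

# The client learns shards and slot assignments from the endpoint, so no
# static shard map needs to be shipped with the application.
cache = RedisCluster(host=ELASTICACHE_ENDPOINT, port=6379, ssl=True)  # ssl assumes in-transit encryption

cache.set("profile:12345", "{...}")
print(cache.get("profile:12345"))
```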
Fork-Writes for Cache Warming
They then implemented a fork-writing strategy.
With this fork-writing strategy, data writes were duplicated to both the old and the new Redis clusters.
This allowed ElastiCache clusters to “warm up” with data while avoiding downtime.
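Conceptually, a fork write is a dual write: the legacy cluster remains the authoritative cache, and the same write is mirrored to ElastiCache on a best-effort basis. Here's a minimal Python sketch with hypothetical endpoints; it isn't Tinder's actual code.

```python
import logging

import redis
from redis.cluster import RedisCluster

log = logging.getLogger("fork-writes")

# Hypothetical clients for the legacy self-hosted shard and the new
# ElastiCache cluster; the endpoints are placeholders.
legacy_cache = redis.Redis(host="redis-shard-0.internal.example.com", port=6379)
new_cache = RedisCluster(
    host="tinder-cache.xxxxxx.clustercfg.use1.cache.amazonaws.com", port=6379
)


def fork_write(key: str, value: str, ttl: int = 3600) -> None:
    """Write to the legacy cluster (still authoritative) and mirror the write
    to ElastiCache so the new cluster warms up with live production data."""
    legacy_cache.set(key, value, ex=ttl)
    try:
        new_cache.set(key, value, ex=ttl)
    except redis.RedisError:
        # A failed mirror write must never break the user-facing request.
        log.warning("fork write to ElastiCache failed for %s", key, exc_info=True)
```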
Validating New Clusters
They verified the integrity of the new ElastiCache cluster by comparing metrics from both the new and old clusters.
Once the data consistency reached an acceptable threshold, they gradually began routing user traffic to the new cluster.
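One simple way to approximate that kind of check is to sample keys from the legacy cluster and verify that both clusters return the same values. The sketch below uses made-up endpoints and an arbitrary threshold purely for illustration, not Tinder's actual validation tooling.

```python
from itertools import islice

import redis
from redis.cluster import RedisCluster

# Hypothetical endpoints, matching the naming used in the earlier sketches.
legacy_cache = redis.Redis(host="redis-shard-0.internal.example.com", port=6379)
new_cache = RedisCluster(
    host="tinder-cache.xxxxxx.clustercfg.use1.cache.amazonaws.com", port=6379
)


def sample_consistency(keys: list[bytes]) -> float:
    """Return the fraction of sampled keys whose values match in both clusters."""
    if not keys:
        return 0.0
    matches = sum(1 for key in keys if legacy_cache.get(key) == new_cache.get(key))
    return matches / len(keys)


# Scan a bounded sample of keys from the legacy cluster and compare.
sample = list(islice(legacy_cache.scan_iter(match="profile:*"), 1000))
ratio = sample_consistency(sample)
print(f"Sampled consistency: {ratio:.2%}")
if ratio >= 0.999:  # hypothetical cutover threshold
    print("Safe to start shifting read traffic to ElastiCache")
```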
Scaling and Optimizing
After cutting over to ElastiCache, Tinder could dynamically add shards and easily rebalance traffic without downtime.
Additionally, the fact that ElastiCache handles maintenance tasks like patching freed up a lot of the Tinder engineers’ time.
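With ElastiCache, scaling out becomes an API call rather than a manual resharding project. As a rough illustration, online resharding can be kicked off with boto3; the replication group ID, region, and target shard count below are placeholders.

```python
import boto3

elasticache = boto3.client("elasticache", region_name="us-east-1")

# Online resharding: ElastiCache adds shards and rebalances slots while the
# cluster keeps serving traffic.
response = elasticache.modify_replication_group_shard_configuration(
    ReplicationGroupId="tinder-cache",  # illustrative name
    NodeGroupCount=12,                  # desired shard count after scaling out
    ApplyImmediately=True,              # start the resharding operation now
)
print(response["ReplicationGroup"]["Status"])
```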
Results Of A Fully Managed Cache
Tinder's lead engineers reported that after the cutover to ElastiCache, they saw an immediate and significant improvement in their caching infrastructure.
Conclusion
Migrating to Amazon ElastiCache offered tremendous and lasting improvements to Tinder’s caching infrastructure.
It allowed Tinder to meet the demands of its quickly rising user base while also reducing operational overhead for its engineers and enhancing the stability of its app.
By freeing engineers from infrastructure maintenance, ElastiCache has helped Tinder improve the experience for its more than 5 million (and growing) subscribers worldwide.
👋 My name is Uriel Bitton, and I hope you learned something in this edition of The Serverless Spotlight.
🔗 You can share the article with your network to help others learn as well.
📬 If you want to learn how to save money in the cloud you can subscribe to my brand new newsletter The Cloud Economist.
🙌 I hope to see you in next week's edition!