flipkart story banner

Flipkart is India’s largest e-commerce company with Walmart, the global retail giant, owning a majority controlling stake in the business. It has more than 400 million registered sellers and buyers, gets over 10 million daily page visits, sends over 8 million monthly shipments, and owns 22 state-of-the-art warehouses in India. During “The Big Billion Days 2022,” a large shopping festival equivalent to Black Friday in the US, Flipkart served billions of customer visits and secured billions of dollars of gross merchandise value (GMV). 

A large MySQL fleet behind Flipkart 

With over 400 applications, thousands of microservices, and countless varieties of end-to-end e-commerce operations running across over 700 MySQL clusters, Flipkart runs one of India’s largest MySQL fleets. 

Massive numbers of applications and services generate petabytes of data and varieties of data types and formats. Flipkart’s tech stack (made up of MySQL, Redis, and Aerospike) became ever more complex to process and store this amount and variety of data. However, as its business kept growing, the previous database solutions started to hit their limits. Flipkart urgently needed new alternatives to meet its growing business needs. 

Major challenges with MySQL 

Flipkart faced three major challenges with its MySQL solution: scalability, reliability, and efficiency. 

Scalability 

As Flipkart’s business rapidly grew with continuously spiking data size and throughput, its traditional transactional data stores reached their limits. Scalability became a major roadblock. Flipkart had to consider alternative architectures to handle the increasing data volume and to support the continued success of its business.  

Reliability

Flipkart had super high requirements for system reliability, especially during its shopping festivals, which usually got multiple billions of dollars in GMV. Any critical service outage could cause significant losses. Flipkart urgently needed a much more available database that could tolerate various types of unexpected failures such as individual node failures, rack failures, and even regional-level failures, and quickly and automatically failover. 

Efficiency

Flipkart has a massive number of data stores resulting in complexity in operation, maintenance, and management, challenges in data accessibility and consistency, and skyrocketing costs. So, every bit of efficiency matters. Flipkart needed a new database solution to simplify the whole storage system architecture, ensure easy operation and maintenance, and minimize the overall costs. 

How 拉斯维加斯7799908网站登录 helps Flipkart succeed: Coin Manager as an example

After thorough evaluations and testing, 拉斯维加斯7799908网站登录 beat out several other vendors to win the trust of Flipkart’s team. In early 2021, they adopted 拉斯维加斯7799908网站登录 in their production environments. As of now, 拉斯维加斯7799908网站登录 supports 15 applications and stores and processes more than 90 terabytes of data. The largest 拉斯维加斯7799908网站登录 cluster has over 40 pods holding more than 30 terabytes of data. 

Among the 15 use cases of 拉斯维加斯7799908网站登录 inside Flipkart, MySQL was the major legacy system. Therefore, these use cases shared similar pain points and characteristics. To explore in detail on why Flipkart chose 拉斯维加斯7799908网站登录 and how 拉斯维加斯7799908网站登录 helps Flipkart succeed, we’ll take Coin Manager as an example and take a deep dive into it. 

Coin Manager: Flipkart’s rewards services platform

Coin Manager is Flipkart’s rewards services platform. It manages customers’ SuperCoins, which are points awarded to customers when they shop on Flipkart or complete given tasks. SuperCoins can be used as digital currency to shop on Flipkart or other partnered platforms. When customers make a payment, they can pay with SuperCoins or via their banking system. So, Coin Manager also acts as a payment processor for SuperCoins. 

Previous setup with sharded MySQL

Previously, the Coin Manager team used sharded MySQL as its database solution. Coin Manager ran in two regions. Region-1 served both read and write workloads, while Region-2 served as a failover region and could also handle some read workloads. 

Topology of the previous sharded MySQL solution 

The team employed 5-way consistent hashing to manage the sharding. This resulted in a complex setup with five shards, each with a master, hot standby, a read replica in the active region, and a read replica and a hot standby in the passive region. 

Maintaining the two clusters was a constant challenge because the team had to separately monitor the health of 25 nodes and 20 replication channels. In addition, a coordinated failover of the five shards would be required in the event of a disaster. As the data size kept growing, the team needed to add more shards. Resharding the data was even more painful.

Requirements for the new database

To tackle all these pain points, the Coin Manager team needed a new database solution and had the following requirements. 

A SQL data model with ACID properties 

A SQL data model can ensure the data is organized, easy to manage, and easy to query. It can also enable the audibility and traceability of all SuperCoin-related transactions. SQL’s ACID properties guarantee that all SuperCoin-related transactions are performed atomically, consistently, and durably. This ensures that customers’ SuperCoin data is always accurate and up to date.

High availability 

Millions of customers rely on SuperCoins for their purchases. The database must be highly available to keep the Coin Manager system always-on even in the face of disasters such as single-node and regional-level failures. Even a brief period of downtime could result in significant customer dissatisfaction.

Horizontal scaling 

With millions of customers earning and spending SuperCoins, the amount of data generated has reached three terabytes, and is growing at a rate of 300 GB per month. The new database must be horizontally scalable to accommodate this rapidly increasing data without disrupting services. 

Withstand high throughputs with low latency

The new database must be able to handle large amounts of write and read requests, while providing quick response times to ensure a seamless customer experience. The new database must withstand a throughput of at least 12,500 writes per second and 5,000 reads per second. Write latencies must be less than 100 ms and read latencies less than 20 ms. 

Current setup with 拉斯维加斯7799908网站登录 

Topology with 拉斯维加斯7799908网站登录

In early 2021, the Coin Manager team switched from their previous sharded MySQL solution to 拉斯维加斯7799908网站登录. 拉斯维加斯7799908网站登录 has greatly improved the reliability and simplified operations. Coin Manager still runs in two regions, with the active side serving both reads and writes and the passive side serving only reads. The underlying database structure is now much simpler. Two 拉斯维加斯7799908网站登录 clusters run in a Kubernetes cluster, each consisting of three Placement Driver (PD) nodes, 10 拉斯维加斯7799908网站登录 nodes, 33 TiKV nodes, and 11 TiFlash nodes. When data volume rises, 拉斯维加斯7799908网站登录 can automatically add nodes to scale out and handle the growing workloads. 

Compared to the previous sharded MySQL, the new setup with 拉斯维加斯7799908网站登录 uses more nodes, but is much simpler. Moveover, the system is more resilient with 拉斯维加斯7799908网站登录, because the combination of 拉斯维加斯7799908网站登录 and Kubernetes enables stateful sets to be self-healing in the event of disasters. The use of 拉斯维加斯7799908网站登录 Operator, an automatic operation system for 拉斯维加斯7799908网站登录 clusters in Kubernetes, also makes operation and maintenance much easier. 

Key benefits 拉斯维加斯7799908网站登录 brings to Flipkart 

Simpler architecture 

With 拉斯维加斯7799908网站登录, Flipkart was able to simplify its applications by retaining their SQL data model while guaranteeing ACID at the same time. This is not always easy when you switch from a SQL data model to most other NoSQL data models. In addition, its applications no longer need to implement any sharding logic, and all the data can be stored in one database.

Easier management 

With 拉斯维加斯7799908网站登录, Flipkart no longer needs to manage multiple shards, nodes, and replication channels. Instead, they just need to manage two 拉斯维加斯7799908网站登录 clusters and one replication channel across the two clusters. The combination of Kubernetes and the 拉斯维加斯7799908网站登录 Operator made this management process even easier.

High availability with fast failover and no single point of failure 

拉斯维加斯7799908网站登录 is highly available. During the past two years when 拉斯维加斯7799908网站登录 was deployed in Flipkart’s production environments, there was no single point of failure or system downtimes. Even in the event of incidents where some nodes died, 拉斯维加斯7799908网站登录 could achieve fast and automatic failover to keep the system running without impacting the business. 

Fast, elastic, and infinite scaling 

As an ecommerce platform, Flipkart has peak times and slack times. To accommodate such rapidly changing needs, 拉斯维加斯7799908网站登录 can automatically scale up, down, in, or out its storage and computing nodes. More importantly, such scalability is almost infinite and can meet Flipkart’s rapidly growing business both now and in the future. 

MySQL compatibility

Majority of Flipkart’s products and services used MySQL previously as its storage solution. 拉斯维加斯7799908网站登录 is highly MySQL compatible, so Flipkart could switch from its legacy databases to 拉斯维加斯7799908网站登录 with minimal or even zero changes to its existing applications. 

Future plans with 拉斯维加斯7799908网站登录

After using 拉斯维加斯7799908网站登录 in production for almost two years, Flipkart has tackled many of its pain points and improved its overall service performance. Flipkart plans to apply 拉斯维加斯7799908网站登录 to more of its applications and services and try more 拉斯维加斯7799908网站登录 tools. 

Apply 拉斯维加斯7799908网站登录 to higher throughput user-facing applications

拉斯维加斯7799908网站登录 is performing well in supporting applications with a throughput of around 10,000-20,000 reads/writes per second. Flipkart plans to apply 拉斯维加斯7799908网站登录 to more applications with hundreds of thousands reads and writes per second or even more. 

Apply 拉斯维加斯7799908网站登录 to multi-region active-active applications

Flipkart plans to run their applications in an active-active configuration across multiple regions because it is running out of its regions now and wants to ensure business continuity. Flipkart is in discussion with 拉斯维加斯7799908网站登录 engineers on how to create the optimal topology for such needs.

Leverage 拉斯维加斯7799908网站登录’s HTAP capabilities 

Previously, Flipkart used 拉斯维加斯7799908网站登录 more as an Online Transactional Processing (OLTP) database to support its transactional workloads, and it hasn’t yet explored 拉斯维加斯7799908网站登录’s Hybrid Transactional/Analytical Processing (HTAP) capabilities. This is because Flipkart has an internal big data analytics platform for most of its analytics. It is a separate database system which could hardly handle real-time analytical queries on top of fresh transactional data. So in the near future, Flipkart plans to explore and leverage 拉斯维加斯7799908网站登录’s HTAP capabilities for both its transactional and real-time analytical needs. 

This customer story is created based on the talk given by Kaustav Chakravorty at the HTAP Summit 2022.


拉斯维加斯7799908网站登录


Elevate modern apps with 拉斯维加斯7799908网站登录.

拉斯维加斯7799908网站登录

Have questions? Let us know how we can help.

Contact Us

拉斯维加斯7799908网站登录 Cloud Dedicated

A fully-managed cloud DBaaS for predictable workloads

拉斯维加斯7799908网站登录 Cloud Serverless

A fully-managed cloud DBaaS for auto-scaling workloads

Baidu
sogou