You may have heard of Amazon Aurora, a custom built MySQL and PostgreSQL compatible database born and built in the cloud. You may have also heard of serverless, which allows you to build and run applications and services without thinking about instances. These are two pieces of the growing AWS technology story that we’re really excited to be working on. Last year, at AWS re:Invent we announced a preview of a new capability for Aurora called Aurora Serverless. Today, I’m pleased to announce that Aurora Serverless for Aurora MySQL is generally available. Aurora Serverless is on-demand, auto-scaling, serverless Aurora. You don’t have to think about instances or scaling and you pay only for what you use.
This paradigm is great for applications with unpredictable load or infrequent demand. In production, you can save on costs by adjusting to scale based on actual load, in extremely granular increments – matching your demand curve almost perfectly. In development, you can save on costs by automatically pausing the cluster (scale to zero!) when it’s not in use. I’m excited to show you how this all works so let’s look at how we launch a Serverless Aurora cluster.
The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL.
The Architecture of Postgres
We encountered many Postgres limitations:
- Inefficient architecture for writes
- Inefficient data replication
- Issues with table corruption
- Poor replica MVCC support
- Difficulty upgrading to newer releases
Since the purpose of a database system is to store data, there is close relationship with the filesystem. As MySQL consultants, we always look at the filesystems for performance tuning opportunities. The most common choices in term of filesystems are XFS and EXT4, on Linux it is exceptional to encounter another filesystem. Both XFS and EXT4 have pros and cons but their behaviors are well known and they perform well. They perform well but they are not without shortcomings.
Over the years, we have developed a bunch of tools and techniques to overcome these shortcomings. For example, since they don’t allow a consistent view of the filesystem, we wrote tools like Xtrabackup to backup a live MySQL database. Another example is the InnoDB double write buffer. The InnoDB double write buffer is required only because neither XFS nor EXT4 is transactional. There is one filesystem which offers nearly all the features we need, ZFS. ZFS is arguably the most advanced filesystem available on Linux. Maybe it is time to reconsider the use of ZFS with MySQL.
ZFS on Linux or ZoL (from the OpenZFS project), has been around for quite a long time now. I first started using ZoL back in 2012, before it was GA (general availability), in order to solve a nearly impossible challenge to backup a large database (~400 GB) with a mix of InnoDB and MyISAM tables. Yes, ZFS allows that very easily, in just a few seconds. As of 2017, ZoL has been GA for more than 3 years and most of the issues that affected it in the early days have been fixed. ZFS is also GA in FreeBSD, illumos, OmniOS and many others.
This post will hopefully be the first of many posts, devoted to the use of ZFS with MySQL. The goal here is not to blindly push for ZFS but to see when ZFS can help solve real problems. We will first examine ZFS and try to draw parallels with the architecture of MySQL. This will help us to better understand how ZFS works and behaves. Future posts will be devoted to more specific topics like performance, PXC, backups, compression, database operations, bad and poor use cases and sample configurations.
Historically, we have used Redis in two ways at GitHub:
We used it as an LRU cache to conveniently store the results of expensive computations over data originally persisted in Git repositories or MySQL. We call this transient Redis.
We also enabled persistence, which gave us durability guarantees over data that was not stored anywhere else. We used it to store a wide range of values: from sparse data with high read/write ratios, like configuration settings, counters, or quality metrics, to very dynamic information powering core features like spam analysis. We call this persistent Redis.
Recently we made the decision to disable persistence in Redis and stop using it as a source of truth for our data. The main motivations behind this choice were to:
- Reduce the operational cost of our persistence infrastructure by removing some of its complexity.
- Take advantage of our expertise operating MySQL.
- Gain some extra performance, by eliminating the I/O latency during the process of writing big changes on the server state to disk.
Transitioning all that information transparently involved planning and coordination. For each problem domain using persistent Redis, we considered the volume of operations, the structure of the data, and the different access patterns to predict the impact on our current MySQL capacity, and the need for provisioning new hardware.
For the majority of callsites, we replaced persistent Redis with
GitHub::KV, a MySQL key/value store of our own built atop InnoDB, with features like key expiration. We were able to use
GitHub::KV almost identically as we used Redis: from trending repositories and users for the explore page, to rate limiting to spammy user detection.
This talk is about all the things that store data at Booking.com which are not MySQL.
At GitHub we use MySQL as our main datastore. While repository data lies in
git, metadata is stored in MySQL. This includes Issues, Pull Requests, Comments etc. We also auth against MySQL via a custom git proxy (babeld). To be able to serve under the high load GitHub operates at, we use MySQL replication to scale out read load.
We have different clusters to provide with different types of services, but the single-writer-multiple-readers design applies to them all. Depending on growth of traffic, on application demand, on operational tasks or other constraints, we take replicas in or out of our pools. Depending on workloads some replicas may lag more than others.
Displaying up-to-date data is important. We have tooling that helps us ensure we keep replication lag at a minimum, and typically it doesn’t exceed
1 second. However sometimes lags do happen, and when they do, we want to put aside those lagging replicas, let them catch their breath, and avoid sending traffic their way until they are caught up.
We set out to create a self-managing topology that will exclude lagging replicas automatically, handle disasters gracefully, and yet allow for complete human control and visibility.