Uber Engineering’s Schemaless storage system powers some of the biggest services at Uber, such as Mezzanine. Schemaless is a scalable and highly available datastore on top ofMySQL¹ clusters. Managing these clusters was fairly easy when we had 16 clusters. These days, we have more than 1,000 clusters containing more than 4,000 database servers, and that requires a different class of tooling.
Initially, all our clusters were managed by Puppet, a lot of ad hoc scripts, and manual operations that couldn’t scale at Uber’s pace. When we began looking for a better way to manage our increasing number of MySQL clusters, we had a few basic requirements:
- Run multiple database processes on each host
- Automate everything
- Have a single entry point to manage and monitor all clusters across all data center
The solution we came up with is a design called Schemadock. We run MySQL in Dockercontainers, which are managed by goal states that define cluster topologies in configuration files. Cluster topologies specify how MySQL clusters should look; for example, that there should be a Cluster A with 3 databases, and which one should be the master. Agents then apply those topologies on the individual databases. A centralized service maintains and monitors the goal state for each instance and reacts to any deviations.
Schemadock has many components, and Docker is a small but significant one. Switching to a more scalable solution has been a momentous effort, and this article explains how Docker helped us get here.