Why Uber Engineering Switched from Postgres to MySQL

The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL.

 

The Architecture of Postgres

We encountered many Postgres limitations:

  • Inefficient architecture for writes
  • Inefficient data replication
  • Issues with table corruption
  • Poor replica MVCC support
  • Difficulty upgrading to newer releases

https://eng.uber.com/mysql-migration/

 

Why we lost Uber as a user…

On 07/26/2016 01:53 PM, Josh Berkus wrote:
> The write amplification issue, and its correllary in VACUUM, certainly
> continues to plague some users, and doesn’t have any easy solutions.

To explain this in concrete terms, which the blog post does not:

1. Create a small table, but one with enough rows that indexes make
sense (say 50,000 rows).

2. Make this table used in JOINs all over your database.

3. To support these JOINs, index most of the columns in the small table.

4. Now, update that small table 500 times per second.

That’s a recipe for runaway table bloat; VACUUM can’t do much because
there’s always some minutes-old transaction hanging around (and SNAPSHOT
TOO OLD doesn’t really help, we’re talking about minutes here), and
because of all of the indexes HOT isn’t effective. Removing the indexes
is equally painful because it means less efficient JOINs.

The Uber guy is right that InnoDB handles this better as long as you
don’t touch the primary key (primary key updates in InnoDB are really bad).

This is a common problem case we don’t have an answer for yet.



Josh Berkus
Red Hat OSAS
(any opinions are my own)

https://www.postgresql.org/message-id/5797D5A1.5030009%40agliodbs.com

WHY UBER ENGINEERING SWITCHED FROM POSTGRES TO MYSQL

The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL.

The Architecture of Postgres

We encountered many Postgres limitations:

  • Inefficient architecture for writes
  • Inefficient data replication
  • Issues with table corruption
  • Poor replica MVCC support
  • Difficulty upgrading to newer releases

We’ll look at all of these limitations through an analysis of Postgres’s representation of table and index data on disk, especially when compared to the way MySQL represents the same data with its InnoDB storage engine. Note that the analysis that we present here is primarily based on our experience with the somewhat old Postgres 9.2 release series. To our knowledge, the internal architecture that we discuss in this article has not changed significantly in newer Postgres releases, and the basic design of the on-disk representation in 9.2 hasn’t changed significantly since at least the Postgres 8.3 release (now nearly 10 years old).

https://eng.uber.com/mysql-migration/

PostgreSQL Exercises

Welcome to PostgreSQL Exercises! This site was born when I noticed that there’s a load of material out there to help people learn about SQL, but not a great deal to make it easy to learn by doing. PGExercises provides a series of questions and explanations built on a single, simple dataset. It’s designed for use as a partner to a good book or Postgres’ excellent documentation.

The exercises on this site range from simple select and where clauses, through joins and case statements, and on to aggregations, window functions, and recursive queries. Most people who aren’t already pros should find something to test themselves with.

For an introduction to the dataset, go to Getting Started, then select an exercise category from the menu and go!

https://pgexercises.com/

The beets blog: we’re pretty happy with SQLite & not urgently interested in a fancier DBMS

Every once in a while, someone suggests that beets should use a “real database.” I think this means storing music metadata in PostgreSQL or MySQL as an alternative to our current SQLite database. The idea is that a more complicated DBMS should be faster, especially for huge music libraries.

The pseudo-official position of the beets project is that supporting a new DBMS is probably not worth your time. If you’re interested in performance, please consider helping to optimize our database queries instead.

There are three reasons I’m unenthusiastic about alternative DBMSes: I’m skeptical that they will actually help performance; it’s a clear case of premature optimization; and SQLite is unbeatably convenient.

http://beets.io/blog/sqlite-performance.html

How To Set Up Django with Postgres, Nginx, and Gunicorn on Ubuntu 16.04

Django is a powerful web framework that can help you get your Python application or website off the ground. Django includes a simplified development server for testing your code locally, but for anything even slightly production related, a more secure and powerful web server is required.

In this guide, we will demonstrate how to install and configure some components on Ubuntu 16.04 to support and serve Django applications. We will be setting up a PostgreSQL database instead of using the default SQLite database. We will configure the Gunicorn application server to interface with our applications. We will then set up Nginx to reverse proxy to Gunicorn, giving us access to its security and performance features to serve our apps.

https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu-16-04

PostgreSQL Indexes: First Principles

We have all heard about indexes. Yeah, that thing that it’s automatically added to the Primary Key column that enables fast data retrieval and stuff. Sure, but have you ever asked yourself if there are multiple types or implementations of indexes? Or maybe, what type of indexes your favourite RDBMS implements? In this blog post, we will take a step back to the beginning, exploring what indexes are, what is their role, types of indexes, metrics and so on. And all of this in PostgreSQL…

http://eftimov.net/postgresql-indexes-first-principles