How Streak built a graph database on Cloud Spanner to wrangle billions of emails

Streak makes a CRM add-on for Gmail, and recently adopted Cloud Spanner to take advantage of its scalability and SQL capabilities to implement a graph data model. Read on to learn about their decision, what they love about the system, and the ways in which it still needs work.
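The post goes into the details of their design; as a rough, hypothetical sketch of the general idea, a graph on Spanner can be modeled as a plain table of edges and traversed with SQL self-joins. The schema and identifiers below are illustrative (Go, using the cloud.google.com/go/spanner client), not Streak's actual model:

    package main

    import (
        "context"
        "fmt"
        "log"

        "cloud.google.com/go/spanner"
        "google.golang.org/api/iterator"
    )

    func main() {
        ctx := context.Background()

        // The database path is a placeholder.
        client, err := spanner.NewClient(ctx, "projects/p/instances/i/databases/d")
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        // Hypothetical edge table Edges(SourceId, TargetId): find nodes
        // two hops away from a starting node with an ordinary self-join.
        stmt := spanner.Statement{
            SQL: `SELECT e2.TargetId
                    FROM Edges e1
                    JOIN Edges e2 ON e2.SourceId = e1.TargetId
                   WHERE e1.SourceId = @id`,
            Params: map[string]interface{}{"id": "contact-123"},
        }
        iter := client.Single().Query(ctx, stmt)
        defer iter.Stop()
        for {
            row, err := iter.Next()
            if err == iterator.Done {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
            var target string
            if err := row.Columns(&target); err != nil {
                log.Fatal(err)
            }
            fmt.Println(target)
        }
    }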

https://cloud.google.com/blog/products/databases/how-streak-built-a-graph-database-on-cloud-spanner-to-wrangle-billions-of-emails

Aurora Serverless MySQL Generally Available

You may have heard of Amazon Aurora, a custom-built, MySQL- and PostgreSQL-compatible database born and built in the cloud. You may have also heard of serverless, which allows you to build and run applications and services without thinking about instances. These are two pieces of the growing AWS technology story that we’re really excited to be working on. Last year at AWS re:Invent, we announced a preview of a new capability for Aurora called Aurora Serverless. Today, I’m pleased to announce that Aurora Serverless for Aurora MySQL is generally available. Aurora Serverless is on-demand, auto-scaling, serverless Aurora: you don’t have to think about instances or scaling, and you pay only for what you use.

This paradigm is great for applications with unpredictable load or infrequent demand. In production, you can save on costs by scaling capacity to actual load in extremely granular increments, matching your demand curve almost perfectly. In development, you can save on costs by automatically pausing the cluster (scale to zero!) when it’s not in use. I’m excited to show you how this all works, so let’s look at how we launch a Serverless Aurora cluster.
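The post walks through launching a cluster from the console; an equivalent sketch through the AWS SDK for Go looks roughly like this (the identifier, credentials, and capacity values are placeholders, not recommendations):

    package main

    import (
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/rds"
    )

    func main() {
        sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
        svc := rds.New(sess)

        // "serverless" engine mode means there are no instances to pick or
        // scale; capacity moves between the min and max you set, and the
        // cluster can pause entirely when idle.
        out, err := svc.CreateDBCluster(&rds.CreateDBClusterInput{
            DBClusterIdentifier: aws.String("demo-serverless"),
            Engine:              aws.String("aurora"), // Aurora MySQL
            EngineMode:          aws.String("serverless"),
            MasterUsername:      aws.String("admin"),
            MasterUserPassword:  aws.String("change-me"),
            ScalingConfiguration: &rds.ScalingConfiguration{
                MinCapacity:           aws.Int64(2), // Aurora capacity units
                MaxCapacity:           aws.Int64(16),
                AutoPause:             aws.Bool(true), // "scale to zero" in dev
                SecondsUntilAutoPause: aws.Int64(300),
            },
        })
        if err != nil {
            log.Fatal(err)
        }
        log.Println("created:", aws.StringValue(out.DBCluster.DBClusterArn))
    }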

https://aws.amazon.com/blogs/aws/aurora-serverless-ga/

Distributed SQLite for Go applications

This repository provides the dqlite Go package, which can be used to replicate a SQLite database across a cluster, using the Raft algorithm.

Design highlights

  • No external processes needed: dqlite is just a Go library; you link it to your application exactly like you would with SQLite (see the sketch after this list).
  • Replication needs a SQLite patch which is not yet included upstream.
  • The Go Raft package from Hashicorp is used internally for replicating the write-ahead log frames of SQLite across all nodes.
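Because the setup API is still evolving, the sketch below omits driver registration and shows only the application-side shape: assuming a dqlite driver has been registered under the (hypothetical) name "dqlite", everything else is ordinary database/sql code, exactly as with plain SQLite:

    package main

    import (
        "database/sql"
        "log"
    )

    func main() {
        // Assumes a dqlite driver was registered as "dqlite" during node
        // setup (registration omitted; see the repository README for the
        // current API). From here on, this is standard database/sql usage.
        db, err := sql.Open("dqlite", "app.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Full transaction support: the WAL frames produced by this
        // transaction are replicated to the other nodes via Raft.
        tx, err := db.Begin()
        if err != nil {
            log.Fatal(err)
        }
        if _, err := tx.Exec(`INSERT INTO events(msg) VALUES (?)`, "hello"); err != nil {
            log.Fatal(err)
        }
        if err := tx.Commit(); err != nil {
            log.Fatal(err)
        }
    }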

How does it compare to rqlite?

The main differences from rqlite are:

  • Full support for transactions
  • No need for statements to be deterministic (e.g. you can use time(); see the example after this list)
  • Frame-based replication instead of statement-based replication; this means more data flows between nodes in dqlite, so expect lower write performance. It should not matter for most use cases.
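To illustrate the determinism point: under statement-based replication each node would re-evaluate time() itself and the replicas could diverge, whereas frame-based replication ships the WAL frames the leader produced, so a statement like this is safe (continuing the db handle from the sketch above):

    // time() is non-deterministic, which rules it out under statement-based
    // replication; with frame-based replication the leader evaluates it once
    // and the replicas apply identical page changes.
    if _, err := db.Exec(`INSERT INTO events(msg, created_at)
                          VALUES (?, time())`, "hello"); err != nil {
        log.Fatal(err)
    }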

https://github.com/CanonicalLtd/dqlite

Living Without Atomic Clocks

The design of CockroachDB is based on Google’s Spanner data storage system. One of the most surprising and inspired facets of Spanner is its use of atomic clocks and GPS clocks to give participating nodes highly accurate wall-time synchronization. The designers of Spanner call this ‘TrueTime’, and it provides a tight bound on clock offset between any two nodes in the system. TrueTime enables high levels of external consistency. As an open source database based on Spanner, our challenge was to provide similar guarantees of external consistency without atomic clocks.

If someone knows even a little about Spanner, one of the first questions they have is: “You can’t be using atomic clocks if you’re building an open source database; so how the heck does CockroachDB work?”

It’s a very good question.

CockroachDB was designed to work without atomic clocks or GPS clocks. It’s an open source database intended to be run on arbitrary collections of nodes: from physical servers in a corp development cluster to public cloud infrastructure using the flavor-of-the-month virtualization layer. It’d be a showstopper to require an external dependency on specialized hardware for clock synchronization.

So what does CockroachDB do instead? Well, before answering that question, it’ll be helpful to dig a little deeper into why TrueTime was conceived for Spanner.
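For the gist before you click through: TrueTime never claims to know the exact time, only an interval [earliest, latest] guaranteed to contain it, and Spanner derives external consistency from a "commit wait" rule built on that interval. A minimal sketch of the rule follows; the uncertainty bound here is purely illustrative, and Spanner's hardware keeps the real one far smaller:

    package main

    import (
        "fmt"
        "time"
    )

    // Interval models TrueTime's answer to "what time is it?": the true
    // time is guaranteed to lie somewhere in [Earliest, Latest].
    type Interval struct {
        Earliest, Latest time.Time
    }

    // now models TT.now() with a fixed uncertainty bound epsilon.
    func now(epsilon time.Duration) Interval {
        t := time.Now()
        return Interval{Earliest: t.Add(-epsilon), Latest: t.Add(epsilon)}
    }

    // commitWait sketches Spanner's commit-wait rule: pick the commit
    // timestamp s at the upper bound of the current interval, then block
    // until s is certainly in the past on every node, so no later reader
    // can observe the write under a "future" timestamp.
    func commitWait(epsilon time.Duration) time.Time {
        s := now(epsilon).Latest
        for now(epsilon).Earliest.Before(s) {
            time.Sleep(time.Millisecond)
        }
        return s
    }

    func main() {
        start := time.Now()
        s := commitWait(5 * time.Millisecond) // illustrative epsilon
        fmt.Printf("committed at %v after waiting %v\n", s, time.Since(start))
    }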

https://www.cockroachlabs.com/blog/living-without-atomic-clocks/

Why Uber Engineering Switched from Postgres to MySQL

The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL.

The Architecture of Postgres

We encountered many Postgres limitations:

  • Inefficient architecture for writes
  • Inefficient data replication
  • Issues with table corruption
  • Poor replica MVCC support
  • Difficulty upgrading to newer releases

We’ll look at all of these limitations through an analysis of Postgres’s representation of table and index data on disk, especially when compared to the way MySQL represents the same data with its InnoDB storage engine. Note that the analysis that we present here is primarily based on our experience with the somewhat old Postgres 9.2 release series. To our knowledge, the internal architecture that we discuss in this article has not changed significantly in newer Postgres releases, and the basic design of the on-disk representation in 9.2 hasn’t changed significantly since at least the Postgres 8.3 release (now nearly 10 years old).

https://eng.uber.com/mysql-migration/

PostgreSQL Exercises

Welcome to PostgreSQL Exercises! This site was born when I noticed that there’s a load of material out there to help people learn about SQL, but not a great deal to make it easy to learn by doing. PGExercises provides a series of questions and explanations built on a single, simple dataset. It’s designed for use as a partner to a good book or Postgres’ excellent documentation.

The exercises on this site range from simple select and where clauses, through joins and case statements, and on to aggregations, window functions, and recursive queries. Most people who aren’t already pros should find something to test themselves with.

For an introduction to the dataset, go to Getting Started, then select an exercise category from the menu and go!

https://pgexercises.com/