Streak makes a CRM add-on for Gmail, and recently adopted Cloud Spanner to take advantage of its scalability and SQL capabilities to implement a graph data model. Read on to learn about their decision, what they love about the system, and the ways in which it still needs work.
You may have heard of Amazon Aurora, a custom-built MySQL- and PostgreSQL-compatible database born and built in the cloud. You may have also heard of serverless, which allows you to build and run applications and services without thinking about instances. These are two pieces of the growing AWS technology story that we’re really excited to be working on. Last year, at AWS re:Invent, we announced a preview of a new capability for Aurora called Aurora Serverless. Today, I’m pleased to announce that Aurora Serverless for Aurora MySQL is generally available. Aurora Serverless is on-demand, auto-scaling, serverless Aurora. You don’t have to think about instances or scaling, and you pay only for what you use.
This paradigm is great for applications with unpredictable load or infrequent demand. In production, you can save on costs by scaling with actual load in extremely granular increments, matching your demand curve almost perfectly. In development, you can save on costs by automatically pausing the cluster (scale to zero!) when it’s not in use. I’m excited to show you how this all works, so let’s look at how to launch an Aurora Serverless cluster.
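To make the two cost levers concrete, here is a toy model of on-demand scaling with scale-to-zero, written in Python. It is only a sketch under stated assumptions: the capacity steps, the five-minute pause threshold, and the names choose_capacity and billed_units are hypothetical and do not reflect Aurora's actual capacity units, limits, or API.

```python
from __future__ import annotations

# Toy model of on-demand scaling with scale-to-zero.
# ASSUMPTION: the capacity steps and pause threshold are hypothetical,
# not Aurora Serverless's real capacity units or defaults.
CAPACITY_STEPS = [1, 2, 4, 8, 16, 32, 64]
PAUSE_AFTER_MINUTES = 5


def choose_capacity(load: float, idle_minutes: int) -> int:
    """Return the capacity to run at, or 0 when the cluster should pause."""
    if load <= 0 and idle_minutes >= PAUSE_AFTER_MINUTES:
        return 0  # scale to zero: no compute capacity running
    for step in CAPACITY_STEPS:
        if step >= load:
            return step  # smallest step that covers the current load
    return CAPACITY_STEPS[-1]  # saturate at the largest step


def billed_units(hourly_load: list[float]) -> int:
    """Capacity-hours used under the toy policy, one load sample per hour."""
    total = 0
    idle_minutes = 0
    for load in hourly_load:
        idle_minutes = idle_minutes + 60 if load <= 0 else 0
        total += choose_capacity(load, idle_minutes)
    return total
```

For the load curve [0, 3, 10, 0], billed_units returns 0 + 4 + 16 + 0 = 20 capacity-hours, whereas a fixed cluster sized for the peak step of 16 would run 16 × 4 = 64.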
It’s no secret that the design of CockroachDB is based on Google’s Spanner data storage system. One of the most surprising and inspired facets of Spanner is its use of atomic clocks and GPS clocks to give participating nodes really accurate wall time synchronization. The designers of Spanner call this ‘TrueTime’, and it provides a tight bound on clock offset between any two nodes in the system. TrueTime enables high levels of external consistency. Because CockroachDB is an open source database based on Spanner, our challenge was to provide similar guarantees of external consistency without atomic clocks.
If someone knows even a little about Spanner, one of the first questions they have is: “You can’t be using atomic clocks if you’re building an open source database; so how the heck does CockroachDB work?”
It’s a very good question.
CockroachDB was designed to work without atomic clocks or GPS clocks. It’s an open source database intended to be run on arbitrary collections of nodes: from physical servers in a corporate development cluster to public cloud infrastructure using the flavor-of-the-month virtualization layer. It’d be a showstopper to require an external dependency on specialized hardware for clock synchronization.
So what does CockroachDB do instead? Well, before answering that question, it’ll be helpful to dig a little deeper into why TrueTime was conceived for Spanner.
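The Spanner paper describes TrueTime as an API whose TT.now() returns an interval [earliest, latest] guaranteed to contain the true time, plus TT.after(t), which is true only once t has definitely passed. A commit leader picks a timestamp s no earlier than TT.now().latest and then waits ("commit wait") until TT.after(s) holds before making the write visible, so any transaction that starts afterwards must receive a larger timestamp. The Python below is a toy model of that rule, not Spanner's or CockroachDB's code; the class names, the fixed uncertainty epsilon, and the injectable clock and sleep functions are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Toy TrueTime: a local clock plus an ASSUMED uncertainty bound epsilon (seconds)."""

    def __init__(self, epsilon: float, clock: Callable[[], float] = time.time):
        self.epsilon = epsilon
        self.clock = clock

    def now(self) -> TTInterval:
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only when t is earlier than the earliest possible current time.
        return self.now().earliest > t


def commit(tt: TrueTime, sleep: Callable[[float], None] = time.sleep) -> float:
    """Pick commit timestamp s, then commit-wait until s has definitely passed."""
    s = tt.now().latest
    while not tt.after(s):
        sleep(tt.epsilon / 10)  # poll; the total wait is roughly 2 * epsilon
    return s
```

The wait lasts roughly twice the uncertainty bound, so the tighter the clocks, the cheaper each commit.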
The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL.
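This excerpt does not describe how Schemaless works internally. As a rough illustration of what any sharding layer over MySQL has to do, the Python sketch below hashes a row key to one of a fixed number of logical shards and maps each shard to a MySQL host. The shard count, host names, hash choice, and function names are hypothetical, not Uber's design.

```python
import hashlib
import uuid

# ASSUMPTIONS: shard count, host names, and hashing scheme are illustrative only.
NUM_SHARDS = 4096
HOSTS = ["mysql-a", "mysql-b", "mysql-c", "mysql-d"]


def shard_for(row_key: uuid.UUID) -> int:
    """Map a row key to a logical shard with a stable hash (not Python's hash())."""
    digest = hashlib.sha1(row_key.bytes).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def host_for(shard: int) -> str:
    """Assign contiguous ranges of logical shards to physical hosts."""
    shards_per_host = NUM_SHARDS // len(HOSTS)
    return HOSTS[shard // shards_per_host]


key = uuid.uuid4()
shard = shard_for(key)
print(key, "-> shard", shard, "on", host_for(shard))
```

Keeping the number of logical shards fixed and far larger than the number of hosts means capacity can be added by moving whole shards between hosts, without rehashing any row keys.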
The Architecture of Postgres
We encountered many Postgres limitations:
Inefficient architecture for writes
Inefficient data replication
Issues with table corruption
Poor replica MVCC support
Difficulty upgrading to newer releases
We’ll look at all of these limitations through an analysis of Postgres’s representation of table and index data on disk, especially when compared to the way MySQL represents the same data with its InnoDB storage engine. Note that the analysis that we present here is primarily based on our experience with the somewhat old Postgres 9.2 release series. To our knowledge, the internal architecture that we discuss in this article has not changed significantly in newer Postgres releases, and the basic design of the on-disk representation in 9.2 hasn’t changed significantly since at least the Postgres 8.3 release (now nearly 10 years old).
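Much of the write inefficiency in the list above comes down to what an index entry points at: Postgres index entries reference a tuple's physical location, while InnoDB secondary indexes store the primary key. The Python below is a deliberately simplified model of that difference that counts the structures one UPDATE touches; the function names are made up, and it ignores WAL, page splits, vacuum, and the InnoDB undo log. It does include Postgres's heap-only tuple (HOT) optimization, which skips index updates when no indexed column changes and the new version fits on the same page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    primary_key: str
    secondary_indexes: frozenset[str]  # one single-column index per name


def postgres_writes(table: Table, updated: set[str], fits_on_page: bool = True) -> int:
    """Heap plus index writes for one UPDATE in a simplified Postgres model.

    Index entries point at a tuple's physical location (ctid). An UPDATE writes
    a new tuple version at a new location, so every index needs a new entry,
    unless HOT applies: no indexed column changed and the new version fits on
    the same heap page.
    """
    indexed = {table.primary_key} | table.secondary_indexes
    if fits_on_page and not (updated & indexed):
        return 1  # HOT: new heap tuple only, indexes untouched
    return 1 + len(indexed)  # new heap tuple + one new entry in every index


def innodb_writes(table: Table, updated: set[str]) -> int:
    """Clustered-index plus secondary-index writes in a simplified InnoDB model.

    The row lives in the clustered (primary key) index, and secondary entries
    store the primary key, so only indexes on changed columns are touched.
    """
    if table.primary_key in updated:
        return 1 + len(table.secondary_indexes)  # every entry embeds the old key
    return 1 + len(updated & table.secondary_indexes)


users = Table("id", frozenset({"email", "last_name", "created_at"}))
print(postgres_writes(users, {"bio"}, fits_on_page=False))  # 5
print(postgres_writes(users, {"email"}))                    # 5
print(innodb_writes(users, {"email"}))                      # 2
```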
Welcome to PostgreSQL Exercises! This site was born when I noticed that there’s a load of material out there to help people learn about SQL, but not a great deal to make it easy to learn by doing. PGExercises provides a series of questions and explanations built on a single, simple dataset. It’s designed for use as a partner to a good book or Postgres’ excellent documentation.
The exercises on this site range from simple select and where clauses, through joins and case statements, and on to aggregations, window functions, and recursive queries. Most people who aren’t already pros should find something to test themselves with.
For an introduction to the dataset, go to Getting Started, then select an exercise category from the menu and go!
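As a taste of the recursive queries mentioned above, here is a self-contained Python sketch that walks a chain of member recommendations with WITH RECURSIVE. It runs on SQLite through Python's built-in sqlite3 module so it needs no Postgres server; the same query shape works in PostgreSQL. The members table, its rows, and the recommendation_chain helper are made up for illustration and are not the site's dataset.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (
    memid         INTEGER PRIMARY KEY,
    firstname     TEXT NOT NULL,
    recommendedby INTEGER REFERENCES members(memid)
);
INSERT INTO members VALUES (1, 'Ada', NULL), (2, 'Ben', 1), (3, 'Cy', 2), (4, 'Di', 1);
""")

CHAIN_SQL = """
WITH RECURSIVE recommenders(memid, depth) AS (
    -- anchor: the member who recommended the starting member
    SELECT recommendedby, 1 FROM members WHERE memid = ?
    UNION ALL
    -- recursive step: who recommended the previous recommender?
    SELECT m.recommendedby, r.depth + 1
    FROM members m JOIN recommenders r ON m.memid = r.memid
    WHERE m.recommendedby IS NOT NULL
)
SELECT r.depth, m.firstname
FROM recommenders r JOIN members m ON m.memid = r.memid
ORDER BY r.depth
"""


def recommendation_chain(memid: int) -> list[tuple[int, str]]:
    """Return (depth, firstname) for each recommender above the given member."""
    return conn.execute(CHAIN_SQL, (memid,)).fetchall()


print(recommendation_chain(3))  # [(1, 'Ben'), (2, 'Ada')]
```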
Detailed background on rqlite can be found in these blog posts. Note that master represents 2.0 development (which is still in progress), with a new API and Raft consensus module. If you want to work with rqlite 1.0, you can find it here.
rqlite is a distributed system that provides a replicated SQLite database. rqlite is written in Go and uses Raft to achieve consensus across all the instances of the SQLite databases. rqlite ensures that every change made to the database is made to a quorum of SQLite files, or none at all.
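rqlite gets this all-or-nothing behavior from Raft, which treats a change as committed once a majority of nodes has it. The Python below is a toy model of just that majority rule, not rqlite's code: Node, replicate, and the list standing in for each node's SQLite file are hypothetical, and leader election, log persistence, and catch-up of lagging nodes are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    reachable: bool = True
    statements: list[str] = field(default_factory=list)  # stands in for a SQLite file


def replicate(nodes: list[Node], statement: str) -> bool:
    """Apply a write only if a majority of the cluster can acknowledge it."""
    acks = [n for n in nodes if n.reachable]
    quorum = len(nodes) // 2 + 1
    if len(acks) < quorum:
        return False  # no quorum: no node's database is changed
    for n in acks:
        n.statements.append(statement)  # unreachable nodes would catch up later
    return True


cluster = [Node("n1"), Node("n2"), Node("n3", reachable=False)]
print(replicate(cluster, "INSERT INTO foo VALUES (1)"))  # True: 2 of 3 is a quorum
cluster[1].reachable = False
print(replicate(cluster, "INSERT INTO foo VALUES (2)"))  # False: 1 of 3 is not
```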
rqlite gives you the functionality of a fault-tolerant, replicated relational database, but with very easy installation, deployment, and operation.
We have all heard about indexes. Yeah, that thing that’s automatically added to the primary key column and enables fast data retrieval and stuff. Sure, but have you ever asked yourself whether there are multiple types or implementations of indexes? Or which types of indexes your favourite RDBMS implements? In this blog post, we will take a step back to the beginning, exploring what indexes are, what their role is, the types of indexes, metrics and so on. And all of this in PostgreSQL…
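The core idea behind a B-tree index, PostgreSQL's default index type, is keeping keys in sorted order so a lookup can binary-search instead of scanning every row. The Python below is a toy illustration of that idea only: a sorted list stands in for the index, and the names are hypothetical. A real B-tree spreads keys across pages so that inserts stay cheap and each lookup reads only a few pages.

```python
import bisect

# A toy "heap": 1,000 rows stored in arbitrary (non-key) order.
# 7 and 1000 are coprime, so i * 7 % 1000 visits every key 0..999 exactly once.
heap = [(i * 7 % 1000, f"row-{i}") for i in range(1000)]


def seq_scan(key):
    """Look at rows one by one; cost grows linearly with table size."""
    for comparisons, (k, payload) in enumerate(heap, start=1):
        if k == key:
            return payload, comparisons
    return None, len(heap)


# A toy index: keys kept sorted, each paired with the row's position in the heap.
pairs = sorted((k, pos) for pos, (k, _) in enumerate(heap))
index_keys = [k for k, _ in pairs]
index_positions = [pos for _, pos in pairs]


def index_lookup(key):
    """Binary-search the sorted keys (about log2(n) steps), then fetch the row."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return heap[index_positions[i]][1]
    return None


print(seq_scan(999))      # ('row-857', 858)
print(index_lookup(999))  # row-857
```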