Amazon ElastiCache makes it easy for you to set up a fully managed in-memory data store and cache with Redis or Memcached. Today we’re pleased to launch compatibility with Redis 4.0 in ElastiCache. You can now launch Redis 4.0 compatible ElastiCache nodes or clusters, in all commercial AWS regions. ElastiCache Redis clusters can scale to terabytes of memory and millions of reads / writes per second to serve the most demanding needs of games, IoT devices, financial applications, and web applications.
For our products, such as the trivago hotel search, we use Redis a lot. The use cases vary: caching, temporary storage of data before it is moved into another store, and use as a typical database for hotel metadata, including persistence.
Both JSON and Redis need no introduction; the former is the standard data interchange format between modern applications, whereas the latter is ubiquitous wherever performant data management is needed by them. That being the case, I was shocked when a couple of years ago I learned that the two don’t get along.
Redis isn’t a one-trick pony; it is, in fact, quite the opposite. Unlike general-purpose, one-size-fits-all databases, Redis (a.k.a. the “Swiss Army Knife of Databases”, “Super Glue of Microservices” and “Execution context of Functions-as-a-Service”) provides specialized tools for specific tasks. Developers use these tools, which are exposed as abstract data structures and their accompanying operations, to model optimal solutions for problems. And that is exactly the reason why using Redis for managing JSON data is unnatural.
Fact: despite its multitude of core data structures, Redis has none that fit the requirements of a JSON value. Sure, you can work around that by using other data types: Strings are great for storing raw serialized JSON, and you can represent flat JSON objects with Hashes. But these workaround patterns impose limitations that make them useful only in a handful of use cases, and even then the experience leaves an un-Redis-ish aftertaste. Their awkwardness clashes sharply with the simplicity and elegance of using Redis normally.
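To make the two workarounds concrete, here is a minimal sketch in plain Python. It does not talk to a Redis server: the sample document and the helper names `update_field_in_string` and `to_hash_fields` are hypothetical, and the values they produce are what a client would hand to a String (SET/GET) or a Hash (HSET/HGETALL).

```python
import json

# Hypothetical sample document, used only for illustration.
doc = {"name": "Hotel Example", "stars": 4, "address": {"city": "Tel Aviv"}}

# Workaround 1: a String holding the raw serialized JSON.
string_value = json.dumps(doc)


def update_field_in_string(raw, field, value):
    """Changing a single member means decoding and re-encoding the whole document."""
    decoded = json.loads(raw)
    decoded[field] = value
    return json.dumps(decoded)


# Workaround 2: a Hash with one field per top-level member.
def to_hash_fields(obj):
    """Hash values are flat strings, so non-string members are serialized again.

    Nested objects end up as opaque JSON strings inside the Hash, which is
    why this pattern only really suits flat objects.
    """
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in obj.items()
    }


hash_fields = to_hash_fields(doc)
```

In the String variant every small update pays for a full parse and serialize; in the Hash variant the type information (number versus string) and any nesting are pushed back onto the application.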
But all that changed during the last year, after Salvatore Sanfilippo’s (@antirez) visit to the Tel Aviv office and with Redis modules becoming a reality. Suddenly the sky wasn’t the limit anymore. Now that modules let anyone do anything, it turned out that I could be that particular anyone. Picking up C development after a hiatus of more than two decades proved to be less of a nightmare than I had anticipated, and with Dvir Volk’s (@dvirsky) loving guidance we birthed ReJSON.
We are excited to announce that Amazon ElastiCache now supports enhanced Redis Backup and Restore with Cluster Resizing. In October 2016, we launched support for Redis Cluster with Redis 3.2.4. In addition to scaling your Redis workloads across up to 15 shards with 3.5TiB of data, it also allowed creating cluster-level backups, which contain snapshots of each of the cluster’s shards. With this launch, we are adding the capability to restore a backup into a Redis Cluster with a different number of shards and slot distribution, allowing you to resize your Redis workload. ElastiCache will parse the Redis key space across the backup’s individual snapshots, and redistribute the keys in the new Cluster according to the requested number of shards and hash slots. Your new cluster can be either larger or smaller in size, as long as the data fits in the selected configuration.
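The redistribution step can be pictured with the standard Redis Cluster key-to-slot mapping: CRC16 of the key (or of its hash tag) modulo 16384. The sketch below computes that slot and then assigns slots to shards in equal contiguous ranges. The even split and the helper names `even_slot_ranges` and `shard_for_key` are assumptions for illustration; the announcement does not describe how ElastiCache lays out slots internally, and a custom slot distribution can be requested.

```python
import binascii

TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the Redis Cluster hash slot for a key.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so related keys can be forced into the same slot.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 (XMODEM) variant
    # that the Redis Cluster specification uses.
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


def even_slot_ranges(num_shards: int):
    """Split the slot space into contiguous, near-equal ranges (an assumed layout)."""
    base, extra = divmod(TOTAL_SLOTS, num_shards)
    ranges = []
    start = 0
    for index in range(num_shards):
        size = base + (1 if index < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def shard_for_key(key: bytes, ranges) -> int:
    """Find which shard (by index) owns the slot of the given key."""
    slot = key_slot(key)
    for index, (low, high) in enumerate(ranges):
        if low <= slot <= high:
            return index
    raise ValueError("slot not covered by any range")
```

Because the slot of a key never changes, restoring into a different number of shards only changes which shard owns each slot range; calling `shard_for_key` with `even_slot_ranges(3)` and then `even_slot_ranges(6)` shows the same key landing on different shards.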
Enhanced Backup and Restore with Cluster Resizing also provides an easy migration path to a managed Redis Cluster experience on ElastiCache. If you are running self-managed Redis on EC2, you can take RDB snapshots of your existing workloads (both Redis Cluster and single-shard Redis) and store them in S3. Then simply provide them, along with the desired number of shards, as input for creating a sharded Redis Cluster on ElastiCache. ElastiCache will do the rest.
Historically, we have used Redis in two ways at GitHub:
We used it as an LRU cache to conveniently store the results of expensive computations over data originally persisted in Git repositories or MySQL. We call this transient Redis.
We also enabled persistence, which gave us durability guarantees over data that was not stored anywhere else. We used it to store a wide range of values: from sparse data with high read/write ratios, like configuration settings, counters, or quality metrics, to very dynamic information powering core features like spam analysis. We call this persistent Redis.
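As a rough picture of the transient use case, the following in-process sketch caches the result of an expensive computation and evicts the least recently used entry once a capacity is reached. The `LRUCache` class and its `get_or_compute` method are hypothetical names, not GitHub's code, and Redis itself uses an approximated LRU driven by a memory limit rather than an exact entry count.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny exact-LRU cache; a stand-in for a transient Redis instance."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get_or_compute(self, key, compute):
        """Return the cached value, or compute, store and return it on a miss."""
        if key in self._items:
            # Mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        value = compute()
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return value

    def __contains__(self, key):
        return key in self._items
```

Losing an entry here is harmless because the source of truth lives elsewhere (Git repositories or MySQL in the text above); the value is simply recomputed on the next miss.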
Recently we made the decision to disable persistence in Redis and stop using it as a source of truth for our data. The main motivations behind this choice were to:
- Reduce the operational cost of our persistence infrastructure by removing some of its complexity.
- Take advantage of our expertise operating MySQL.
- Gain some extra performance by eliminating the I/O latency incurred when writing large changes to the server state to disk.
Transitioning all that information transparently involved planning and coordination. For each problem domain using persistent Redis, we considered the volume of operations, the structure of the data, and the different access patterns to predict the impact on our current MySQL capacity, and the need for provisioning new hardware.
For the majority of callsites, we replaced persistent Redis with GitHub::KV, a MySQL key/value store of our own built atop InnoDB, with features like key expiration. We were able to use GitHub::KV almost identically to how we used Redis: from trending repositories and users for the explore page, to rate limiting, to spammy user detection.
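The general idea of a SQL-backed key/value store with expiration can be sketched as follows. This is not GitHub::KV (which is Ruby on MySQL/InnoDB): it uses Python's built-in `sqlite3` purely so the example is runnable, and the table name, column names and the `SqlKV` class are hypothetical.

```python
import sqlite3
import time


class SqlKV:
    """Minimal key/value store on a SQL table, with optional per-key expiry."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS key_values ("
            " item_key TEXT PRIMARY KEY,"
            " item_value TEXT NOT NULL,"
            " expires_at REAL)"  # NULL means the key never expires
        )

    def set(self, key, value, ttl=None, now=None):
        """Store a value; ttl is in seconds. `now` is injectable for testing."""
        now = time.time() if now is None else now
        expires_at = None if ttl is None else now + ttl
        self.conn.execute(
            "INSERT OR REPLACE INTO key_values (item_key, item_value, expires_at)"
            " VALUES (?, ?, ?)",
            (key, value, expires_at),
        )
        self.conn.commit()

    def get(self, key, now=None):
        """Return the value, or None if the key is missing or has expired."""
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT item_value FROM key_values"
            " WHERE item_key = ? AND (expires_at IS NULL OR expires_at > ?)",
            (key, now),
        ).fetchone()
        return row[0] if row else None
```

Expired rows are filtered at read time in this sketch; a real deployment would also need some way to purge them, which is left out here.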
The Geo API has been around for a while: it appeared in the Redis unstable branch about ten months ago, and that was, in turn, based on work from 2014. There’s a bit of history in that development process which, being practical folk, we’ll skip past to go straight to the stuff that makes your development day better.
At its simplest, the Geo API for Redis reduces a longitude/latitude pair to a geohash. Geohash is a technique developed in 2008 to represent locations with short string codes. The geohash of a particular location, say Big Ben in London, would come out as “gcpuvpmm3f0”, which is easier to pass around than “latitude 51.500 longitude -0.12455”. The longer the string, the more precise the geohash code.
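For readers who want to see how such a string comes about, here is a short sketch of the standard geohash text encoding: longitude and latitude ranges are halved alternately, each halving contributes one bit, and every five bits become one base-32 character. The function name `geohash_encode` is my own; this is the public geohash scheme, not code taken from Redis.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, length=11):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Five bits select one character of the alphabet.
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

A shorter code is always a prefix of a longer code for the same point, which is why truncating a geohash simply widens the cell it describes.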
That encoding into a string is good for humans and URLs, but it isn’t particularly space efficient. The good news is that geohashes can be encoded as binary, and with 52 bits a geohash gets down to 0.6 meter accuracy, which is good enough for most uses. A 52-bit value also happens to be an integer small enough to be stored exactly in a Redis double-precision floating-point number, and that’s what the Geo API works with behind the scenes.
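The arithmetic behind those two claims can be checked with a small sketch. It interleaves 26 bits of longitude with 26 bits of latitude into one 52-bit integer and confirms that the result survives a round trip through a float. The function name is hypothetical, and the sketch uses the full -90 to 90 latitude range as an assumption; Redis itself restricts valid latitudes to roughly ±85.05 degrees, so its actual integer values differ.

```python
BITS_PER_AXIS = 26  # 26 longitude bits + 26 latitude bits = 52 bits
EQUATOR_METERS = 40_075_000  # approximate circumference of the Earth


def interleaved_geohash_bits(lat, lon, bits_per_axis=BITS_PER_AXIS):
    """Return an integer geohash with longitude and latitude bits interleaved."""
    cells = 1 << bits_per_axis

    def quantize(value, low, high):
        # Map the coordinate onto an integer cell index in [0, cells).
        index = int((value - low) / (high - low) * cells)
        return min(index, cells - 1)

    lat_index = quantize(lat, -90.0, 90.0)
    lon_index = quantize(lon, -180.0, 180.0)
    result = 0
    for shift in range(bits_per_axis - 1, -1, -1):
        # Longitude bit first, then latitude, as in the text geohash.
        result = (result << 1) | ((lon_index >> shift) & 1)
        result = (result << 1) | ((lat_index >> shift) & 1)
    return result


# Width of one longitude cell at the equator, in meters (about 0.6).
cell_width_m = EQUATOR_METERS / (1 << BITS_PER_AXIS)

big_ben = interleaved_geohash_bits(51.500, -0.12455)
# A double has a 53-bit significand, so any integer below 2**53 is exact.
round_trips = int(float(big_ben)) == big_ben
```

Storing the value as a double matters because sorted set scores in Redis are doubles; anything wider than 53 bits would start losing its low-order bits, and with them the sub-meter precision.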