Redis 4.0 Compatibility in Amazon ElastiCache

Amazon ElastiCache makes it easy for you to set up a fully managed in-memory data store and cache with Redis or Memcached. Today we’re pleased to launch compatibility with Redis 4.0 in ElastiCache. You can now launch Redis 4.0 compatible ElastiCache nodes or clusters, in all commercial AWS regions. ElastiCache Redis clusters can scale to terabytes of memory and millions of reads / writes per second to serve the most demanding needs of games, IoT devices, financial applications, and web applications.

Redis as a JSON store

tl;dr a Redis module that provides native JSON capabilities – get it from the GitHub repository or read the docs online.

Both JSON and Redis need no introduction; the former is the standard data interchange format between modern applications, whereas the latter is ubiquitous wherever performant data management is needed by them. That being the case, I was shocked when a couple of years ago I learned that the two don’t get along.

Redis isn’t a one-trick pony–it is, in fact, quite the opposite. Unlike general purpose one-size-fits-all databases, Redis (a.k.a the “Swiss Army Knife of Databases”, “Super Glue of Microservices” and “Execution context of Functions-as-a-Service”) provides specialized tools for specific tasks. Developers use these tools, which are exposed as abstract data structures and their accompanying operations, to model optimal solutions for problems. And that is exactly the reason why using Redis for managing JSON data is unnatural.

Fact: despite its multitude of core data structures, Redis has none that fit the requirements of a JSON value. Sure, you can work around that by using other data types: Strings are great for storing raw serialized JSON, and you can represent flat JSON objects with Hashes. But these workaround patterns impose limitations that make them useful only in a handful of use cases, and even then the experience leaves an un-Redis-ish aftertaste. Their awkwardness clashes sharply with the simplicity and elegance of using Redis normally.

But all that changed during the last year after Salvatore Sanfilippo’s @antirez visit to the Tel Aviv office, and with Redis modules becoming a reality. Suddenly the sky wasn’t the limit anymore. Now that modules let anyone do anything, it turned out that I could be that particular anyone. Picking up on C development after more than a two decades hiatus proved to be less of a nightmare than I had anticipated, and with Dvir Volk’s @dvirsky loving guidance we birthed ReJSON.

Redis as a JSON store

Amazon ElastiCache Launches Enhanced Redis Backup and Restore with Cluster Resizing

We are excited to announce that Amazon ElastiCache now supports enhanced Redis Backup and Restore with Cluster Resizing. In October 2016, we launched support for Redis Cluster with Redis 3.2.4. In addition to scaling your Redis workloads across up to 15 shards with 3.5TiB of data, it also allowed creating cluster-level backups, which contain snapshots of each of the cluster’s shards. With this launch, we are adding the capability to restore a backup into a Redis Cluster with a different number of shards and slot distribution, allowing you to resize your Redis workload. ElastiCache will parse the Redis key space across the backup’s individual snapshots, and redistribute the keys in the new Cluster according to the requested number of shards and hash slots. Your new cluster can be either larger or smaller in size, as long as the data fits in the selected configuration.
Enhanced Backup and Restore with Cluster Resizing also provides an easy migration path to a managed Redis Cluster experience on ElastiCache. If you are running self-managed Redis on EC2, you can take RDB snapshots or your existing workloads (both Redis Cluster and single-shard Redis) and store them in S3. Then simply provide them as input for creating a sharded Redis Cluster on ElastiCache, and the desired number of shards. ElastiCache will do the rest.

Moving persistent data out of Redis

Historically, we have used Redis in two ways at GitHub:

We used it as an LRU cache to conveniently store the results of expensive computations over data originally persisted in Git repositories or MySQL. We call this transient Redis.

We also enabled persistence, which gave us durability guarantees over data that was not stored anywhere else. We used it to store a wide range of values: from sparse data with high read/write ratios, like configuration settings, counters, or quality metrics, to very dynamic information powering core features like spam analysis. We call this persistent Redis.

Recently we made the decision to disable persistence in Redis and stop using it as a source of truth for our data. The main motivations behind this choice were to:

  • Reduce the operational cost of our persistence infrastructure by removing some of its complexity.
  • Take advantage of our expertise operating MySQL.
  • Gain some extra performance, by eliminating the I/O latency during the process of writing big changes on the server state to disk.

Transitioning all that information transparently involved planning and coordination. For each problem domain using persistent Redis, we considered the volume of operations, the structure of the data, and the different access patterns to predict the impact on our current MySQL capacity, and the need for provisioning new hardware.

For the majority of callsites, we replaced persistent Redis with GitHub::KV, a MySQL key/value store of our own built atop InnoDB, with features like key expiration. We were able to use GitHub::KV almost identically as we used Redis: from trending repositories and users for the explore page, to rate limiting to spammy user detection.

A Quick Guide to Redis 3.2’s Geo Support

The Geo API has been around for a while, appearing in the Redis unstable branch about ten months ago and that was, in turn, based on work from 2014. There’s a bit of history in that development process, which being practical folk we’ll skip past and go straight to the stuff that makes your development day better.

At its simplest, the GEO API for Redis reduces longitude/latitude down into a geohash. Geohash is a technique developed in 2008 to represent locations with short string codes. The Geohash of a particular location, say Big Ben in London, would come out as “gcpuvpmm3f0” which is easier to pass around than “latitude 51.500 longitude -0.12455”. The longer the string, the more precise the geohash code.

That encoding into a string is good for humans and URLs but it isn’t particularly space efficient. The good news is geohashes can be encoded as binary and using 52 bits, a geohash gets down to 0.6 meter accuracy which is good enough for most uses. A 52-bit value which just happens to be able to be a small-enough integer to live in a Redis floating-point double safely and that’s what the Geo API works with behind the scenes.

Redis Modules: an introduction to the API

The modules documentation is composed of the following files:

  • (this file). An overview about Redis Modules system and API. It’s a good idea to start your reading here.
  • is generated from module.c top comments of RedisMoule functions. It is a good reference in order to understand how each function works.
  • covers the implementation of native data types into modules.
  • shows how to write blocking commands that will not reply immediately, but will block the client, without blocking the Redis server, and will provide a reply whenever will be possible.

Redis modules make possible to extend Redis functionality using external modules, implementing new Redis commands at a speed and with features similar to what can be done inside the core itself.

Redis modules are dynamic libraries, that can be loaded into Redis at startup or using the MODULE LOAD command. Redis exports a C API, in the form of a single C header file called redismodule.h. Modules are meant to be written in C, however it will be possible to use C++ or other languages that have C binding functionalities.

Modules are designed in order to be loaded into different versions of Redis, so a given module does not need to be designed, or recompiled, in order to run with a specific version of Redis. For this reason, the module will register to the Redis core using a specific API version. The current API version is “1”.

This document is about an alpha version of Redis modules. API, functionalities and other details may change in the future.

Distributed delay queues based on Dynomite

Netflix’s Content Platform Engineering runs a number of business processes which are driven by asynchronous orchestration of micro-services based tasks, and queues form an integral part of the orchestration layer amongst these services.
Few examples of these processes are:
  • IMF based content ingest from our partners
  • Process of setting up new titles within Netflix
  • Content Ingest, encode and deployment to CDN
Traditionally, we have been using a Cassandra based queue recipe along with Zookeeper for distributed locks, since Cassandra is the de facto storage engine at Netflix. Using Cassandra for queue like data structure is a known anti-pattern, also using a global lock on queue while polling, limits the amount of concurrency on the consumer side as the lock ensures only one consumer can poll from the queue at a time.  This can be addressed a bit by sharding the queue but the concurrency is still limited within the shard.  As we started to build out a new orchestration engine, we looked at Dynomite for handling the task queues.
We wanted the following in the queue recipe:
  1. Distributed
  2. No external locks (e.g. Zookeeper locks)
  3. Highly concurrent
  4. At-least-once delivery semantics
  5. No strict FIFO
  6. Delayed queue (message is not taken out of the queue until some time in the future)
  7. Priorities within the shard
The queue recipe described here is used to build a message broker server that exposes various operations (push, poll, ack etc.) via REST endpoints and can potentially be exposed by other transports (e.g. gRPC).  Today, we are open sourcing the queue recipe.