Design patterns for high-volume, time-series data in Amazon DynamoDB

Time-series data shows a pattern of change over time. For example, you might have a fleet of Internet of Things (IoT) devices that record environmental data through their sensors, as shown in the following example graph. This data could include temperature, pressure, humidity, and other environmental variables. Because each IoT device tracks these values over regular periods, your backend needs to ingest up to hundreds, thousands, or millions of events every second.

Graph of sensor data

In this blog post, I explain how to optimize Amazon DynamoDB for high-volume, time-series data scenarios. I do this by using a design pattern powered by automation and serverless computing.


Multi-region serverless backend 

In 2018, I wrote a series of blog posts on building a multi-region, active-active, serverless architecture on AWS [1, 2, 3 and 4]. The solution was built using DynamoDB Global Tables, Lambda, the regional API Gateway feature, and Route 53 routing policies. It worked well as a resiliency pattern and as a disaster recovery (DR) strategy . But there was an issue.

How to use Amazon DynamoDB global tables to power multiregion architectures

More and more, AWS customers want to make their applications available to globally dispersed users by deploying their application in multiple AWS Regions. These global users expect fast application performance.

In this post, I describe how to use Amazon DynamoDB to power the database of a global backend deployed in multiple AWS Regions. I use DynamoDB global tables, which provide a fully managed, multiregion, and multimaster database so that you can deliver low-latency data access to your users no matter where they are located on the globe.

Why use a multiregion architecture?

AWS customers typically want a multiregion architecture for two reasons:

  1. To provide low latency and improve their app experience.
  2. To facilitate disaster recovery.

Anomaly detection on Amazon DynamoDB Streams using the Amazon SageMaker Random Cut Forest algorithm

Have you considered introducing anomaly detection technology to your business? Anomaly detection is a technique used to identify rare items, events, or observations which raise suspicion by differing significantly from the majority of the data you are analyzing.  The applications of anomaly detection are wide-ranging including the detection of abnormal purchases or cyber intrusions in banking, spotting a malignant tumor in an MRI scan, identifying fraudulent insurance claims, finding unusual machine behavior in manufacturing, and even detecting strange patterns in network traffic that could signal an intrusion.

There are many commercial products to do this, but you can easily implement an anomaly detection system by using Amazon SageMaker, AWS Glue, and AWS Lambda. Amazon SageMaker is a fully-managed platform to help you quickly build, train, and deploy machine learning models at any scale. AWS Glue is a fully-managed ETL service that makes it easy for you to prepare your data/model for analytics. AWS Lambda is a well-known a serverless real-time platform. Using these services, your model can be automatically updated with new data, and the new model can be used to alert for anomalies in real time with better accuracy.

In this blog post I’ll describe how you can use AWS Glue to prepare your data and train an anomaly detection model using Amazon SageMaker. For this exercise, I’ll store a sample of the NAB NYC Taxi data in Amazon DynamoDB to be streamed in real time using an AWS Lambda function.

The solution that I describe provides the following benefits:

  • You can make the best use of existing resources for anomaly detection. For example, if you have been using Amazon DynamoDB Streams for disaster recovery (DR) or other purposes, you can use the data in that stream for anomaly detection. In addition, stand-by storage usually has low utilization. The data in low awareness can be used for training data.
  • You can automatically retrain the model with new data on a regular basis with no user intervention.
  • You can make it easy to use the Random Cut Forest built-in Amazon SageMaker algorithm. Amazon SageMaker offers flexible distributed training options that adjust to your specific workflows in a secure and scalable environment.

Amazon DynamoDB On-Demand – No Capacity Planning and Pay-Per-Request Pricing

Just a few years ago, creating a database that could support your business at any scale while providing consistent low latency was a daunting task. That changed for me in 2012 while reading Werner Vogels’ blog post announcing Amazon DynamoDB (it was a few months before I joined AWS). DynamoDB was built on the principles in the original Dynamo paper that Amazon published in 2007. Over the years, lots of new features have been introduced to further simplify how AWS customers use databases. You can now create fully managed, multi-region, multi-master database tables with features such as encryption at rest, point-in-time recovery, in-memory caching, and a 99.99% uptime service level agreement (SLA).

Amazon DynamoDB Transactions

Over the years, customers have used Amazon DynamoDB for lots of different use cases, from building microservices and mobile backends to implementing gaming and Internet of Things (IoT) solutions. For example, Capital One uses DynamoDB to reduce the latency of their mobile applications by moving their mainframe transactions to a serverless architecture. Tinder migrated user data to DynamoDB with zero downtime, to get the scalability they need to support their global user base.

Developers sometimes need to implement business logic that requires multiple, all-or-nothing operations across one or more tables. This requirement can add unnecessary complexity to their implementation. Today, we are making these use cases easier to build on DynamoDB with native support for transactions.