Anomaly detection on Amazon DynamoDB Streams using the Amazon SageMaker Random Cut Forest algorithm

Have you considered introducing anomaly detection technology to your business? Anomaly detection is a technique used to identify rare items, events, or observations that raise suspicion by differing significantly from the majority of the data you are analyzing. The applications of anomaly detection are wide-ranging, including the detection of abnormal purchases or cyber intrusions in banking, spotting a malignant tumor in an MRI scan, identifying fraudulent insurance claims, finding unusual machine behavior in manufacturing, and even detecting strange patterns in network traffic that could signal an intrusion.

There are many commercial products for this, but you can easily implement an anomaly detection system yourself by using Amazon SageMaker, AWS Glue, and AWS Lambda. Amazon SageMaker is a fully managed platform that helps you quickly build, train, and deploy machine learning models at any scale. AWS Glue is a fully managed ETL service that makes it easy to prepare your data for analytics and model training. AWS Lambda is a well-known serverless compute platform. Using these services, your model can be retrained automatically with new data, and the updated model can be used to alert on anomalies in real time with better accuracy.

In this blog post, I’ll describe how you can use AWS Glue to prepare your data and train an anomaly detection model using Amazon SageMaker. For this exercise, I’ll store a sample of the NAB NYC Taxi dataset in Amazon DynamoDB, to be streamed in real time using an AWS Lambda function.
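As a rough illustration of the training step, here is a minimal sketch using the SageMaker Python SDK's built-in RandomCutForest estimator (SDK v2 parameter names assumed). The IAM role, file name, hyperparameter values, and instance types are hypothetical, and the prepared data is assumed to be a single numeric column produced by the Glue job:

```python
# A rough sketch of training and deploying the built-in Random Cut Forest
# algorithm with the SageMaker Python SDK (v2 parameter names assumed).
# The IAM role, file name, and instance types below are hypothetical.
import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# Assume the AWS Glue job has written the prepared taxi counts as one numeric column.
train_data = np.loadtxt("nyc_taxi_prepared.csv", delimiter=",").reshape(-1, 1).astype("float32")

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    num_samples_per_tree=512,
    num_trees=50,
    sagemaker_session=session,
)

# record_set() uploads the data to S3 in the protobuf recordIO format the algorithm expects.
rcf.fit(rcf.record_set(train_data))

# Deploy a real-time endpoint that a stream-triggered Lambda function can invoke.
predictor = rcf.deploy(initial_instance_count=1, instance_type="ml.m5.large")
print(predictor.endpoint_name)
```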

The solution that I describe provides the following benefits:

  • You can make the best use of existing resources for anomaly detection. For example, if you have been using Amazon DynamoDB Streams for disaster recovery (DR) or other purposes, you can use the data in that stream for anomaly detection (see the scoring sketch after this list). In addition, standby storage usually has low utilization, so the data it holds can be used as training data without affecting production workloads.
  • You can automatically retrain the model with new data on a regular basis with no user intervention.
  • You can easily use the Random Cut Forest (RCF) algorithm built into Amazon SageMaker. Amazon SageMaker offers flexible distributed training options that adjust to your specific workflows in a secure and scalable environment.
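Once an RCF model is deployed behind a SageMaker endpoint (as in the training sketch above), a Lambda function triggered by the DynamoDB stream could score new items as they arrive. A minimal sketch, assuming a NEW_IMAGE stream view and hypothetical endpoint, attribute, and threshold names:

```python
# A rough sketch of scoring new DynamoDB Streams records against a deployed
# Random Cut Forest endpoint from Lambda. The endpoint name, attribute name,
# and threshold are hypothetical.
import json
import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

ENDPOINT_NAME = "taxi-rcf-endpoint"   # assumed: the endpoint deployed in the training sketch
ANOMALY_THRESHOLD = 3.0               # assumed cutoff; tune against your own score distribution

def handler(event, context):
    # Collect the numeric value from each newly inserted item in the stream batch.
    values = []
    for record in event.get("Records", []):
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]
        values.append(new_image["value"]["N"])   # assumed attribute holding the taxi passenger count

    if not values:
        return {"anomalies": []}

    # RCF endpoints accept CSV rows (one observation per line) and return a score per row.
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body="\n".join(values),
    )
    scores = [r["score"] for r in json.loads(response["Body"].read())["scores"]]

    # Report the observations whose score exceeds the threshold.
    return {
        "anomalies": [
            {"value": v, "score": s}
            for v, s in zip(values, scores)
            if s > ANOMALY_THRESHOLD
        ]
    }
```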

https://aws.amazon.com/pt/blogs/machine-learning/anomaly-detection-on-amazon-dynamodb-streams-using-the-amazon-sagemaker-random-cut-forest-algorithm/

Amazon DynamoDB On-Demand – No Capacity Planning and Pay-Per-Request Pricing

Just a few years ago, creating a database that could support your business at any scale while providing consistent low latency was a daunting task. That changed for me in 2012 while reading Werner Vogels’ blog post announcing Amazon DynamoDB (it was a few months before I joined AWS). DynamoDB was built on the principles in the original Dynamo paper that Amazon published in 2007. Over the years, lots of new features have been introduced to further simplify how AWS customers use databases. You can now create fully managed, multi-region, multi-master database tables with features such as encryption at rest, point-in-time recovery, in-memory caching, and a 99.99% uptime service level agreement (SLA).
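As a concrete illustration of the pay-per-request model, here is a minimal boto3 sketch (table and attribute names are hypothetical) that creates a table in on-demand mode, so no read or write capacity has to be provisioned up front:

```python
# Minimal sketch: creating an on-demand (pay-per-request) table with boto3,
# so no capacity planning is needed. Table and attribute names are hypothetical.
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[
        {"AttributeName": "order_id", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "order_id", "KeyType": "HASH"},
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand: no ProvisionedThroughput block required
)
```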

https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing/

Amazon DynamoDB Transactions

Over the years, customers have used Amazon DynamoDB for lots of different use cases, from building microservices and mobile backends to implementing gaming and Internet of Things (IoT) solutions. For example, Capital One uses DynamoDB to reduce the latency of their mobile applications by moving their mainframe transactions to a serverless architecture. Tinder migrated user data to DynamoDB with zero downtime, to get the scalability they need to support their global user base.

Developers sometimes need to implement business logic that requires multiple, all-or-nothing operations across one or more tables. This requirement can add unnecessary complexity to their implementation. Today, we are making these use cases easier to build on DynamoDB with native support for transactions.
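To illustrate, here is a minimal boto3 sketch of an all-or-nothing write spanning two tables with TransactWriteItems; the table names, keys, and condition are hypothetical and not taken from the post:

```python
# Minimal sketch of an all-or-nothing write across two tables with
# TransactWriteItems. Table names, keys, and the condition are hypothetical.
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.transact_write_items(
    TransactItems=[
        {
            # Debit the buyer's balance only if funds are sufficient.
            "Update": {
                "TableName": "accounts",
                "Key": {"account_id": {"S": "alice"}},
                "UpdateExpression": "SET balance = balance - :price",
                "ConditionExpression": "balance >= :price",
                "ExpressionAttributeValues": {":price": {"N": "25"}},
            }
        },
        {
            # Record the order in a second table as part of the same transaction;
            # if the condition above fails, neither write is applied.
            "Put": {
                "TableName": "orders",
                "Item": {
                    "order_id": {"S": "order-1001"},
                    "account_id": {"S": "alice"},
                    "amount": {"N": "25"},
                },
            }
        },
    ]
)
```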

https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-transactions/

Choosing the Right DynamoDB Partition Key

This blog post covers important considerations and strategies for choosing the right partition key when migrating from a relational database to DynamoDB. This is an important step in designing and building scalable and reliable applications on top of DynamoDB.

What is a partition key?

DynamoDB supports two types of primary keys:

  • Partition key: Also known as a hash key, the partition key is composed of a single attribute. Attributes in DynamoDB are similar in many ways to fields or columns in other database systems.
  • Partition key and sort key: Referred to as a composite primary key or hash-range key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key (see the sketch after this list).
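To make the second option concrete, here is a minimal boto3 sketch (hypothetical table and attribute names) that creates a table with a composite primary key; a partition-key-only table would simply omit the RANGE element and its attribute definition:

```python
# Minimal sketch of a composite primary key (partition key + sort key) with boto3.
# Table and attribute names are hypothetical.
import boto3

dynamodb = boto3.client("dynamodb")

# "user_id" is the partition (hash) key and "created_at" is the sort (range) key,
# so a single user can own many items, ordered by creation time.
dynamodb.create_table(
    TableName="user_events",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},      # partition key
        {"AttributeName": "created_at", "KeyType": "RANGE"},  # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
)
```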

https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/

How to create a REST API with pre-written Serverless Components

You might have already heard about our new project, Serverless Components. Our goal was to encapsulate common functionality into so-called “components”, which could then be easily re-used, extended, and shared with other developers and other serverless applications.

In this post, I’m going to show you how to compose a fully-fledged, REST API-powered application, all by using several pre-built components from the component registry.

https://serverless.com/blog/how-create-rest-api-serverless-components/

Use Amazon DynamoDB Accelerator (DAX) from AWS Lambda to increase performance while reducing costs

Using Amazon DynamoDB Accelerator (DAX) from AWS Lambda has several benefits for serverless applications that also use Amazon DynamoDB. DAX can improve the response time of your application by dramatically reducing read latency, as compared to using DynamoDB. Using DAX can also lower the cost of DynamoDB by reducing the amount of provisioned read throughput needed for read-heavy applications. For serverless applications, DAX provides an additional benefit: Lower latency results in shorter Lambda execution times, which means lower costs.

Connecting to a DAX cluster from Lambda functions requires some special configuration. In this post, I show an example URL-shortening application based on the AWS Serverless Application Model (AWS SAM). The application uses Amazon API Gateway, Lambda, DynamoDB, DAX, and AWS CloudFormation to demonstrate how to access DAX from Lambda.
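A rough Python sketch of the read path (not the post's actual sample code), assuming the amazon-dax-client package is bundled with the Lambda function, an API Gateway proxy integration, and hypothetical table, attribute, and environment-variable names; the exact client constructor arguments depend on the client version:

```python
# A rough sketch of reading through DAX from Lambda, assuming the
# amazon-dax-client package (module "amazondax") is bundled with the function.
# Table, attribute, and environment-variable names are hypothetical.
import os
import botocore.session
from amazondax import AmazonDaxClient

# The DAX client exposes the same low-level item API as the DynamoDB client,
# so get_item/put_item calls can be pointed at the cluster instead of DynamoDB.
dax = AmazonDaxClient(
    botocore.session.get_session(),
    region_name=os.environ["AWS_REGION"],
    endpoints=[os.environ["DAX_ENDPOINT"]],  # cluster discovery endpoint, e.g. host:8111
)

def handler(event, context):
    # Look up the long URL for a short code; repeated reads are served from
    # the DAX item cache rather than hitting DynamoDB directly.
    resp = dax.get_item(
        TableName="short_urls",
        Key={"short_code": {"S": event["pathParameters"]["code"]}},
    )
    item = resp.get("Item")
    if item is None:
        return {"statusCode": 404, "body": "not found"}
    return {"statusCode": 301, "headers": {"Location": item["long_url"]["S"]}}
```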

https://aws.amazon.com/pt/blogs/database/how-to-increase-performance-while-reducing-costs-by-using-amazon-dynamodb-accelerator-dax-and-aws-lambda/

Build a serverless multi-region, active-active backend solution in an hour

In the previous posts, we explored availability and reliability and the needs and means of building a multi-region, active-active architecture on AWS. In this blog post, I will walk you through the steps needed to build and deploy a serverless multi-region, active-active backend.

This actually blows my mind: I will be able to explain this in one blog post, and you should be able to deploy a fully functional backend in about an hour. In contrast, a few years ago this would have required a lot more expertise, work, time, and money!

https://read.acloud.guru/building-a-serverless-multi-region-active-active-backend-36f28bed4ecf

Automatically Archive Items to S3 Using DynamoDB Time to Live (TTL) with AWS Lambda and Amazon Kinesis Firehose

Earlier this year, Amazon DynamoDB released Time to Live (TTL) functionality, which automatically deletes expired items from your tables, at no additional cost. TTL eliminates the complexity and cost of scanning tables and deleting items that you don’t want to retain, saving you money on provisioned throughput and storage. One AWS customer, TUNE, purged 85 terabytes of stale data and reduced their costs by over $200K per year.

Today, DynamoDB made TTL better with the release of a new CloudWatch metric for tracking the number of items deleted by TTL, which is also viewable at no additional charge. This new metric helps you monitor the rate of TTL deletions to validate that TTL is working as expected. For example, you could set a CloudWatch alarm to fire if too many or too few automated deletes occur, which might indicate an issue in how you set expiration timestamps for your items.
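For example, a minimal boto3 sketch of such an alarm on the TimeToLiveDeletedItemCount metric; the table name, threshold, period, and SNS topic are hypothetical:

```python
# Minimal sketch: a CloudWatch alarm that fires if TTL deletes too few items
# over an hour, which could indicate misconfigured expiration timestamps.
# The table name, topic ARN, and threshold are hypothetical.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="ttl-deletes-too-low",
    Namespace="AWS/DynamoDB",
    MetricName="TimeToLiveDeletedItemCount",
    Dimensions=[{"Name": "TableName", "Value": "sessions"}],
    Statistic="Sum",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=100,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ttl-alerts"],
)
```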

In this post, I’ll walk through an example of a serverless application using TTL to automate a common database management task: moving old data from your database into archival storage automatically. Archiving old data helps reduce costs and meet regulatory requirements governing data retention or deletion policies. I’ll show how TTL, combined with DynamoDB Streams, AWS Lambda, and Amazon Kinesis Firehose, facilitates archiving data to a low-cost storage service like Amazon S3, a data warehouse like Amazon Redshift, or Amazon Elasticsearch Service.
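A minimal sketch of the archiving piece: a stream-triggered Lambda function that keeps only the REMOVE events issued by the TTL process and forwards the expired items to a Kinesis Data Firehose delivery stream bound to S3. It assumes an OLD_IMAGE or NEW_AND_OLD_IMAGES stream view, and the delivery stream name is hypothetical:

```python
# Rough sketch of the archiving Lambda: it receives DynamoDB Streams records,
# keeps only the REMOVE events generated by the TTL process, and forwards the
# expired items to a Kinesis Data Firehose delivery stream that lands them in S3.
import json
import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "expired-items-to-s3"   # assumed Firehose delivery stream name

def handler(event, context):
    archived = 0
    for record in event.get("Records", []):
        # TTL deletions show up as REMOVE events performed by the DynamoDB service principal.
        if record["eventName"] != "REMOVE":
            continue
        user_identity = record.get("userIdentity", {})
        if user_identity.get("principalId") != "dynamodb.amazonaws.com":
            continue  # skip deletes made by applications or operators

        # OldImage carries the full item as it looked before expiration.
        expired_item = record["dynamodb"]["OldImage"]
        firehose.put_record(
            DeliveryStreamName=DELIVERY_STREAM,
            Record={"Data": json.dumps(expired_item) + "\n"},
        )
        archived += 1
    return {"archived": archived}
```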

https://aws.amazon.com/blogs/database/automatically-archive-items-to-s3-using-dynamodb-time-to-live-with-aws-lambda-and-amazon-kinesis-firehose/?sc_channel=sm&sc_campaign=DB_Blog&sc_publisher=TWITTER&sc_country=Global&sc_geo=GLOBAL&sc_outcome=awareness&trkCampaign=sm_archivetos3&trk=_TWITTER&sc_content=archivetos3&sc_category=AWS_Lambda,Amazon_DynamoDB&linkId=50979940

Amazon DynamoDB Continuous Backups and Point-In-Time Recovery (PITR)

The Amazon DynamoDB team is back with another useful feature, hot on the heels of encryption at rest. At AWS re:Invent 2017, we launched global tables and on-demand backup and restore of your DynamoDB tables, and today we’re launching continuous backups with point-in-time recovery (PITR).

You can enable continuous backups with a single click in the AWS Management Console, a simple API call, or with the AWS Command Line Interface (CLI). DynamoDB can back up your data with per-second granularity and restore to any single second from the time PITR was enabled up to the prior 35 days. We built this feature to protect against accidental writes or deletes. If a developer runs a script against production instead of staging or if someone fat-fingers a DeleteItem call, PITR has you covered. We also built it for the scenarios you can’t normally predict. You can still keep your on-demand backups for as long as needed for archival purposes but PITR works as additional insurance against accidental loss of data. Let’s see how this works.
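For the API route, a minimal boto3 sketch of enabling PITR and later restoring to a specific second; the table names and timestamp are hypothetical:

```python
# Minimal sketch: enabling PITR on a table and restoring it to a specific
# second with boto3. Table names and the timestamp are hypothetical.
from datetime import datetime, timezone
import boto3

dynamodb = boto3.client("dynamodb")

# Turn on continuous backups with point-in-time recovery for the table.
dynamodb.update_continuous_backups(
    TableName="orders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Later, restore the table as it existed at a given second (within the last
# 35 days) into a new table; the original table is left untouched.
dynamodb.restore_table_to_point_in_time(
    SourceTableName="orders",
    TargetTableName="orders-before-bad-script",
    RestoreDateTime=datetime(2024, 5, 1, 12, 30, 15, tzinfo=timezone.utc),
)
```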

https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-continuous-backups-and-point-in-time-recovery-pitr/

A Decade of Dynamo: Powering the next wave of high-performance, internet-scale applications

Today marks the 10-year anniversary of Amazon’s Dynamo whitepaper, a milestone that made me reflect on how much innovation has occurred in the area of databases over the last decade, and a good reminder of why taking a customer-obsessed approach to solving hard problems can have lasting impact beyond your original expectations.

“Dynamo: Amazon’s Highly Available Key-value Store” received the ACM SIGOPS 2017 Hall of Fame Award

http://www.allthingsdistributed.com/2017/10/a-decade-of-dynamo.html