The rise of distributed data stores and the general decomposition of systems into smaller pieces means that coordination between each server, service, or function is less available. In my first applications, unique ID generation meant setting
auto_increment=True on a column in the SQL database. Easy, done, no problem. Today, each microservice has its own data source(s) and NoSQL stores are common. Every NoSQL DB is “NoSQL” in its own way, but they usually eschew coordinated and single-writer solutions in the name of reliability/performance/both. You can’t have an auto-increment column without implementing the coordination client-side.
Using numbers as identifiers also creates problems. Auto-incrementing can lead to enumeration-based attacks. Fields can have fixed sizes. These issues can go unrealized until you overflow the
uint32 field, and now your logs are a pile of ID conflict errors. Instead of integers, we can use a different kind of fixed-length field and make it non-sequential so that different hosts can generate IDs without a central coordinating point.
UUID’s are an improvement and avoid collisions in distributed settings, but being strictly random you don’t have a way to easily sort them or determine rough order. Segment blogged a while ago about one replacement for UUIDs with the KSUID (K-Sortable Universal ID) but it has limitations and uses a strange
14e8 offset to avoid running out of epoch time in the next 100 years.
Enter the Unique Lexicographically Sortable Identifier (ULID). These are sortable, high-entropy identifiers that we can generate anywhere in our pipeline without coordination and have confidence that there won’t be collisions. A ULID looks like
01E5TZRCM5WZYPB2BH7KMYR5HT, and the first 10 characters are a timestamp, and the next 16 characters are random.
Yet data modeling with DynamoDB is tricky for those used to the relational databases that have dominated for the past few decades. There are a number of quirks around data modeling with DynamoDB, but the biggest one is the recommendation from AWS to use a single table for all of your records.
In this post, we’ll do a deep dive on the concepts behind single-table design. You’ll learn:
“I have learned a lot more internal things about Google Cloud Spanner from past two days. I read some of the portions of the Spanner white paper and the deep internal things from the Google Cloud Next event videos from Youtube. I’ll share the video links here, but I want to summarize all the learnings in one place. That’s why I wrote this blog post. A special thanks to Deepti Srivastava(Product Manager for Spanner) who presented the Spanner Deep Dive sessions in the Google Cloud Next Event.”
Using Z-order indexing, you can efficiently run range queries on any combination of fields in your schema. Although Amazon DynamoDB doesn’t natively support Z-order indexing, you can implement the functionality entirely from the client side. A single Z-order index can outperform and even replace entire collections of secondary indexes, saving you money and improving your throughput.
In a previous AWS Database Blog post, I introduced Z-order indexing, a way in which you can sort your data to efficiently query an Amazon DynamoDB table by using range bounds on multiple attributes. In this post, we explore the process of creating a schema for your index. We look at how to decide which attributes to include in your schema, how your index’s schema impacts query efficiency, and how to work with a variety of data types.
This post builds on concepts that are described in Part 1, so I recommend taking some time to review it before diving in.
A curated list of guides, development tools, and resources for Amazon Quantum Ledger Database (QLDB). This list includes both community created content as well as content created by AWS.
As the new year begins, we encourage you to make a resolution to follow Amazon DynamoDB best practices. Following these best practices can help you maximize performance and minimize throughput costs when working with DynamoDB. Click the following links to learn more about each best practice in the DynamoDB documentation.
Message queues like Apache Kafka are a common component of distributed systems. This blog post will look at several different strategies for improving performance when working with message queues.
Kafka consists of topics which have one or more partitions.
Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
With this structured commit log, each consumer follows the same basic steps:
- A consumer is assigned a particular topic-partition (either manually or automatically via a consumer group)
- The previous offset is read so that the consumer will begin where it last left off
- Messages are consumed from Kafka
- Messages are processed in some way
- The processed message offset is committed back to Kafka
Other types of message queues (like AMQP) have a similar flow – messages are consumed, processed and acknowledged. Generally we rely on idempotent message processing – that is the ability to process the same message twice with no ill effect – and err on the side of only committing if we’re certain we’ve done what we need to. This gives us durability and guarantees that every message will be processed, even if our consumer process crashes.
Modeling your data in the DynamoDB database structure requires a different approach from modeling in traditional relational databases. Alex DeBrie has written a number of applications using DynamoDB and is the creator of DynamoDBGuide.com, a free resource for learning DynamoDB. In this session, we review the key principles of modeling your DynamoDB tables and teach you some practical patterns to use in your data models. Leave this session with steps to follow and principles to guide you as you work with DynamoDB.
Time-series data shows a pattern of change over time. For example, you might have a fleet of Internet of Things (IoT) devices that record environmental data through their sensors, as shown in the following example graph. This data could include temperature, pressure, humidity, and other environmental variables. Because each IoT device tracks these values over regular periods, your backend needs to ingest up to hundreds, thousands, or millions of events every second.
In this blog post, I explain how to optimize Amazon DynamoDB for high-volume, time-series data scenarios. I do this by using a design pattern powered by automation and serverless computing.
More and more, AWS customers want to make their applications available to globally dispersed users by deploying their application in multiple AWS Regions. These global users expect fast application performance.
In this post, I describe how to use Amazon DynamoDB to power the database of a global backend deployed in multiple AWS Regions. I use DynamoDB global tables, which provide a fully managed, multiregion, and multimaster database so that you can deliver low-latency data access to your users no matter where they are located on the globe.
Why use a multiregion architecture?
AWS customers typically want a multiregion architecture for two reasons:
- To provide low latency and improve their app experience.
- To facilitate disaster recovery.