Leveraging ULIDs to create order in unordered datastores

The rise of distributed data stores and the general decomposition of systems into smaller pieces means that coordination between each server, service, or function is less available. In my first applications, unique ID generation meant setting auto_increment=True on a column in the SQL database. Easy, done, no problem. Today, each microservice has its own data source(s) and NoSQL stores are common. Every NoSQL DB is “NoSQL” in its own way, but they usually eschew coordinated and single-writer solutions in the name of reliability/performance/both. You can’t have an auto-increment column without implementing the coordination client-side.

Using numbers as identifiers also creates problems. Auto-incrementing IDs invite enumeration-based attacks, and integer fields have fixed sizes. These issues can go unnoticed until you overflow a uint32 field and your logs become a pile of ID conflict errors. Instead of integers, we can use a different kind of fixed-length field and make it non-sequential, so that different hosts can generate IDs without a central coordination point.

UUIDs are an improvement and avoid collisions in distributed settings, but because they are strictly random, there is no easy way to sort them or determine their rough order. Segment blogged a while ago about one replacement for UUIDs, the KSUID (K-Sortable Unique Identifier), but it has limitations and uses a strange 14e8 offset to avoid running out of epoch time in the next 100 years.

Enter the Universally Unique Lexicographically Sortable Identifier (ULID). These are sortable, high-entropy identifiers that we can generate anywhere in our pipeline without coordination and have confidence that there won’t be collisions. A ULID looks like 01E5TZRCM5WZYPB2BH7KMYR5HT: the first 10 characters encode a millisecond-precision timestamp, and the remaining 16 characters are random.
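
To make the format concrete, here is a minimal sketch (mine, not the trek10 post’s code) of generating a ULID in Python: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford Base32 characters so that lexicographic order tracks creation time.

```python
import os
import time

# Crockford Base32 alphabet used by the ULID spec (no I, L, O, or U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def new_ulid() -> str:
    ts = int(time.time() * 1000)                  # 48-bit millisecond timestamp
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (ts << 80) | rand                     # 128 bits total

    # Encode as 26 Base32 characters: 10 for the timestamp, 16 for the randomness.
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

print(new_ulid())  # e.g. an identifier shaped like 01E5TZRCM5WZYPB2BH7KMYR5HT
```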

https://www.trek10.com/blog/leveraging-ulids-to-create-order-in-unordered-datastores

The What, Why, and When of Single-Table Design with DynamoDB

Data modeling with DynamoDB is tricky for those used to the relational databases that have dominated for the past few decades. There are a number of quirks around data modeling with DynamoDB, but the biggest is the recommendation from AWS to use a single table for all of your records.
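
For a flavor of what that looks like in practice, here is a hedged sketch using boto3; the table name, key attributes, and item shapes are illustrative assumptions rather than anything from the post. Two entity types share one table through generic PK/SK attributes, so related items can be fetched with a single Query.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table "AppTable" with a generic partition key PK and sort key SK.
table = boto3.resource("dynamodb").Table("AppTable")

# A customer profile and one of its orders live in the same table, under the
# same partition key, distinguished only by their sort keys.
table.put_item(Item={"PK": "CUSTOMER#123", "SK": "PROFILE#123", "Name": "Alice"})
table.put_item(Item={"PK": "CUSTOMER#123", "SK": "ORDER#2020-04-01#987", "Total": 42})

# One Query fetches the customer and all of their orders together.
resp = table.query(KeyConditionExpression=Key("PK").eq("CUSTOMER#123"))
print(resp["Items"])
```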

In this post, we’ll do a deep dive on the concepts behind single-table design: what it is, why it’s recommended, and when it makes sense to use.

https://www.alexdebrie.com/posts/dynamodb-single-table/

Internals of Google Cloud Spanner

“Over the past two days, I have learned a lot more about the internals of Google Cloud Spanner. I read portions of the Spanner white paper and picked up the deeper internals from the Google Cloud Next event videos on YouTube. I’ll share the video links here, but I want to summarize all the learnings in one place, which is why I wrote this blog post. A special thanks to Deepti Srivastava (Product Manager for Spanner), who presented the Spanner Deep Dive sessions at the Google Cloud Next event.”

https://thedataguy.in/internals-of-google-cloud-spanner/


Z-Order Indexing for Multifaceted Queries in Amazon DynamoDB

Using Z-order indexing, you can efficiently run range queries on any combination of fields in your schema. Although Amazon DynamoDB doesn’t natively support Z-order indexing, you can implement the functionality entirely from the client side. A single Z-order index can outperform and even replace entire collections of secondary indexes, saving you money and improving your throughput.
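
To sketch the client-side idea (a generic Morton-code example under assumed attribute widths, not the blog’s exact implementation): a Z-order value interleaves the bits of each attribute, so one sort key preserves proximity across all of them.

```python
def interleave_bits(x: int, y: int, width: int = 32) -> int:
    """Interleave the low `width` bits of x and y (x takes the even bit positions)."""
    z = 0
    for i in range(width):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def z_key(x: int, y: int, width: int = 32) -> str:
    # Zero-padded hex so DynamoDB's lexicographic string ordering matches
    # the numeric Z-order of the interleaved value.
    hex_digits = (2 * width + 3) // 4
    return format(interleave_bits(x, y, width), f"0{hex_digits}x")

print(z_key(5, 9, width=8))  # two hypothetical 8-bit attributes -> '0093'
```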

https://aws.amazon.com/blogs/database/z-order-indexing-for-multifaceted-queries-in-amazon-dynamodb-part-1/

In a previous AWS Database Blog post, I introduced Z-order indexing, a way in which you can sort your data to efficiently query an Amazon DynamoDB table by using range bounds on multiple attributes. In this post, we explore the process of creating a schema for your index. We look at how to decide which attributes to include in your schema, how your index’s schema impacts query efficiency, and how to work with a variety of data types.

This post builds on concepts that are described in Part 1, so I recommend taking some time to review it before diving in.
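
One detail worth previewing in code (a hedged sketch; these conversions are common normalizations, not necessarily the post’s exact scheme): before its bits can be interleaved, each attribute has to be mapped to a fixed-width unsigned integer in a way that preserves its ordering.

```python
from datetime import datetime, timezone

def timestamp_to_uint32(dt: datetime) -> int:
    # Epoch seconds fit in 32 bits through the year 2106; assumes post-1970 values.
    return int(dt.replace(tzinfo=timezone.utc).timestamp()) & 0xFFFFFFFF

def signed_to_uint32(n: int) -> int:
    # Shift the signed 32-bit range up so unsigned comparison preserves order.
    return (n + 2**31) & 0xFFFFFFFF

print(timestamp_to_uint32(datetime(2020, 4, 1)))  # 1585699200
print(signed_to_uint32(-42))                      # 2147483606
```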

https://aws.amazon.com/blogs/database/z-order-indexing-for-multifaceted-queries-in-amazon-dynamodb-part-2/

Make a New Year’s resolution: Follow Amazon DynamoDB best practices

As the new year begins, we encourage you to make a resolution to follow Amazon DynamoDB best practices. Following these best practices can help you maximize performance and minimize throughput costs when working with DynamoDB. The post links out to each best practice in the DynamoDB documentation.

https://aws.amazon.com/blogs/database/make-a-new-years-resolution-follow-amazon-dynamodb-best-practices/

Strategies for Working with Message Queues

Message queues like Apache Kafka are a common component of distributed systems. This blog post will look at several different strategies for improving performance when working with message queues.

Model Overview

Kafka consists of topics, each of which has one or more partitions.

Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.

With this structured commit log, each consumer follows the same basic steps (a minimal consumer loop is sketched after the list):

  1. A consumer is assigned a particular topic-partition (either manually or automatically via a consumer group)
  2. The previous offset is read so that the consumer will begin where it last left off
  3. Messages are consumed from Kafka
  4. Messages are processed in some way
  5. The processed message offset is committed back to Kafka
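
As a concrete illustration, here is a minimal version of that loop using the kafka-python client; the topic, group id, broker address, and handle function are placeholders rather than anything from the post.

```python
from kafka import KafkaConsumer  # assumes the kafka-python client is installed

def handle(payload: bytes) -> None:
    # Placeholder processing step; must be idempotent, since a crash between
    # steps 4 and 5 means the message will be re-delivered and reprocessed.
    print(payload)

consumer = KafkaConsumer(
    "events",                         # placeholder topic
    group_id="event-processors",      # step 1: partitions assigned via the consumer group
    bootstrap_servers=["localhost:9092"],
    auto_offset_reset="earliest",     # step 2: resume from the last committed offset when one exists
    enable_auto_commit=False,         # commit manually, only after processing succeeds
)

for message in consumer:              # step 3: consume
    handle(message.value)             # step 4: process
    consumer.commit()                 # step 5: commit the offset back to Kafka
```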

Other types of message queues (like AMQP) have a similar flow – messages are consumed, processed, and acknowledged. Generally we rely on idempotent message processing – that is, the ability to process the same message twice with no ill effect – and err on the side of only committing if we’re certain we’ve done what we need to. This gives us durability and guarantees that every message will be processed, even if our consumer process crashes.

https://www.doxsey.net/blog/strategies-for-working-with-message-queues