Message queues like Apache Kafka are a common component of distributed systems. This blog post will look at several different strategies for improving performance when working with message queues.
Kafka consists of topics which have one or more partitions.
Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
With this structured commit log, each consumer follows the same basic steps:
- A consumer is assigned a particular topic-partition (either manually or automatically via a consumer group)
- The previous offset is read so that the consumer will begin where it last left off
- Messages are consumed from Kafka
- Messages are processed in some way
- The processed message offset is committed back to Kafka
Other types of message queues (like AMQP) have a similar flow – messages are consumed, processed and acknowledged. Generally we rely on idempotent message processing – that is the ability to process the same message twice with no ill effect – and err on the side of only committing if we’re certain we’ve done what we need to. This gives us durability and guarantees that every message will be processed, even if our consumer process crashes.