This post is part of the https://medium.com/gett-engineering/before-you-go-go-bf4f861cdec7 series, where we explore the world of Golang, provide tips and insights you should know when writing Go, so you don’t have to learn them the hard way.
Message queues like Apache Kafka are a common component of distributed systems. This blog post will look at several different strategies for improving performance when working with message queues.
Kafka consists of topics which have one or more partitions.
Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
With this structured commit log, each consumer follows the same basic steps:
- A consumer is assigned a particular topic-partition (either manually or automatically via a consumer group)
- The previous offset is read so that the consumer will begin where it last left off
- Messages are consumed from Kafka
- Messages are processed in some way
- The processed message offset is committed back to Kafka
Other types of message queues (like AMQP) have a similar flow – messages are consumed, processed and acknowledged. Generally we rely on idempotent message processing – that is the ability to process the same message twice with no ill effect – and err on the side of only committing if we’re certain we’ve done what we need to. This gives us durability and guarantees that every message will be processed, even if our consumer process crashes.
Though the Bloom filter is a textbook algorithm, it has some significant downsides. A major one is that it needs many data accesses and many hash values to check that an object is part of the set. In short, it is not optimally fast.
Can you do better? Yes. Among other alternatives, Fan et al. introduced Cuckoo filters which use less space and are faster than Bloom filters. While implementing a Bloom filter is a relatively simple exercise, Cuckoo filters require a bit more engineering.
Could we do even better while limiting the code to something you can hold in your head?
It turns out that you can with xor filters. We just published a paper called Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters that will appear in the Journal of Experimental Algorithmics.
Although Go doesn’t have generics, we deserve to have reuseable general algorithms.
iter helps improving Go code in several ways:
- Some simple loops are unlikely to be wrong or inefficient, but calling algorithm instead will make the code more concise and easier to comprehend. Such as AllOf, FindIf, Accumulate.
- Some algorithms are not complicated, but it is not easy to write them correctly. Reusing code makes them easier to reason for correctness. Such as Shuffle, Sample, Partition.
- STL also includes some complicated algorithms that may take hours to make it correct. Implementing it manually is impractical. Such as NthElement, StablePartition, NextPermutation.
- The implementation in the library contains some imperceptible performance optimizations. For instance, MinmaxElement is done by taking two elements at a time. In this way, the overall number of comparisons is significantly reduced.
- Non-intrusive. Instead of introducing new containers,
itertends to reuse existed containers in Go (slice, string, list.List, etc.) and use iterators to adapt them to algorithms.
- Full algorithms (>100). It includes almost all algorithms come before C++17. Check the Full List.
PySnooper is a poor man’s debugger.
You’re trying to figure out why your Python code isn’t doing what you think it should be doing. You’d love to use a full-fledged debugger with breakpoints and watches, but you can’t be bothered to set one up right now.
You want to know which lines are running and which aren’t, and what the values of the local variables are.
Most people would use
PySnooper lets you do the same, except instead of carefully crafting the right
What makes PySnooper stand out from all other code intelligence tools? You can use it in your shitty, sprawling enterprise codebase without having to do any setup. Just slap the decorator on, as shown below, and redirect the output to a dedicated log file by specifying its path as the first argument.
In many use cases, there are processes that need to execute multiple tasks. We build micro-services or server-less functions like AWS Lambda functions to carry out these tasks. Almost all these services are stateless functions and there is need of queues or databases to maintain the state of individual tasks and the process as a whole. Writing code that orchestrates these tasks can be both painful and hard to debug and maintain. It’s not easy to maintain the state of a process in an ecosystem of micro-services and server-less functions.