A Tale of BFS: Going Parallel

Great talk from Egon Elbre at https://twitter.com/gopherconeu 2020.

Breadth First Search is one of those fundamental graphs algorithms and also is foundation to many other problems. The basic idea is to move through a graph starting from a source node and visit everything one layer at a time.


Leveraging ULIDs to create order in unordered datastores

The rise of distributed data stores and the general decomposition of systems into smaller pieces means that coordination between each server, service, or function is less available. In my first applications, unique ID generation meant setting auto_increment=True on a column in the SQL database. Easy, done, no problem. Today, each microservice has its own data source(s) and NoSQL stores are common. Every NoSQL DB is “NoSQL” in its own way, but they usually eschew coordinated and single-writer solutions in the name of reliability/performance/both. You can’t have an auto-increment column without implementing the coordination client-side.

Using numbers as identifiers also creates problems. Auto-incrementing can lead to enumeration-based attacks. Fields can have fixed sizes. These issues can go unrealized until you overflow the uint32 field, and now your logs are a pile of ID conflict errors. Instead of integers, we can use a different kind of fixed-length field and make it non-sequential so that different hosts can generate IDs without a central coordinating point.

UUID’s are an improvement and avoid collisions in distributed settings, but being strictly random you don’t have a way to easily sort them or determine rough order. Segment blogged a while ago about one replacement for UUIDs with the KSUID (K-Sortable Universal ID) but it has limitations and uses a strange 14e8 offset to avoid running out of epoch time in the next 100 years.

Enter the Unique Lexicographically Sortable Identifier (ULID). These are sortable, high-entropy identifiers that we can generate anywhere in our pipeline without coordination and have confidence that there won’t be collisions. A ULID looks like 01E5TZRCM5WZYPB2BH7KMYR5HT, and the first 10 characters are a timestamp, and the next 16 characters are random.


Advanced Go Concurrency

f you’ve used Go for a while you’re probably aware of some of the basic Go concurrency primitives:

  • The go keyword for spawning goroutines
  • Channels, for communicating between goroutines
  • The context package for propagating cancellation
  • The sync and sync/atomic packages for lower-level primitives such as mutexes and atomic memory access

These language features and packages combine to provide a very rich set of tools for building concurrent applications. What you might not have discovered yet is a set of higher-level concurrency primitives available in the “extended standard library” available at golang.org/x/sync. We’ll be taking a look at these in this article.


Go memory ballast: How I learnt to stop worrying and love the heap

Go memory ballast: How I learned to stop worrying and love the heap

I’m a big fan of small code changes that can have large impact. This may seem like an obvious thing to state, but let me explain:

  1. These type of changes often involve diving into and understanding things one is not familiar with.
  2. Even with the most well factored code, there is a maintenance cost to each optimization you add, and it’s usually (although not always) pretty linear with the amount of lines of code you end up adding/changing.

We recently rolled out a small change that reduced the CPU utilization of our API frontend servers at Twitch by ~30% and reduced overall 99th percentile API latency during peak load by ~45%.

This blog post is about the change, the process of finding it and explaining how it works.


Building a BitTorrent client from the ground up in Go

BitTorrent is a protocol for downloading and distributing files across the Internet. In contrast with the traditional client/server relationship, in which downloaders connect to a central server (for example: watching a movie on Netflix, or loading the web page you’re reading now), participants in the BitTorrent network, called peers, download pieces of files from each other—this is what makes it a peer-to-peer protocol. We’ll investigate how this works, and build our own client that can find peers and exchange data between them.

diagram showing the difference between client/server (all clients connecting to one server) and peer-to-peer (peers connecting to each other) relationships

The protocol evolved organically over the past 20 years, and various people and organizations added extensions for features like encryption, private torrents, and new ways of finding peers. We’ll be implementing the original spec from 2001 to keep this a weekend-sized project.


Over 150 of the Best Machine Learning, NLP, and Python Tutorials

While machine learning has a rich history dating back to 1959, the field is evolving at an unprecedented rate. In a recent article, I discussed why the broader artificial intelligence field is booming and likely will for some time to come. Those interested in learning ML may find it daunting to get started.