A Tale of BFS: Going Parallel

Great talk from Egon Elbre at https://twitter.com/gopherconeu 2020.

Breadth First Search is one of those fundamental graphs algorithms and also is foundation to many other problems. The basic idea is to move through a graph starting from a source node and visit everything one layer at a time.

https://egonelbre.com/a-tale-of-bfs/

Containers the hard way: Gocker: A mini Docker written in Go

They are popular and they are misunderstood. Containers have become the default way applications are packaged and run on servers, initially popularized by Docker. Now, Docker itself is misunderstood. It is the name of a company and a command (a suite of commands, rather) that allow you to manage containers (create, run, delete, network) easily. Containers themselves however, are created from a set of operating system primitives. In this article, we shall concern ourselves with containers on the Linux operating system and simply act as though containers on Windows do not exist at all.

There is no single system call under Linux that creates containers. They are a loose construct made by utilizing Linux namespaces and control groups or cgroups…


https://unixism.net/2020/06/containers-the-hard-way-gocker-a-mini-docker-written-in-go/

Riding the Tiger: Lessons Learned Implementing Istio

Recently I (along with a few others much smarter than me) had occasion to implement a ‘real’ production system with Istio, running on a managed cloud-provided Kubernetes service.

“My next ulcer will be called ‘istio'” by Ian Miell

Istio has a reputation for being difficult to build with and administer, but I haven’t read many war stories about trying to make it work, so I thought it might be useful to actually write about what it’s like in the trenches for a ‘typical’ team trying to implement this stuff. The intention is very much not to bury Istio, but to praise it (it does so much that is useful/needed for ‘real’ Kubernetes clusters – skip to the end if impatient) while warning those about to step into the breach what comes if you’re not prepared…

A Mathematical Explanation of Naive Bayes in 5 Minutes

Photo by Courtney Cook on Unsplash

Naive Bayes. What may seem like a very confusing algorithm is actually one of the simplest algorithms once understood. Part of why it’s so simple to understand and implement is because of the assumptions that it inherently makes. However, that’s not to say that it’s a poor algorithm despite the strong assumptions that it holds — in fact, Naive Bayes is widely used in the data science world and has a lot of real-life applications.

In this article, we’ll look at what Naive Bayes is, how it works with an example to make it easy to understand, the different types of Naive Bayes, the pros and cons, and some real-life applications of it…

https://towardsdatascience.com/a-mathematical-explanation-of-naive-bayes-in-5-minutes-44adebcdb5f8

Leveraging ULIDs to create order in unordered datastores

The rise of distributed data stores and the general decomposition of systems into smaller pieces means that coordination between each server, service, or function is less available. In my first applications, unique ID generation meant setting auto_increment=True on a column in the SQL database. Easy, done, no problem. Today, each microservice has its own data source(s) and NoSQL stores are common. Every NoSQL DB is “NoSQL” in its own way, but they usually eschew coordinated and single-writer solutions in the name of reliability/performance/both. You can’t have an auto-increment column without implementing the coordination client-side.

Using numbers as identifiers also creates problems. Auto-incrementing can lead to enumeration-based attacks. Fields can have fixed sizes. These issues can go unrealized until you overflow the uint32 field, and now your logs are a pile of ID conflict errors. Instead of integers, we can use a different kind of fixed-length field and make it non-sequential so that different hosts can generate IDs without a central coordinating point.

UUID’s are an improvement and avoid collisions in distributed settings, but being strictly random you don’t have a way to easily sort them or determine rough order. Segment blogged a while ago about one replacement for UUIDs with the KSUID (K-Sortable Universal ID) but it has limitations and uses a strange 14e8 offset to avoid running out of epoch time in the next 100 years.

Enter the Unique Lexicographically Sortable Identifier (ULID). These are sortable, high-entropy identifiers that we can generate anywhere in our pipeline without coordination and have confidence that there won’t be collisions. A ULID looks like 01E5TZRCM5WZYPB2BH7KMYR5HT, and the first 10 characters are a timestamp, and the next 16 characters are random.

https://www.trek10.com/blog/leveraging-ulids-to-create-order-in-unordered-datastores