In this talk Ben Darnell, the CTO and Co-Founder of Cockroach Labs, discusses the decision to utilize Go in CockroachDB. Ben shares how CockroachDB optimized its memory usage to mitigate issues related to garbage collection and improved its use of channels to avoid deadlocks. Ben also shares creative techniques to integrate non-Go dependencies into the Go build process.
Garbage collection in Go can cause an application to pause which is a concerning issue, but Go also makes a lot of manual tweaks available that allow contol of what actually ends up on top of the garbage heap. Here are two of the optimizations made by CockroachDB to mitigate garbage collection issues:
By vitrue of these two practices (which you can see examples of in the video) CockroachDB sees in Go’s benchmarking tools that no new allocations are done per iteration. Everything is allocated up front and cached.
“This isn’t my first rodeo with large datasets. The authentication and product management database that I have designed for the largest UK public Wi-Fi provider had impressive volumes too. We were tracking authentication for millions of devices daily. However, that project had a funding that allowed us to pick any hardware, any supporting services and hire any DBAs to assist with replication/data warehousing/troubleshooting. Furthermore, all analytics queries/reporting were done off logical replicas and there were multiple sysadmins that looked after the supporting infrastructure. Whereas this was a venture of my own, with limited funding and 20x the volume.
This is not to say that if we did have loadsamoney we would have spent it on purchasing top-of-the-line hardware, flashy monitoring systems or DBAs (Okay, maybe having a dedicated DBA would have been nice). Over many years of consulting I have developed a view that the root of all evil lies in the unnecessarily complex data processing pipeline. You don’t need a message queue for ETL and you don’t need an application-layer cache for database queries. More often than not, these are workarounds for the underlying database issues (e.g. latency, poor indexing strategy) that create more issues down the line. In ideal scenario, you want to have all data contained within a single database and all data loading operations abstracted into atomic transactions. My goal was not to repeat these mistakes.
As you have already guessed, our PostgreSQL database became the central piece of the business (aptly called ‘mother’, although my co-founder insists that me calling various infrastructure components ‘mother’, ‘mothership’, ‘motherland’, etc is worrying). We don’t have a standalone message queue service, cache service or replicas for data warehousing. Instead of maintaining the supporting infrastructure, I have dedicated my efforts to eliminating any bottlenecks by minimizing latency, provisioning the most suitable hardware, and carefully planning the database schema. What we have is an easy to scale infrastructure with a single database and many data processing agents. I love the simplicity of it — if something breaks, we can pin point and fix the issue within minutes. However, a lot of mistakes were done along the way — this articles summarizes some of them…”
Breadth First Search is one of those fundamental graphs algorithms and also is foundation to many other problems. The basic idea is to move through a graph starting from a source node and visit everything one layer at a time.
They are popular and they are misunderstood. Containers have become the default way applications are packaged and run on servers, initially popularized by Docker. Now, Docker itself is misunderstood. It is the name of a company and a command (a suite of commands, rather) that allow you to manage containers (create, run, delete, network) easily. Containers themselves however, are created from a set of operating system primitives. In this article, we shall concern ourselves with containers on the Linux operating system and simply act as though containers on Windows do not exist at all.
There is no single system call under Linux that creates containers. They are a loose construct made by utilizing Linux namespaces and control groups or cgroups…
Recently I (along with a few others much smarter than me) had occasion to implement a ‘real’ production system with Istio, running on a managed cloud-provided Kubernetes service.
“My next ulcer will be called ‘istio'” by Ian Miell
Istio has a reputation for being difficult to build with and administer, but I haven’t read many war stories about trying to make it work, so I thought it might be useful to actually write about what it’s like in the trenches for a ‘typical’ team trying to implement this stuff. The intention is very much not to bury Istio, but to praise it (it does so much that is useful/needed for ‘real’ Kubernetes clusters – skip to the end if impatient) while warning those about to step into the breach what comes if you’re not prepared…