Thoughts on Time-series Databases

“Preetam “recently” blogged about catena, a time-series metric store. There was another blog post about benchmarking boltdb by a Fog Creek engineer, also looking to write a time series database. This is something of a pattern in the Go community, which already boasts seriesly, InfluxDB, and prometheus; there are almost certainly others.

Time series data has been de rigueur at least since the Etsy’s seminal blog post on StatsD, though in reality that was just an inflection point. Time series modeling and graphing predates computer systems, but they have been a popular way of tracking and visualizing systems and networking data since at least the early 90s with MRTG. A few factors are converging now to make these kinds of systems more important: “Big Data” is getting much, much bigger; virtualization and containerization has increased the number of independent “nodes” for a typical distributed application; and the economies of the cloud have put the brakes on the types of performance increases typically attributed to “Moore’s Law.”

This topic is relevant to my work at Datadog, and I’ve been thinking about it for a long time now. I’ve wanted to collect my thoughts somewhere for a while, because some of them are identical to those expressed in other recent projects, and others are antithetical to them. I figured this would make my input at worst interesting.

For a primer on this subject, please read Baron’s Time-Series Database Requirements. There’s a reason that most other recent articles cite it; it contains a brief but complete description of the problem, the requirements for many large-scale time-series users, and some surprises for the uninitiated…”

Three Useful Python Libraries for Startups

Whitenoise handles Gzipping your content and setting far-future cache headers on content. With a trivial amount of work, you can configure your app to automatically append a hash to each of your static files every time you deploy changes, so that you can set the cache headers as so…”

Phonenumbers: Validating phone numbers is not easy, and there are tons of valid formats that make using regular expressions impossible. Moreover, even if a number passes a regular expression for formatting, it may still not be valid…”

Pdfkit makes it simple to generate PDFs from html. Why would one use this? Let’s say you have an invoice page in your app – you can use the same code to render that page to render a downloadable PDF for customers or your own records…”

“Python-dateutil: Numerous date utilities for calculating differences, etc. The most useful of these is a resilient date parser:…”

Running Lisp in Production

“At Grammarly, the foundation of our business, our core grammar engine, is written in Common Lisp. It currently processes more than a thousand sentences per second, is horizontally scalable, and has reliably served in production for almost 3 years.

We noticed that there are very few, if any, accounts of how to deploy Lisp software to modern cloud infrastructure, so we thought that it would be a good idea to share our experience. The Lisp runtime and programming environment provides several unique, albeit obscure, capabilities to support production systems (for the impatient, they are described in the final chapter)…”

Cache-friendly binary search

“High-speed memory caches present in modern computer architectures favor data structures with good locality of reference, i.e. the property by which elements accessed in sequence are located in memory addresses close to each other. This is the rationale behind classes such as Boost.Container flat associative containers, that emulate the functionality of standard C++ node-based associative containers while storing the elements contiguously (and in order). This is an example of how binary search works in a boost::container::flat_set with elements 0 trough 30…”

Introducing HypoPG, hypothetical indexes for PostgreSQL

“DALIBO is proud to present the first release of HypoPG, an extension that adds hypothetical indexes in PostgreSQL.

An hypothetical index is an index which doesn’t exists on disk. It’s thefore almost instant to create and doesn’t add any IO cost, wether at creation time or at maintenance time. The goal is obviously to check if an index is useful before spending many time, I/O and disk space to create it.

With this extension, you can create hypothetical indexes, and then with EXPLAIN check if PostgreSQL would use them or not…”

Telegram Bot Platform

“Telegram is about freedom and openness – our code is open for everyone, as is our API. Today we’re making another step towards openness by launching a Bot API and platform for third-party developers to create bots.

Bots are simply Telegram accounts operated by software – not people – and they’ll often have AI features. They can do anything – teach, play, search, broadcast, remind, connect, integrate with other services, or even pass commands to the Internet of Things…”