Thoughts on Time-series Databases

“Preetam “recently” blogged about catena, a time-series metric store. There was another blog post about benchmarking boltdb by a Fog Creek engineer, also looking to write a time series database. This is something of a pattern in the Go community, which already boasts seriesly, InfluxDB, and prometheus; there are almost certainly others.

Time series data has been de rigueur at least since the Etsy’s seminal blog post on StatsD, though in reality that was just an inflection point. Time series modeling and graphing predates computer systems, but they have been a popular way of tracking and visualizing systems and networking data since at least the early 90s with MRTG. A few factors are converging now to make these kinds of systems more important: “Big Data” is getting much, much bigger; virtualization and containerization has increased the number of independent “nodes” for a typical distributed application; and the economies of the cloud have put the brakes on the types of performance increases typically attributed to “Moore’s Law.”

This topic is relevant to my work at Datadog, and I’ve been thinking about it for a long time now. I’ve wanted to collect my thoughts somewhere for a while, because some of them are identical to those expressed in other recent projects, and others are antithetical to them. I figured this would make my input at worst interesting.

For a primer on this subject, please read Baron’s Time-Series Database Requirements. There’s a reason that most other recent articles cite it; it contains a brief but complete description of the problem, the requirements for many large-scale time-series users, and some surprises for the uninitiated…”

http://jmoiron.net/blog/thoughts-on-timeseries-databases/

Three Useful Python Libraries for Startups

Whitenoise handles Gzipping your content and setting far-future cache headers on content. With a trivial amount of work, you can configure your app to automatically append a hash to each of your static files every time you deploy changes, so that you can set the cache headers as so…”

Phonenumbers: Validating phone numbers is not easy, and there are tons of valid formats that make using regular expressions impossible. Moreover, even if a number passes a regular expression for formatting, it may still not be valid…”

Pdfkit makes it simple to generate PDFs from html. Why would one use this? Let’s say you have an invoice page in your app – you can use the same code to render that page to render a downloadable PDF for customers or your own records…”

“Python-dateutil: Numerous date utilities for calculating differences, etc. The most useful of these is a resilient date parser:…”

http://blog.instavest.com/three-useful-python-libraries-for-startups

Running Lisp in Production

“At Grammarly, the foundation of our business, our core grammar engine, is written in Common Lisp. It currently processes more than a thousand sentences per second, is horizontally scalable, and has reliably served in production for almost 3 years.

We noticed that there are very few, if any, accounts of how to deploy Lisp software to modern cloud infrastructure, so we thought that it would be a good idea to share our experience. The Lisp runtime and programming environment provides several unique, albeit obscure, capabilities to support production systems (for the impatient, they are described in the final chapter)…”

http://tech.grammarly.com/blog/posts/Running-Lisp-in-Production.html

Cache-friendly binary search

“High-speed memory caches present in modern computer architectures favor data structures with good locality of reference, i.e. the property by which elements accessed in sequence are located in memory addresses close to each other. This is the rationale behind classes such as Boost.Container flat associative containers, that emulate the functionality of standard C++ node-based associative containers while storing the elements contiguously (and in order). This is an example of how binary search works in a boost::container::flat_set with elements 0 trough 30…”

http://bannalia.blogspot.com/2015/06/cache-friendly-binary-search.html

Introducing HypoPG, hypothetical indexes for PostgreSQL

“DALIBO is proud to present the first release of HypoPG, an extension that adds hypothetical indexes in PostgreSQL.

An hypothetical index is an index which doesn’t exists on disk. It’s thefore almost instant to create and doesn’t add any IO cost, wether at creation time or at maintenance time. The goal is obviously to check if an index is useful before spending many time, I/O and disk space to create it.

With this extension, you can create hypothetical indexes, and then with EXPLAIN check if PostgreSQL would use them or not…”

http://www.postgresql.org/about/news/1593/

Telegram Bot Platform

“Telegram is about freedom and openness – our code is open for everyone, as is our API. Today we’re making another step towards openness by launching a Bot API and platform for third-party developers to create bots.

Bots are simply Telegram accounts operated by software – not people – and they’ll often have AI features. They can do anything – teach, play, search, broadcast, remind, connect, integrate with other services, or even pass commands to the Internet of Things…”

Introducing the Fan – simpler container networking

“Canonical just announced a new, free, and very cool way to provide thousands of IP addresses to each of your VMs on AWS. Check out the fan networking on Ubuntu wiki page to get started, or read Dustin’s excellent fan walkthrough. Carry on here for a simple description of this happy little dose of awesome.

Containers are transforming the way people think about virtual machines (LXD) and apps (Docker). They give us much better performance and much better density for virtualisation in LXD, and with Docker, they enable new ways to move applications between dev, test and production. These two aspects of containers – the whole machine container and the process container, are perfectly complementary. You can launch Docker process containers inside LXD machine containers very easily. LXD feels like KVM only faster, Docker feels like the core unit of a PAAS.

The density numbers are pretty staggering. It’s *normal* to run hundreds of containers on a laptop.

And that is what creates one of the real frustrations of the container generation, which is a shortage of easily accessible IP addresses.

It seems weird that in this era of virtual everything that a number is hard to come by. The restrictions are real, however, because AWS restricts artificially the number of IP addresses you can bind to an interface on your VM. You have to buy a bigger VM to get more IP addresses, even if you don’t need extra compute. Also, IPv6 is nowehre to be seen on the clouds, so addresses are more scarce than they need to be in the first place.

So the key problem is that you want to find a way to get tens or hundreds of IP addresses allocated to each VM…”

Introducing the Fan – simpler container networking

https://wiki.ubuntu.com/FanNetworking