Introducing HypoPG, hypothetical indexes for PostgreSQL

“DALIBO is proud to present the first release of HypoPG, an extension that adds hypothetical indexes in PostgreSQL.

An hypothetical index is an index which doesn’t exists on disk. It’s thefore almost instant to create and doesn’t add any IO cost, wether at creation time or at maintenance time. The goal is obviously to check if an index is useful before spending many time, I/O and disk space to create it.

With this extension, you can create hypothetical indexes, and then with EXPLAIN check if PostgreSQL would use them or not…”

PostgreSQL vs. MS SQL Server

I work as a data analyst in a global professional services firm (one you have certainly heard of). I have been doing this for about a decade. I have spent that decade dealing with data, database software, database hardware, database users, database programmers and data analysis methods, so I know a fair bit about these things. I frequently come into contact with people who know very little about these things – although some of them don’t realise it.

Over the years I have discussed the issue of PostgreSQL vs. MS SQL Server many, many times. A well-known principle in IT says: if you’re going to do it more than once, automate it. This document is my way of automating that conversation.

Unless otherwise stated I am referring to PostgreSQL 9.3 and MS SQL Server 2014, even though my experience with MS SQL Server is with versions 2008 R2 and 2012 – for the sake of fairness and relevance I want to compare the latest version of PostgreSQL to the latest version of MS SQL Server. Where I have made claims about MS SQL Server I have done my best to check that they apply to version 2014 by consulting Microsoft’s own documentation – although, for reasons I will get to, I have also had to rely largely on Google, Stack Overflow and the users of the internet. I know it’s not scientifically rigorous to do a comparison like this when I don’t have equal experience with both databases, but this is not an academic exercise – it’s a real-world comparison. I have done my honest best to get my facts about MS SQL Server right – we all know it is impossible to bullshit the whole internet. If I find out that I’ve got something wrong, I’ll fix it.

I am comparing the two databases from the point of view of a data analyst. Maybe MS SQL Server kicks PostgreSQL’s arse as an OLTP backend (although I doubt it), but that’s not what I’m writing about here, because I’m not an OLTP developer/DBA/sysadmin…”

Looking at DBI

“Since its inception by Tim Bunce in 1994, DBI (DataBase Interface) has been the de facto standard for interfacing databases with the Perl language. Thanks to the designers’ foresight, DBI is database-independent, with methods, variables, and conventions that provide a consistent programming interface, no matter what database management system is being used…”

Under the hood: MySQL Pool Scanner (MPS)

“Facebook has one of the largest MySQL database clusters in the world. This cluster comprises many thousands of servers across multiple data centers on two continents.

Operating a cluster of this size with a small team is achieved by automating nearly everything a conventional MySQL Database Administrator (DBA) might do so that the cluster can almost run itself. One of the core components of this automation is a system we call MPS, short for “MySQL Pool Scanner.”

MPS is a sophisticated state machine written mostly in Python. It replaces a DBA for many routine tasks and enables us to perform maintenance operations in bulk with little or no human intervention…”

Simple API with Nginx and PostgreSQL

“Many of you have been concered about SQL injection in this example. I was thinking about to add sql injection protection in the example but I chose not to to keep it simple. But when I think about it was a bad move. So I have updated the example to include protection against SQL injection.

Sometimes it’s overkill to use a web framework if you only need to develop a very simple REST API. It turns out that Nginx can be used to develop a full fledged REST API and PostgreSQL can easily be used for persistence.

In this blog post I’m going to show you how to create a simple CRUD API for articles.”

Jepsen: Testing the Partition Tolerance of PostgreSQL, Redis, MongoDB and Riak

“Distributed systems are characterized by exchanging state over high-latency or unreliable links. The system must be robust to both node and network failure if it is to operate reliably–however, not all systems satisfy the safety invariants we’d like. In this article, we’ll explore some of the design considerations of distributed databases, and how they respond to network partitions.

IP networks may arbitrarily drop, delay, reorder, or duplicate messages send between nodes–so many distributed systems use TCP to prevent reordered and duplicate messages. However, TCP/IP is still fundamentally asynchronous: the network may arbitrarily delay messages, and connections may drop at any time. Moreover, failure detection is unreliable: it may be impossible to determine whether a node has died, the network connection has dropped, or things are just slower than expected…”

Announcing TokuDB v7: Open Source and More

“The free Community Edition is fully functional and fully performant. It has all the compression you’ve come to expect from TokuDB. It has hot schema changes: no-down-time column insertion, deletion, renaming, etc., as well as index creation. It has clustering secondary keys. We are also announcing an Enterprise Edition (coming soon) with additional benefits, such as a support package and advanced backup and recovery tools…”