Hydra – Run your own enterprise-grade IAM service with OAuth2 capabilities in less than 2 minutes

At first, there was the monolith. The monolith worked well with the bespoke authentication module. Then, the web evolved into an elastic cloud that serves thousands of different user agents in every part of the world.

Hydra is driven by the need for a scalable, low-latency, in-memory Access Control, OAuth2, and OpenID Connect layer that integrates with every identity provider you can imagine.

Hydra is available through Docker and relies on RethinkDB for persistence. Database drivers are extensible, in case you want to use RabbitMQ, MySQL, MongoDB, or some other database instead.


Cuckoo filters and their analysis

Michael Mitzenmacher has described cuckoo filters in an earlier blog post (as well as of course in the published paper about them) but the basic idea is to use a cuckoo hash table cut down in size by storing only a short fingerprint of each key rather than a whole key-value pair. As in a normal cuckoo hash table, keys (or rather their fingerprints) get moved around to make room for other keys, and that leads to a small complication: when you’re moving a fingerprint, you don’t know which key it came from, so the location to move it to needs to be computable based only on where it is now and on its value. More specifically, the other location for any fingerprint ends up being the xor of its current location with a hash of its value.
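The relocation rule described above can be illustrated with a minimal sketch. It is not taken from any particular implementation: the hash function, the table size, the fingerprint width and all function names are assumptions chosen for the example. The table size is a power of two so that the xor always lands on a valid bucket.

```python
import hashlib

NUM_BUCKETS = 1 << 10      # assumed table size; a power of two keeps the xor in range
FINGERPRINT_BITS = 8       # assumed fingerprint width


def _hash(data: bytes) -> int:
    # Any well-mixed hash would do; SHA-256 is used here only for convenience.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def fingerprint(key: bytes) -> int:
    # Short fingerprint in 1..255; 0 is reserved for "empty slot" (an assumption).
    return _hash(b"fp:" + key) % ((1 << FINGERPRINT_BITS) - 1) + 1


def primary_bucket(key: bytes) -> int:
    # The first candidate bucket is derived from the full key.
    return _hash(key) % NUM_BUCKETS


def alternate_bucket(bucket: int, fp: int) -> int:
    # The other candidate depends only on the current bucket and the fingerprint,
    # so a stored fingerprint can be moved without knowing its original key.
    return (bucket ^ _hash(fp.to_bytes(2, "big"))) % NUM_BUCKETS


if __name__ == "__main__":
    key = b"example-key"
    fp = fingerprint(key)
    b1 = primary_bucket(key)
    b2 = alternate_bucket(b1, fp)
    # Applying the rule twice returns to the starting bucket.
    print(b1, b2, alternate_bucket(b2, fp) == b1)
```

Because xor is its own inverse, the same function maps each of the two candidate buckets to the other, which is what lets the filter evict and re-place a fingerprint from either location.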

Although cuckoo filters have been implemented (see first link) and work well in practice, one drawback is that we didn’t know whether they also work well in theory. Conversely, an earlier data structure of Pagh, Pagh, and Rao (SODA 2005) has all the same advantages of cuckoo filters over Bloom filters, but for it, as far as I am aware, there was no implementation, only theory. In contrast, Bloom filters work both actually and theoretically: there is no major gap between theory and practice.


Go implementation of a GSLB

This is a homebrew GSLB (Global Server Load Balancer). It uses DNS to route customers to locations based on a combination of their ISP and the health check status of your sites.

DNS lookups are one way of routing customers to an available resource. Decisions can be made that focus on proximity to the user (in our case, “isp” match).
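As a rough illustration of that kind of decision, here is a minimal sketch in Python (the project itself is written in Go). Nothing here is taken from the project's code: the Site record, its fields, and the fallback to any healthy site when no ISP matches are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    # Hypothetical record for one serving location.
    name: str
    address: str   # address a DNS answer would carry
    isp: str       # ISP this site is considered "close" to
    healthy: bool  # result of the latest health check


def choose_sites(sites: List[Site], client_isp: str) -> List[Site]:
    # Unhealthy sites are never returned.
    healthy = [s for s in sites if s.healthy]
    # Prefer healthy sites whose ISP matches the client's.
    matched = [s for s in healthy if s.isp == client_isp]
    # Assumed fallback: with no ISP match, answer with every healthy site.
    return matched or healthy


if __name__ == "__main__":
    sites = [
        Site("east", "192.0.2.10", "isp-a", True),
        Site("west", "192.0.2.20", "isp-b", True),
        Site("south", "192.0.2.30", "isp-a", False),
    ]
    print([s.name for s in choose_sites(sites, "isp-a")])
    print([s.name for s in choose_sites(sites, "isp-c")])
```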

This is specifically used in the falling-sky project; aka test-ipv6.com.


PyThalesians – Open Source Financial Library

PyThalesians is a Python financial library developed by the Thalesians (http://www.thalesians.com). I have used the library to develop my own trading strategies and I’ve included simple samples which show some of the functionality including an FX trend following model and other bits of financial analysis.

There are already many open source Python libraries for building trading strategies. However, I’ve developed this one to be as flexible as possible in terms of what types of strategies you can develop with it. In addition, a lot of the library can be used to analyse and plot financial data for broader analysis, of the kind I’ve had to do while working in markets over the years. Hence, it can be used by a wider array of users.

At present PyThalesians offers:

  • Backtesting of systematic trading strategies for cash markets (including cross sectional style trading strategies)
  • Sensitivity analysis for systematic trading strategies parameters
  • Seamless historic data downloading from Bloomberg (requires licence), Yahoo, Quandl, Dukascopy and other market data sources
  • Produces beautiful line plots with PyThalesians wrapper (via Matplotlib), Plotly (via cufflinks) and a simple wrapper for Bokeh
  • Seasonality analysis of markets
  • Calculates some technical indicators and gives trading signals based on these
  • Helper functions built on top of Pandas
  • Automatic tweeting of charts
  • And much more!
  • Please bear in mind that PyThalesians is currently a highly experimental alpha project and isn’t yet fully documented
  • Uses Apache 2.0 licence


JVM JIT optimization techniques

There’s a lot of buzz about JVM optimizations and how they make production code perform better thanks to Just-In-Time (JIT) compilation and various optimization techniques. A lot of excellent research material is available, but I wanted to see for myself how these apply in practice, so I decided to dig deeper and play around with some measurements.

Different JVM implementations and architectures might yield different results, so in this post I don’t intend to give exact measurements, just a bird’s-eye view of the possibilities of the platform.


Applicative Protocol Multiplexer (e.g. share SSH and HTTPS on the same port)

sslh accepts connections on specified ports, and forwards them further based on tests performed on the first data packet sent by the remote client.

Probes for HTTP, SSL, SSH, OpenVPN, tinc and XMPP are implemented, and any other protocol that can be tested using a regular expression can be recognised. A typical use case is to allow serving several services on port 443 (e.g. to connect to ssh from inside a corporate firewall, which almost never blocks port 443) while still serving HTTPS on that port.
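To give a feel for what testing the first data packet can look like, here is a minimal sketch in Python. It is not sslh's own probe code (sslh is a separate program with its own configuration): the function name, the list of HTTP methods and the returned labels are assumptions for illustration. It relies only on well-known protocol openings: an SSH identification string starts with "SSH-", a TLS handshake record starts with the bytes 0x16 0x03, and an HTTP request starts with a method name.

```python
# Hypothetical list of request-line prefixes used to recognise HTTP.
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ")


def guess_protocol(first_bytes: bytes) -> str:
    """Classify a connection from the first bytes the client sent."""
    # SSH identification strings begin with "SSH-".
    if first_bytes.startswith(b"SSH-"):
        return "ssh"
    # TLS record header: content type 0x16 (handshake), major version 0x03.
    if len(first_bytes) >= 2 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03:
        return "tls"
    # Plain HTTP requests begin with a method name followed by a space.
    if first_bytes.startswith(HTTP_METHODS):
        return "http"
    return "unknown"


if __name__ == "__main__":
    for sample in (b"SSH-2.0-client\r\n", b"\x16\x03\x01\x02\x00", b"GET / HTTP/1.1\r\n"):
        print(guess_protocol(sample))
```

A demultiplexer built on this idea would read the first bytes without consuming them from the stream it forwards, pick a backend for the guessed protocol, and then relay traffic in both directions.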

Hence sslh acts as a protocol demultiplexer, or a switchboard. Its name comes from its original function to serve SSH and HTTPS on the same port.

sslh supports IPv6, privilege dropping, transparent proxying, and more.


Titan – Distributed Graph Database

Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time.

In addition, Titan provides the following features:

Download Titan or clone from GitHub. Read the Titan documentation and join the mailing list.


Python in production engineering

Python aficionados are often surprised to learn that Python has long been the language most commonly used by production engineers at Facebook and is the third most popular language at Facebook, behind Hack (our in-house dialect of PHP) and C++. Our engineers build and maintain thousands of Python libraries and binaries deployed across our entire infrastructure.

Every day, dozens of Facebook engineers commit code to Python utilities and services with a wide variety of purposes including binary distribution, hardware imaging, operational automation, and infrastructure management.