This is the followup to my talk LLVM Optimized Python at the Harvard-Smithsonian Center for Astrophysics; here we'll do the deep dive that I didn't have time for. We're going to build a single-module Numba-like compiler for Python. It won't be nearly as featureful or complete, but it should demonstrate how you can go about building your own little LLVM specializer for a subset of Python, or your own custom DSL expression compiler, and integrate it with the standard NumPy/SciPy stack for whatever scientific computing domain you work in. The full source for this project is available on Github and comes in at about 1000 lines for the whole specializer, very tiny!
There’s a whole slew of interesting domains where this kind of on-the-fly specializing compiler can be used:
- Computation kernels for MapReduce
- Financial backtesting
- Dense linear algebra
- Image processing
- Data pipeline hotspots
- Speeding up SQL queries in PostgreSQL
- Molecular dynamics
- Compiling UDFs for Cloudera Impala
Python is great for rapid development and high-level thinking, but it is slow due to too many levels of indirection, hashmap lookups, broken parallelism, a slow garbage collector, and boxed PyObject types. With LLVM we can keep writing high-level code and not sacrifice performance.
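To make that overhead concrete, here is a small, self-contained comparison (not part of the specializer itself): the same dot product written as a plain Python loop versus NumPy's compiled kernel. Exact timings will vary by machine, but the interpreted loop is typically orders of magnitude slower, which is the gap we want to close while still writing the loop in Python.

```python
import time
import numpy as np

def py_dot(a, b):
    # Every iteration here goes through bytecode dispatch, boxed floats,
    # and dynamic indexing -- the indirection described above.
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total

a = np.random.rand(1000000)
b = np.random.rand(1000000)

t0 = time.time(); py_dot(a, b);  t1 = time.time()
t2 = time.time(); np.dot(a, b);  t3 = time.time()
print("pure Python: %.4fs, NumPy: %.4fs" % (t1 - t0, t3 - t2))
```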
You will need python, llvm, llvmpy, numpy, and a bit of time. The best way to get all of these is to install Anaconda, maintained by my good friend Ilan. Don't add any more entropy to the universe by compiling NumPy from source; just use Anaconda.
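Once the packages are installed, a quick sanity check like the sketch below (just an illustration, not part of the specializer) confirms that NumPy and the llvmpy bindings import cleanly and can construct an empty LLVM module:

```python
import numpy as np
import llvm.core as lc

print("numpy", np.__version__)

# Build an empty LLVM module through llvmpy; printing it emits its IR.
mod = lc.Module.new("sanity_check")
print(mod)
```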