A founder’s perspective on 4 years with Haskell

In 2012 I co-founded Better, a startup building a new type of enterprise e-learning platform. Our goal was to make it faster and cheaper for large companies to develop, deliver, and analyse adaptive, cross-platform, multi-language, online courses.

From day one, we decided to use Haskell as our main language, and it remained the only language we used on the back-end as our team grew to 10 developers.

After a period of experimentation and development, Better grew from $0 to $500k+ in annual recurring revenue in the space of a few months, with companies including American Express and Swissport as customers. However, the distribution model proved challenging to grow beyond that and we eventually sold to GRC Solutions, an Australian compliance company.

Though interest in Haskell appears to be growing steadily, its use in production is still rare. Some have the mistaken impression it’s an academic language and nothing more. In this post I’ll give my perspective on what it’s like to use Haskell in a startup. Can you get stuff done? Does it hold up in practice? Can you hire developers? Should more companies use it?

The short answer to those questions is yes. It’s not a fit for all problems or for all teams, but it is worth serious consideration. For building server-side software, Haskell might be the closest thing to a secret weapon you’ll find today.

http://baatz.io/posts/haskell-in-a-startup/

Distributed Systems in Haskell

TL;DR:

  1. Only ever block if there are no messages whatsoever waiting for your server.
  2. Don’t use interrupt-based timeouts.
  3. Separate your server logic and any networking.
  4. Try to have pure server logic.
  5. Use Monads to simplify your code as it gets bigger.
  6. Use Cloud Haskell and Lenses and other nice libraries to simplify your life and your code.

Haskell-nonspecific Advice

1: Do Not Block

Every time we used multiple blocking reads from the network, it came back to haunt us. For example, we would send a Ping and wait for a Pong before continuing. This leads to all sorts of bad behavior. What happens if both servers Ping each other at the same time? Deadlock. Even blocking reads that seemed innocuous at first usually led to confusion and race conditions later on.

Instead, use an asynchronous architecture. Your program should block in exactly one place. Each node in the system should have a “superloop” that performs blocking reads from the network using an efficient epoll-like mechanism and then dispatches messages appropriately.
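As a sketch of what this looks like in Haskell (the Message and ServerState types here are hypothetical, not from the original post), the “superloop” reduces to a single blocking read on an inbox channel:

```haskell
import Control.Concurrent.Chan (Chan, readChan)

-- Hypothetical message and state types, for illustration only.
data Message = Ping | Pong | Payload String
data ServerState = ServerState

-- The single place the whole server blocks: read one message,
-- dispatch it, loop. Everything else must be non-blocking.
superloop :: Chan Message -> ServerState -> IO ()
superloop inbox st = do
  msg <- readChan inbox      -- the sole blocking call
  st' <- dispatch st msg
  superloop inbox st'

-- Dispatch never blocks on the network; it only updates state
-- and enqueues any replies.
dispatch :: ServerState -> Message -> IO ServerState
dispatch st Ping        = pure st  -- e.g. enqueue a Pong here
dispatch st Pong        = pure st
dispatch st (Payload _) = pure st
```

In a real system the `readChan` would sit behind an epoll-backed receive (Cloud Haskell’s `receiveWait` plays this role), but the shape is the same: one loop, one block.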

Why?

It may seem like this architecture introduces unnecessary logical complexity compared to a bit of blocking sprinkled throughout the code, but in every instance we came across, blocking in exactly one place turned out to be much easier in practice. We eliminated every race condition we came across and maximized performance by eliminating unnecessary delays. A server that only blocks in one place is guaranteed to process any waiting message the moment it has free CPU cycles.

2: Use Asynchronous Timing

Some algorithms (especially probabilistic ones) rely on things like timeouts.

In our experience, implementing timeouts and other time-based behavior as blocking reads with timeouts is a recipe for confusion.

Instead, we found that the best approach was to spawn, for each node, a separate “tick generator” thread. This “tick generator” simply sends its parent thread an empty Tick message at a given frequency.
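A tick generator is only a few lines in Haskell; this sketch (with a hypothetical Message type) sends Ticks into the parent’s inbox channel:

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Concurrent.Chan (Chan, writeChan)
import Control.Monad (forever)

-- Tick is just another constructor alongside your real messages.
data Message = Tick

-- Put an empty Tick into the parent's inbox every `micros` microseconds.
startTickGenerator :: Int -> Chan Message -> IO ThreadId
startTickGenerator micros inbox = forkIO . forever $ do
  threadDelay micros
  writeChan inbox Tick
```

Because the Tick arrives through the same inbox as every other message, the superloop never needs a read with a timeout.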

Why?

There are several advantages to this approach.

  1. All timeout logic is handled with standard run-of-the-mill server logic. There are no interrupts, timers, or exceptions to deal with. You just process Tick messages like any other message.
  2. You need to keep track of separate timeouts for separate servers. This approach vastly simplifies doing so. You just keep a map from servers to the number of Ticks since you last heard from them. If this number gets too high, it’s a timeout. Compare this to any solution that involves reads with timeouts or checking system time. In our experience, this was much simpler.
  3. This architecture unifies two different timeout scenarios. One scenario is a complete loss of incoming messages, which normally has to be dealt with by using a blocking read with a timeout. The other scenario is when the server still receives messages on a regular basis, but doesn’t receive messages from a particular server for a long time. Blocking reads with timeouts won’t cover the latter failure case. Ticks cleanly handle both cases.
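The per-server bookkeeping from point 2 can be sketched as a pure pair of functions over a counter map (ServerId here is a hypothetical identifier type):

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical identifier type for peers.
type ServerId = String

-- Ticks elapsed since we last heard from each peer.
type TickCounts = Map.Map ServerId Int

-- On every Tick, age all counters and report any peer past the limit.
onTick :: Int -> TickCounts -> (TickCounts, [ServerId])
onTick limit counts =
  let aged     = Map.map (+ 1) counts
      timedOut = Map.keys (Map.filter (> limit) aged)
  in  (aged, timedOut)

-- On any message from a peer, reset its counter to zero.
onMessageFrom :: ServerId -> TickCounts -> TickCounts
onMessageFrom sid = Map.insert sid 0
```

Note that this handles both failure scenarios from point 3 with the same two functions: a silent network and a silent peer both show up as a counter that keeps growing.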

3: Separate Networking and Logic

This is a somewhat specific architectural recommendation; no doubt there are algorithms where this advice does not apply. However, in all our use cases, this approach worked very well (and we tried quite a few approaches).

Distributed systems papers are often as poorly written as they are clever. The included code rarely works properly, if at all. One bad habit that these papers tend to have is the thorough mixing of network operations and algorithm logic. It’s pretty common to see things along the lines of

send(server,msg);
x = receive_msg();
y = process(x);
send(server,y);

but with a lot more junk thrown in.

It turns out that this is not conducive to clean, understandable code. There’s a lot of implicit state being offloaded to the network when you structure things like this, and it makes it a lot harder to recover from things like network interruptions or servers going offline. You end up using timeouts and all sorts of ugly constructs to make things work in practice.

Instead, you should completely separate your server logic and your network functionality. Again, this might sound like a lot of work, but it’s almost guaranteed to save you more time in the long run.

In Haskell terms, your server logic will (ideally) have a type like this:

serverStep :: Config -> State -> Message -> (State, [Message])

In prose, the server logic takes three arguments:

  • The server’s configuration, which does not change (Hostname, directory, etc.)
  • The server’s previous state
  • A message received from the network

The server logic then returns

  • The new server state
  • A list of messages to send

Then you just have to write a simple wrapper around this function that receives messages from the network, feeds them into the function, and sends the responses out to the network.
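A minimal version of that wrapper, assuming channel-based I/O and placeholder Config/State/Message types (the real ones depend on your application), might look like:

```haskell
import Control.Concurrent.Chan (Chan, readChan, writeChan)

-- Placeholder types; your real Config/State/Message go here.
data Config  = Config
data State   = State
data Message = Message

-- The pure core from the signature above (placeholder body).
serverStep :: Config -> State -> Message -> (State, [Message])
serverStep _ st _ = (st, [])

-- The entire impure shell: receive, step, send, repeat.
runServer :: Config -> State -> Chan Message -> Chan Message -> IO ()
runServer cfg st inbox outbox = do
  msg <- readChan inbox
  let (st', outgoing) = serverStep cfg st msg
  mapM_ (writeChan outbox) outgoing
  runServer cfg st' inbox outbox
```

All the networking lives in `runServer`; everything interesting lives in the pure `serverStep`.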

With a bit of work, any program requiring a sequence of sends and receives can be transformed into this form (a single receive followed by arbitrarily many sends), so even the most stubbornly ugly distributed-systems paper can be adapted to it.

Why?

  1. This form guarantees that you meet suggestion #1 and get all the advantages of doing so. In particular, your server will never block unless there is nothing in the incoming message queue. Therefore, your server will process any incoming messages the instant it has free CPU cycles.
  2. Network code is simpler. There’s just one place you send and receive messages, and it’s very straightforward to implement.
  3. Testing is much easier. When your server logic is a pure function as described above, server behavior is entirely deterministic and much more amenable to testing. It’s easy to build a test harness that “simulates” the network. All you have to do is keep a list of all your servers’ states and a queue of messages waiting to be delivered. A test harness looks like this:
while queue is not empty:
   pop msg off queue
   (new_state, new_msgs) = serverStep configs[msg.dest] states[msg.dest] msg
   states[msg.dest] = new_state
   put new_msgs into queue

If you want to test things like out-of-order message delivery, you just mix up your queue instead of putting messages in in order. You have complete control!
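The same harness translates directly into Haskell; this sketch uses hypothetical ServerId/Config/State/Message types and a plain list as the queue:

```haskell
import qualified Data.Map.Strict as Map

type ServerId = Int
data Config   = Config
data State    = State

-- Every message carries its destination, mirroring msg.dest above.
data Message = Message { dest :: ServerId }

-- Placeholder pure core.
serverStep :: Config -> State -> Message -> (State, [Message])
serverStep _ st _ = (st, [])

-- Drive the whole "network" deterministically: pop a message,
-- step the destination server, append its outgoing messages.
simulate :: Map.Map ServerId Config
         -> Map.Map ServerId State
         -> [Message]
         -> Map.Map ServerId State
simulate _    states []           = states
simulate cfgs states (msg : rest) =
  let d           = dest msg
      (st', outs) = serverStep (cfgs Map.! d) (states Map.! d) msg
  in  simulate cfgs (Map.insert d st' states) (rest ++ outs)
```

Shuffling or reordering the message list before recursing is all it takes to simulate out-of-order delivery, and invariants can be checked after every step.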

When we implemented Paxos (with some pedagogical shortcuts), this approach saved us a lot of trouble. We would run the simulation and check the Paxos invariants at each step. We found a lot of bugs this way! This would be very difficult on a real network.

http://yager.io/Distributed/Distributed.html

The Joy and Agony of Haskell in Production

“There have been several good talks about using Haskell in industry lately, and several people asked me to write about my personal experiences. Although I can’t give specific details I will speak broadly about some things I’ve learned and experienced.

The myths are true. Haskell code tends to be much more reliable, performant, easy to refactor, and easier to incorporate with coworkers’ code without too much thinking. It’s also just enjoyable to write.

The myths are sometimes truisms. Haskell code tends to be of high quality by construction, but for several reasons that are only correlated with, not caused by, the technical merits of Haskell. Just by virtue of the language being esoteric and having a relatively higher barrier to entry, we end up working with developers who would write above-average code in any language. That said, the language actively encourages thoughtful consideration of abstractions and a “brutal” (as John Carmack noted) level of discipline that high-quality code in other languages would require, but that Haskell enforces.

Prefer to import libraries as qualified. Typically this is just considered good practice for business logic libraries; it makes it easier to locate the source of symbol definitions. The only point of ambiguity I’ve seen is disagreement amongst developers on which core libraries are common enough to import unqualified and how to handle symbols. This ranges the full spectrum from fully qualifying everything (Control.Monad.>>=) to common things like (Data.Maybe.maybe) or just disambiguating names like (Map.lookup).

Consider rolling an internal prelude. As we’ve all learned the hard way, the Prelude is not your friend. The consensus historically has favored the “Small Prelude Assumption” which presupposes that tools get pushed out into third party modules, even the core tools that are necessary to do anything (text, bytestring, vector, etc). This makes life easier for library authors at the cost of some struggle for downstream users…”

http://www.stephendiehl.com/posts/production.html

What I Wish I Knew When Learning Haskell

Sections that have been added or seen large changes:

  • Irrefutable Patterns
  • Hackage
  • Exhaustiveness
  • Stacktraces
  • Laziness
  • Skolem Capture
  • Foreign Function Pointers
  • Attoparsec Parser
  • Inline Cmm
  • PrimMonad
  • Specialization
  • unbound-generics
  • Editor Integration
  • EKG
  • Nix
  • Haddock
  • Monad Tutorials Commentary
  • Monad Morphisms
  • Corecursion
  • Category
  • Arrows
  • Bifunctors
  • ExceptT
  • hint / mueval
  • Roles
  • Higher Kinds
  • Kind Polymorphism
  • Numeric Tower
  • SAT Solvers
  • Graph
  • Sparks
  • Threadscope
  • Generic Parsers
  • GHC Block Diagram
  • GHC Debug Flags
  • Core
  • Inliner
  • Unboxed Types
  • Runtime Memory Representation
  • ghc-heapview
  • STG
  • Worker/Wrapper
  • Z-Encoding
  • Cmm
  • Runtime Optimizations
  • RTS Profiling
  • Algebraic Relations

http://dev.stephendiehl.com/hask/

Functional Programming in the Real World

“Here is a list of functional programs applied to real-world tasks. The main criterion for being real-world is that the program was written primarily to perform some task, not primarily to experiment with functional programming. Functional is used in the broad sense that includes both ‘pure’ programs (no side effects) and ‘impure’ (some use of side effects). Languages covered include CAML, Clean, Erlang, Haskell, Miranda, Scheme, SML, and others…”

http://homepages.inf.ed.ac.uk/wadler/realworld/

Sodium – Functional Reactive Programming (FRP) Library for Java, Haskell, C++, C# and Scala

“Sodium – Functional Reactive Programming in C#, C++, Java, Haskell and Scala (other languages to be added). This is based on Flapjax, Yampa, scala.React and a number of other Functional Reactive Programming efforts, as well as a lot of personal experience. Enjoy. Status: Haskell – complete; Java – complete; C++ – complete; C# – complete; Embedded-C – just an experiment; Rust – got nowhere with it yet; Scala – complete…”

Slides: http://blog.reactiveprogramming.org/wp-uploads/2013/12/FRP4.pdf

https://github.com/SodiumFRP/sodium

Mio: A High-Performance Multicore IO Manager for GHC

“Haskell threads provide a key, lightweight concurrency abstraction to simplify the programming of important network applications such as web servers and software-defined network (SDN) controllers. The flagship Glasgow Haskell Compiler (GHC) introduces a run-time system (RTS) to achieve a high-performance multicore implementation of Haskell threads, by introducing effective components such as a multicore scheduler, a parallel garbage collector, an IO manager, and efficient multicore memory allocation. Evaluations of the GHC RTS, however, show that it does not scale well on multicore processors, leading to poor performance of many network applications that try to use lightweight Haskell threads. In this paper, we show that the GHC IO manager, which is a crucial component of the GHC RTS, is the scaling bottleneck. Through a series of experiments, we identify key data structure, scheduling, and dispatching bottlenecks of the GHC IO manager. We then design a new multicore IO manager named Mio that eliminates all these bottlenecks. Our evaluations show that the new Mio manager improves realistic web server throughput by 6.5x and reduces expected web server response time by 5.7x. We also show that with Mio, McNettle (an SDN controller written in Haskell) can scale effectively to 40+ cores, reach a throughput of over 20 million new requests per second on a single machine, and hence become the fastest of all existing SDN controllers…”
hask035-voellmy