Goroutines, Nonblocking I/O, And Memory Usage

I am generally a fan of Go’s approach to concurrency: writing code with goroutines is a lot easier than writing traditional nonblocking network servers in a language like C or C++. However, while working on a highly concurrent network proxy I came across an interesting realization about how the Go concurrency model makes it harder to write programs that do a lot of concurrent I/O with efficient memory usage.

The program in question is a network proxy akin to HAProxy or Envoy. Typically the proxy has a very large number of clients connected, but most of those clients are actually idle with no outstanding network requests. Each client connection has a read buffer and a write buffer. Therefore the naive memory usage of such a program is at least: #connections * (readbuf_sz + writebuf_sz).

There’s a trick you can do in a C or C++ program of this nature to reduce memory usage. Suppose that typically 5% of the client connections are actually active, and the other 95% are idle with no pending reads or writes. In this situation you can create a pool of buffer objects. When connections are actually active they acquire buffers to use for reading/writing from the pool, and when the connections are idle they release the buffers back to the pool. This reduces the number of allocated buffers to approximately the number of buffers actually needed by active connections. In this case using this technique will give a 20x memory reduction, since only 5% as many buffers will be allocated compared to the naive approach.

The reason this technique works at all is due to how nonblocking reads and writes work in C. In C you use a system call like select(2) or epoll_wait(2) to get a notification that a file descriptor is ready to be read/written, and then after that you explicitly call read(2) or write(2) yourself on that file descriptor. This gives you the opportunity to acquire a buffer after the call to select/epoll, but before making the read call…