Comprehensive Guide to Serverless Go with AWS Lambda

First, let’s have a quick look as to how software was traditionally built.
Web applications are deployed on web servers running on physical machines. As a software developer, you needed to to be aware of the intricacies of the server that runs your software.
To get your application running on the server, you had to spend hours downloading, compiling, installing, configuring, and connecting all sorts of components. The OS of your machines need to be constantly upgraded and patched for security vulnerabilities. In addition, servers need to be provisioned, load-balanced, configured, patched, and maintained.
In short, managing servers is a time-consuming task which often requires dedicated and experienced systems operations personnel.
What server maintenance can feel like – Metropolis (1927 film)
What is the point of software engineering? Contrary to what some might think, the goal of software engineering isn’t to deliver software. A software engineer’s job is to deliver value – to get the usefulness of software into the hands of users.
At the end of the day, you do need servers to deliver software. However, the time spent managing servers is time you could have spent on developing new features and improving your application. When you have a great idea, the last thing you want to do is set up infrastructure. Instead of worrying about servers, you want to focus more on shipping value.
How can we minimize the time required to deliver impact?

A serverless MapReduce framework written for AWS Lambda

Corral is a MapReduce framework designed to be deployed to serverless platforms, like AWS Lambda. It presents a lightweight alternative to Hadoop MapReduce. Much of the design philosophy was inspired by Yelp’s mrjob — corral retains mrjob’s ease-of-use while gaining the type safety and speed of Go.

Corral’s runtime model consists of stateless, transient executors controlled by a central driver. Currently, the best environment for deployment is AWS Lambda, but corral is modular enough that support for other serverless platforms can be added as support for Go in cloud functions improves.

Corral is best suited for data-intensive but computationally inexpensive tasks, such as ETL jobs.

More details about corral’s internals can be found in this blog post.

Notes on structured concurrency, or: Go statement considered harmful

Every concurrency API needs a way to run code concurrently. Here’s some examples of what that looks like using different APIs:

go myfunc(); // Golang
pthread_create(&thread_id, NULL, &myfunc); /* C with POSIX threads */
spawn(modulename, myfuncname, []) % Erlang
threading.Thread(target=myfunc).start() # Python with threads
asyncio.create_task(myfunc()) # Python with asyncio

There are lots of variations in the notation and terminology, but the semantics are the same: these all arrange for myfunc to start running concurrently to the rest of the program, and then return immediately so that the parent can do other things.

Another option is to use callbacks:

QObject::connect(&emitter, SIGNAL(event()), &receiver, SLOT(myfunc())) // C++ with Qt
g_signal_connect(emitter, "event", myfunc, NULL) /* C with GObject */
document.getElementById("myid").onclick = myfunc; // Javascript
promise.then(myfunc, errorhandler) // Javascript with Promises
deferred.addCallback(myfunc) # Python with Twisted
future.add_done_callback(myfunc) # Python with asyncio

Again, the notation varies, but these all accomplish the same thing: they arrange that from now on, if and when a certain event occurs, then myfunc will run. Then once they’ve set that up, they immediately return so the caller can do other things. (Sometimes callbacks get dressed up with fancy helpers like promise combinators, or Twisted-style protocols/transports, but the core idea is the same.)

And… that’s it. Take any real-world, general-purpose concurrency API, and you’ll probably find that it falls into one or the other of those buckets (or sometimes both, like asyncio).

Distributed SQLite for Go applications

This repository provides the dqlite Go package, which can be used to replicate a SQLite database across a cluster, using the Raft algorithm.

Design higlights

  • No external processes needed: dqlite is just a Go library, you link it it to your application exactly like you would with SQLite.
  • Replication needs a SQLite patch which is not yet included upstream.
  • The Go Raft package from Hashicorp is used internally for replicating the write-ahead log frames of SQLite across all nodes.

How does it compare to rqlite?

The main differences from rqlite are:

  • Full support for transactions
  • No need for statements to be deterministic (e.g. you can use time())
  • Frame-based replication instead of statement-based replication, this means in dqlite there’s more data flowing between nodes, so expect lower performance. Should not really matter for most use cases.

What I learned in 2017 Writing Go

A little over a year ago, I joined Cloud Foundry to work on Loggregator, Cloud Foundry’s application logging component. Its core concern is best-effort log delivery without pushing back on upstream writers. Loggregator is written entirely in Go.

After spending more than a thousand hours working with Go in a non-trivial code base, I still admire the language and enjoy using it. Nonetheless, our team struggled with a number of problems, many of which seem unique to Go. What follows is a list of the most salient problems.

Learning Go by porting a medium-sized web backend from Python

Summary: To learn Go, I ported the backend of a small site I run from Python to Go, and had a fun, pain-free experience doing so.

I’ve been wanting to learn Go for a while now: I like the philosophy of a language that’s small, has a gentle learning curve, and compiles very fast (for a statically-typed language). What pushed me over the line to actually go and do it was seeing more and more fast, robust tools that are written in Go – Docker and ngrok are two I’ve used recently.

The philosophy of Go is not to everyone’s taste (no exceptions, no user-defined generics, etc), but it fit my mental model well. Simple, speedy, does things the obvious way. During the port, I was especially impressed with how robust the standard library and tooling was.