Writing code is hard. Writing code that has to deal with parallelism and concurrency is harder. Doing all of that and keeping it efficient is challenging.
Creating an OS thread or switching from one to another can be costly for your programs in terms of memory and performance. Go aims to take as much advantage of the available cores as possible, and it has been designed with concurrency in mind from the beginning.
M, P, G orchestration
To solve this problem, Go has its own scheduler to distribute goroutines over the threads. This scheduler defines three main concepts, as explained in the code itself:
The main concepts are: G - goroutine. M - worker thread, or machine. P - processor, a resource that is required to execute Go code. M must have an associated P to execute Go code[...].
Each goroutine (G) runs on an OS thread (M) that is assigned to a logical CPU (P). Let’s take a simple example to see how Go manages them…
If you haven’t heard, universities around the world are offering their courses online for free (or at least partially free). These courses are collectively called MOOCs or Massive Open Online Courses.
In the past six years or so, over 800 universities have created more than 10,000 of these MOOCs. And I’ve been keeping track of these MOOCs the entire time over at Class Central, ever since they rose to prominence.
In the past four months alone, 190 universities have announced 600 such free online courses. I’ve compiled a list of them and categorized them according to the following subjects: Computer Science, Mathematics, Programming, Data Science, Humanities, Social Sciences, Education & Teaching, Health & Medicine, Business, Personal Development, Engineering, Art & Design, and finally Science.
If you have trouble figuring out how to sign up for Coursera courses for free, don’t worry: here’s an article on how to do that, too.
Many of these are completely self-paced, so you can start taking them at your convenience.
This post discusses how maps are implemented in Go. It is based on a presentation I gave at the GoCon Spring 2018 conference in Tokyo, Japan.
What is a map function?
To understand how a map works, let’s first talk about the idea of the map function. A map function maps one value to another. Given one value, called a key, it will return a second, the value.
map(key) → value
Now, a map isn’t going to be very useful unless we can put some data in the map. We’ll need a function that adds data to the map:
insert(map, key, value)
and a function that removes data from the map:
delete(map, key)
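In Go, the built-in map type provides these operations directly as language syntax: assignment inserts, indexing maps a key to its value, and the built-in `delete` removes a key. A small sketch, where the function name `insertLookupDelete` is hypothetical:

```go
package main

import "fmt"

// insertLookupDelete exercises insert, lookup and delete on Go's built-in map.
func insertLookupDelete() (got int, found bool, gotAfter int, foundAfter bool) {
	m := make(map[string]int)
	m["gopher"] = 1                    // insert(map, key, value)
	got, found = m["gopher"]           // map(key) → value, plus a presence flag
	delete(m, "gopher")                // delete(map, key)
	gotAfter, foundAfter = m["gopher"] // a missing key yields the zero value
	return
}

func main() {
	fmt.Println(insertLookupDelete()) // 1 true 0 false
}
```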
There are other interesting properties of map implementations, like querying whether a key is present in the map, but they’re outside the scope of what we’re going to discuss today. Instead, we’re just going to focus on these properties of a map: insertion, deletion, and mapping keys to values.
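To make these three operations concrete, here is a toy hash map using separate chaining: hash the key, use the hash to pick a bucket, then scan that bucket. This is a simplified illustration under its own assumptions (fixed bucket count, no resizing, FNV-1a from `hash/fnv` as the hash function); it is not how the Go runtime lays out its maps, and the type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// entry is one key/value pair stored in a bucket.
type entry struct {
	key   string
	value int
}

// toyMap is a fixed array of buckets, each a slice of entries.
type toyMap struct {
	buckets [][]entry
}

func newToyMap(nbuckets int) *toyMap {
	return &toyMap{buckets: make([][]entry, nbuckets)}
}

// bucketFor hashes the key and reduces the hash to a bucket index.
func (m *toyMap) bucketFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // writing to a hash.Hash never returns an error
	return int(h.Sum32() % uint32(len(m.buckets)))
}

// Insert adds key→value, overwriting any existing value for key.
func (m *toyMap) Insert(key string, value int) {
	i := m.bucketFor(key)
	for j := range m.buckets[i] {
		if m.buckets[i][j].key == key {
			m.buckets[i][j].value = value
			return
		}
	}
	m.buckets[i] = append(m.buckets[i], entry{key, value})
}

// Lookup maps key to its value, reporting whether it was found.
func (m *toyMap) Lookup(key string) (int, bool) {
	for _, e := range m.buckets[m.bucketFor(key)] {
		if e.key == key {
			return e.value, true
		}
	}
	return 0, false
}

// Delete removes key from its bucket if present.
func (m *toyMap) Delete(key string) {
	i := m.bucketFor(key)
	b := m.buckets[i]
	for j, e := range b {
		if e.key == key {
			m.buckets[i] = append(b[:j], b[j+1:]...)
			return
		}
	}
}

func main() {
	m := newToyMap(8)
	m.Insert("a", 1)
	m.Insert("b", 2)
	m.Insert("a", 3) // overwrites the earlier value for "a"
	m.Delete("b")
	fmt.Println(m.Lookup("a")) // 3 true
	fmt.Println(m.Lookup("b")) // 0 false
}
```

Because every operation first computes the same bucket index for a key, lookups only scan one short bucket instead of every entry; a production map also grows its bucket array as it fills so the buckets stay short.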
Corral is a MapReduce framework designed to be deployed to serverless platforms, like AWS Lambda. It presents a lightweight alternative to Hadoop MapReduce. Much of the design philosophy was inspired by Yelp’s mrjob — corral retains mrjob’s ease-of-use while gaining the type safety and speed of Go.
Corral’s runtime model consists of stateless, transient executors controlled by a central driver. Currently, the best environment for deployment is AWS Lambda, but corral is modular enough that support for other serverless platforms can be added as support for Go in cloud functions improves.
Corral is best suited for data-intensive but computationally inexpensive tasks, such as ETL jobs.
More details about corral’s internals can be found in this blog post.
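To illustrate the MapReduce model that corral implements, here is a single-process word count split into map, shuffle, and reduce phases. It is a sketch of the general technique only: the names `KeyValue`, `mapPhase`, `shuffle`, `reducePhase`, and `wordCount` are hypothetical and are not corral's API. A framework like corral would run the map and reduce tasks in separate transient executors (for example, Lambda invocations) coordinated by the driver, instead of in one loop.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KeyValue is an intermediate pair emitted by the map phase.
type KeyValue struct {
	Key   string
	Value int
}

// mapPhase turns one input line into (word, 1) pairs.
func mapPhase(line string) []KeyValue {
	var out []KeyValue
	for _, w := range strings.Fields(strings.ToLower(line)) {
		out = append(out, KeyValue{w, 1})
	}
	return out
}

// shuffle groups intermediate values by key, as a framework does
// between the map and reduce phases.
func shuffle(pairs []KeyValue) map[string][]int {
	groups := make(map[string][]int)
	for _, kv := range pairs {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}
	return groups
}

// reducePhase sums the counts for one word.
func reducePhase(key string, values []int) KeyValue {
	sum := 0
	for _, v := range values {
		sum += v
	}
	return KeyValue{key, sum}
}

// wordCount runs map → shuffle → reduce sequentially.
func wordCount(lines []string) []KeyValue {
	var intermediate []KeyValue
	for _, line := range lines {
		intermediate = append(intermediate, mapPhase(line)...)
	}
	var results []KeyValue
	for key, values := range shuffle(intermediate) {
		results = append(results, reducePhase(key, values))
	}
	// Go map iteration order is unspecified, so sort for stable output.
	sort.Slice(results, func(i, j int) bool { return results[i].Key < results[j].Key })
	return results
}

func main() {
	for _, kv := range wordCount([]string{"the quick fox", "The lazy dog", "the end"}) {
		fmt.Println(kv.Key, kv.Value)
	}
}
```

Each map call and each reduce call depends only on its own input, which is what lets a MapReduce framework run them as stateless, independent tasks.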
Every concurrency API needs a way to run code concurrently. Here are some examples of what that looks like using different APIs:
go myfunc()                                       // Golang
pthread_create(&thread_id, NULL, &myfunc, NULL);  /* C with POSIX threads */
spawn(modulename, myfuncname, [])                 % Erlang
threading.Thread(target=myfunc).start()           # Python with threads
asyncio.create_task(myfunc())                     # Python with asyncio
There are lots of variations in the notation and terminology, but the semantics are the same: these all arrange for myfunc to start running concurrently with the rest of the program, and then return immediately so that the parent can do other things.
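In Go terms, each `go` statement returns as soon as the goroutine is scheduled, so the parent has to arrange separately to wait for results. A minimal sketch using `sync.WaitGroup` for that; the function name `spawnAll` is hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// spawnAll starts one goroutine per task. Each go statement returns
// immediately; the WaitGroup is how the parent later waits for them.
func spawnAll(tasks []func() int) []int {
	results := make([]int, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func() int) {
			defer wg.Done()
			results[i] = task() // each goroutine writes only its own slot
		}(i, task)
	}
	// The parent is free to do other work here while the goroutines run.
	wg.Wait()
	return results
}

func main() {
	square := func(n int) func() int { return func() int { return n * n } }
	fmt.Println(spawnAll([]func() int{square(1), square(2), square(3)})) // [1 4 9]
}
```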
Another option is to use callbacks:
Again, the notation varies, but these all accomplish the same thing: they arrange that from now on, if and when a certain event occurs, then myfunc will run. Then once they’ve set that up, they immediately return so the caller can do other things. (Sometimes callbacks get dressed up with fancy helpers like promise combinators, or Twisted-style protocols/transports, but the core idea is the same.)
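The callback style can be sketched in Go with a small event emitter: `On` registers myfunc and returns at once, and myfunc only runs when the event later fires. The `Emitter` type and its `On`/`Emit` methods are hypothetical, written for illustration; in the standard library, `time.AfterFunc(d, f)` follows the same pattern, returning immediately and calling `f` in its own goroutine once the duration elapses.

```go
package main

import (
	"fmt"
	"sync"
)

// Emitter is a hypothetical minimal event source: callers register
// callbacks with On, and Emit invokes every callback for that event.
type Emitter struct {
	mu       sync.Mutex
	handlers map[string][]func()
}

func NewEmitter() *Emitter {
	return &Emitter{handlers: make(map[string][]func())}
}

// On registers fn to run whenever event fires, then returns immediately.
func (e *Emitter) On(event string, fn func()) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.handlers[event] = append(e.handlers[event], fn)
}

// Emit runs the callbacks registered for event, in registration order.
func (e *Emitter) Emit(event string) {
	e.mu.Lock()
	hs := append([]func(){}, e.handlers[event]...) // copy so callbacks may call On
	e.mu.Unlock()
	for _, fn := range hs {
		fn()
	}
}

func main() {
	e := NewEmitter()
	calls := 0
	myfunc := func() { calls++ }
	e.On("click", myfunc) // returns at once; myfunc has not run yet
	fmt.Println(calls)    // 0
	e.Emit("click")
	e.Emit("click")
	fmt.Println(calls) // 2
}
```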
And… that’s it. Take any real-world, general-purpose concurrency API, and you’ll probably find that it falls into one or the other of those buckets (or sometimes both, like asyncio).