Behind the scenes, AWS Lambda

Writing code and deploying it to AWS Lambda is as easy as baking a cake (depending on the type of cake). Lambda performs the heavy lifting for you, from provisioning to scaling. But where is the magic happening and how does it actually work under the hood? Lets find out together!

Lambda is split into a control plane and data plane. Each plane is responsible for a specific set of activities in the service. The Control Plane provides management APIs and manages integrations with all AWS services. Whilst the Data Plane is Lambda’s Invoke API that triggers Lambda function invocations, this explanation is still very abstract but things will become clearer over time.


Why asynchronous Rust doesn’t work

“In 2017, I said that “asynchronous Rust programming is a disaster and a mess”. In 2021 a lot more of the Rust ecosystem has become asynchronous – such that it might be appropriate to just say that Rust programming is now a disaster and a mess. As someone who used to really love Rust, this makes me quite sad.

I’ve had a think about this, and I’m going to attempt to explain how we got here. Many people have explained the problems with asynchronous programming – the famous what colour is your function essay, for example.1 However, I think there are a number of things specific to the design of Rust that make asynchronous Rust particularly messy, on top of the problems inherent to doing any sort of asynchronous programming.

In particular, I actually think the design of Rust is almost fundamentally incompatible with a lot of asynchronous paradigms. It’s not that the people designing async were incompetent or bad at their jobs – they actually did a surprisingly good job given the circumstances!2 I just don’t think it was ever going to work out cleanly – and to see why, you’re going to have to read a somewhat long blog post!”

An ex-Googler’s guide to dev tools

Many years ago, I did a brief stint at Google. A lot has changed since then, but even that brief exposure to Google’s internal developer tools left a lasting impression on me. In many ways, the dev tools inside Google are the most advanced in the world. Google has been a pioneer not only in scaling their own software systems but in figuring out how to build software effectively at scale. They’ve dealt with issues related to codebase volume, code discoverability, organizational knowledge sharing, and multi-service deployment at a level of sophistication that most other companies have not yet reached. (For reference, see Software Engineering at Google.)

In other ways, however, Google’s internal tools are awfully limited. In particular, nearly all of them are tightly coupled with Google’s unique internal ecosystem. Unfortunately, that means you can’t take them with you when you leave.

The Google diaspora has seeded so many other organizations with amazing talented people who bring lessons learned from working inside one of the world’s leading technology organizations. But adapting to programming outside of Google can be tough, especially when you’ve come to rely on tools you no longer have at your disposal.

Over the years, I’ve learned from my own experience and the experience of lots of others who have left Google. Many of Sourcegraph’s early customers began with an ex-Googler missing code search after leaving Google. I worked closely with these people to understand the gap they were trying to fill, so that we could build Sourcegraph to meet their needs. Over time, patterns emerged in terms of how ex-Googlers sought to introduce new dev tools into their organizations, inspired by their experience with dev tools at Google. Some were successful and others were not.

I thought it would be helpful to write a guide to dev tools outside of Google for the ex-Googler, written with an eye toward pragmatism and practicality. No doubt many ex-Googlers wish they could simply clone the dev tools ecosystem inside of Google to their new company, but you can’t boil the ocean. Here is my take on where you should start and a general path I think ex-Googlers can take to find the tools that will make them—and their new teams—as productive as possible…

Dgraph on AWS: Setting up a horizontally scalable graph database

Dgraph is an open source, distributed graph database, built for production environments, and written entirely in Go. Dgraph is fast, transactional, sharded, and distributed (joins, filters, sorts), consistently replicated with Raft, and provides fault tolerance with synchronous replication and horizontal scalability.

The language used to interact with Dgraph is GraphQL and our variant called GraphQL+-. This gives apps access to the benefits of GraphQL directly from the database.

Dgraph has client integrations with official clients in Go, Java, Python, JavaScript, and C#; and community-supported clients with Dart, Rust, and Elixir. Dgraph users also can use any of the tools and libraries that work with GraphQL.

To get started right away, download Dgraph and follow the quick-start guide.

Getting started with Dgraph locally on your own computer, where you can quickly model your data in Dgraph and build your app, is easy. When you’re ready to deploy this to a production environment, you’ll want to deploy Dgraph to the cloud. You can horizontally scale Dgraph across multiple machines for high availability and data sharding.

In this article, we’ll show how to set up a resilient highly available Dgraph cluster on AWS.

Learning Math for Machine Learning

Vincent Chen is a student at Stanford University studying Computer Science. He is also a Research Assistant at the Stanford AI Lab.

It’s not entirely clear what level of mathematics is necessary to get started in machine learning, especially for those who didn’t study math or statistics in school.

In this piece, my goal is to suggest the mathematical background necessary to build products or conduct academic research in machine learning. These suggestions are derived from conversations with machine learning engineers, researchers, and educators, as well as my own experiences in both machine learning research and industry roles.

To frame the math prerequisites, I first propose different mindsets and strategies for approaching your math education outside of traditional classroom settings. Then, I outline the specific backgrounds necessary for different kinds of machine learning work, as these subjects range from high school-level statistics and calculus to the latest developments in probabilistic graphical models (PGMs). By the end of the post, my hope is that you’ll have a sense of the math education you’ll need to be effective in your machine learning work, whatever that may be!

To preface the piece, I acknowledge that learning styles/frameworks/resources are unique to a learner’s personal needs/goals— your opinions would be appreciated in the discussion on HN!

A Note on Math Anxiety
It turns out that a lot of people — including engineers — are scared of math. To begin, I want to address the myth of “being good at math.”

The truth is, people who are good at math have lots of practice doing math. As a result, they’re comfortable being stuck while doing math. A student’s mindset, as opposed to innate ability, is the primary predictor of one’s ability to learn math (as shown by recent studies).

To be clear, it will take time and effort to achieve this state of comfort, but it’s certainly not something you’re born with. The rest of this post will help you figure out what level of mathematical foundation you need and outline strategies for building it.

structslop: static analyzer for efficient struct packing for Go

TL;DR: at Orijtech, Inc., we’ve developed a first of its kind static analyzer, “structslop” that examines and recommends optimal struct field arrangements in your Go programs; it’ll help you reduce RAM consumed by offending structs, making your programs more efficient! High performance systems require efficiency in every aspect, and our work can help out!

Manual Memory Management in Go using jemalloc

Dgraph Labs has been a user of the Go language since our inception in 2015. Five years and 200K lines of Go code later, we’re happy to report that we are still convinced Go was and remains the right choice. Our excitement for Go has gone beyond building systems, and has led us to even write scripts in Go that would typically be written in Bash or Python. We find that using Go has helped us build a codebase that is clean, readable, maintainable and – most importantly – efficient and concurrent.

However, there’s one area of concern that we have had since the early days: memory management. We have nothing against the Go garbage collector, but while it offers a convenience to developers, it has the same issue that other memory garbage collectors do: it simply cannot compete with the efficiency of manual memory management.

When you manage memory manually, the memory usage is lower, predictable and allows bursts of memory allocation to not cause crazy spikes in memory usage. For Dgraph using Go memory, all of those have been a problem1. In fact, Dgraph running out of memory is a very common complaint we hear from our users.

Languages like Rust have been gaining ground partly because it allows safe manual memory management. We can completely empathize with that.

In our experience, doing manual memory allocation and chasing potential memory leaks takes less effort than trying to optimize memory usage in a language with garbage collection2. Manual memory management is well worth the trouble when building database systems that are capable of virtually unlimited scalability.

Our love of Go and our need to avoid Go GC led us to find novel ways to do manual memory management in Go. Of course, most Go users will never need to do manual memory management; and we would recommend against it unless you need it. And when you do need it, you’ll know.

In this post, I’ll share what we have learned at Dgraph Labs from our exploration of manual memory management, and explain how we manually manage memory in Go.

How we tracked down (what seemed like) a memory leak in one of our Go microservices

A blog special from the Detectify backend team:

The backend developer team at Detectify has been working with Go for some years now, and it’s the language chosen by us to power our microservices. We think Go is a fantastic language and it has proven to perform very well for our operations. It comes with a great tool-set, such as the tool we’ll touch on later on called pprof.

However, even though Go performs very well, we noticed one of our microservices had a behavior very similar to that of a memory leak.

In this post we will go step-by-step through our investigation of this problem, the thought process behind our decisions and the details needed to understand and fix the problem.

Python and Go

In a previous post we used gRPC to call Python code from Go. gRPC is a great framework, but there is a performance cost to it. Every function call needs to marshal the arguments using protobuf, make a network call over HTTP/2, and then un-marshal the result using protobuf.

In this blog post, we’ll get rid of the networking layer and to some extent, the marshalling. We’ll do this by using cgo to interact with Python as a shared library.

I’m not going to cover all of the code in detail in order to keep this blog size down. You can find all the code on github and I did my best to provide proper documentation. Feel free to reach out and ask me questions if you don’t understand something.

And finally, if you want to follow along, you’ll need to install the following (apart from Go):

  • Python 3.8
  • numpy
  • A C compiler (such as gcc)…