Dgraph on AWS: Setting up a horizontally scalable graph database

Dgraph is an open source, distributed graph database, built for production environments and written entirely in Go. Dgraph is fast, transactional, sharded, and distributed (it performs joins, filters, and sorts across shards), consistently replicated with Raft, and fault tolerant thanks to synchronous replication and horizontal scalability.

The languages used to interact with Dgraph are GraphQL and our variant of it, GraphQL+-. This gives apps access to the benefits of GraphQL directly from the database.

Dgraph has official clients for Go, Java, Python, JavaScript, and C#, as well as community-supported clients for Dart, Rust, and Elixir. Dgraph users can also use any of the tools and libraries that work with GraphQL.
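
For example, here is a minimal sketch of querying Dgraph from Go with the official dgo client, assuming a local Alpha node on the default gRPC port 9080 and the v2 dgo client; the predicate names in the query are made up for illustration:

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/dgraph-io/dgo/v2"
        "github.com/dgraph-io/dgo/v2/protos/api"
        "google.golang.org/grpc"
    )

    func main() {
        // Connect to a Dgraph Alpha node over gRPC.
        conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

        // A read-only transaction running a GraphQL+- query.
        q := `{
          people(func: has(name), first: 5) {
            uid
            name
          }
        }`
        resp, err := dg.NewReadOnlyTxn().Query(context.Background(), q)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(resp.Json))
    }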

To get started right away, download Dgraph and follow the quick-start guide.

Getting started with Dgraph locally on your own computer, where you can quickly model your data and build your app, is easy. When you’re ready for a production environment, you’ll want to deploy Dgraph to the cloud. There, you can horizontally scale Dgraph across multiple machines for high availability and data sharding.

In this article, we’ll show how to set up a resilient, highly available Dgraph cluster on AWS.

https://aws.amazon.com/blogs/opensource/dgraph-on-aws-setting-up-a-horizontally-scalable-graph-database/

Manual Memory Management in Go using jemalloc

Dgraph Labs has been a user of the Go language since our inception in 2015. Five years and 200K lines of Go code later, we’re happy to report that we are still convinced Go was and remains the right choice. Our excitement for Go has gone beyond building systems, and has led us to even write scripts in Go that would typically be written in Bash or Python. We find that using Go has helped us build a codebase that is clean, readable, maintainable and – most importantly – efficient and concurrent.

However, there’s one area of concern that we have had since the early days: memory management. We have nothing against the Go garbage collector, but while it offers convenience to developers, it has the same issue that other garbage collectors do: it simply cannot compete with the efficiency of manual memory management.

When you manage memory manually, memory usage is lower and more predictable, and bursts of allocation don’t cause wild spikes in usage. For Dgraph using Go’s memory management, all of those have been a problem. In fact, Dgraph running out of memory is a very common complaint we hear from our users.

Languages like Rust have been gaining ground partly because they allow safe manual memory management. We can completely empathize with that.

In our experience, doing manual memory allocation and chasing potential memory leaks takes less effort than trying to optimize memory usage in a language with garbage collection. Manual memory management is well worth the trouble when building database systems capable of virtually unlimited scalability.

Our love of Go and our need to avoid the Go GC led us to find novel ways to do manual memory management in Go. Of course, most Go users will never need to do manual memory management, and we would recommend against it unless you need it. And when you do need it, you’ll know.

In this post, I’ll share what we have learned at Dgraph Labs from our exploration of manual memory management, and explain how we manually manage memory in Go.
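
To set the scene, here is a deliberately simplified sketch of the general idea: allocating byte buffers outside the Go heap via cgo so the garbage collector never sees them. This is not Dgraph’s actual jemalloc-based allocator, just the underlying trick, and it needs Go 1.17+ for unsafe.Slice:

    package main

    /*
    #include <stdlib.h>
    */
    import "C"

    import (
        "fmt"
        "unsafe"
    )

    // alloc returns a []byte backed by C-allocated memory, invisible to the Go GC.
    func alloc(n int) []byte {
        ptr := C.calloc(C.size_t(n), 1)
        if ptr == nil {
            panic("out of memory")
        }
        // Wrap the raw pointer in a Go slice header without copying (Go 1.17+).
        return unsafe.Slice((*byte)(ptr), n)
    }

    // free releases memory obtained from alloc; the slice must not be used afterwards.
    func free(b []byte) {
        C.free(unsafe.Pointer(&b[0]))
    }

    func main() {
        buf := alloc(1 << 20) // 1 MiB that lives outside the Go heap
        copy(buf, "manually managed")
        fmt.Println(string(buf[:16]))
        free(buf)
    }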

https://dgraph.io/blog/post/manual-memory-management-golang-jemalloc/

How we tracked down (what seemed like) a memory leak in one of our Go microservices

A blog special from the Detectify backend team:

The backend developer team at Detectify has been working with Go for some years now, and it’s the language we chose to power our microservices. We think Go is a fantastic language and it has proven to perform very well for our operations. It comes with a great tool set, including pprof, a tool we’ll touch on later.

However, even though Go performs very well, we noticed that one of our microservices was exhibiting behavior very similar to that of a memory leak.

In this post we will go step by step through our investigation of this problem, the thought process behind our decisions, and the details needed to understand and fix the problem.
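
As a quick reminder of what that looks like in practice, heap profiling in a Go service is usually enabled by importing net/http/pprof and exposing its HTTP endpoints. This is a minimal sketch, not Detectify’s actual service code, and the port is arbitrary:

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    )

    func main() {
        // Serve the profiling endpoints on a separate, non-public port.
        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()

        // ... the rest of the service runs here ...
        select {}
    }

A heap profile can then be pulled with go tool pprof http://localhost:6060/debug/pprof/heap and inspected interactively with commands like top or web.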

Python and Go

In a previous post we used gRPC to call Python code from Go. gRPC is a great framework, but it comes with a performance cost: every function call needs to marshal its arguments using protobuf, make a network call over HTTP/2, and then unmarshal the result using protobuf.

In this blog post, we’ll get rid of the networking layer and, to some extent, the marshalling. We’ll do this by using cgo to interact with Python as a shared library.
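
The gist of the cgo approach is to embed the CPython interpreter in the Go process and call into it directly. Here is a minimal sketch of that idea, not the article’s code (which is on GitHub); it assumes pkg-config exposes python3-embed, which Python 3.8 provides:

    package main

    /*
    #cgo pkg-config: python3-embed
    #include <stdlib.h>
    #include <Python.h>

    // Small C helper so we don't call Python's function-like macros from Go.
    static int runPython(const char *code) {
        return PyRun_SimpleString(code);
    }
    */
    import "C"

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        // Start the embedded CPython interpreter inside this Go process.
        C.Py_Initialize()
        defer C.Py_Finalize()

        code := C.CString("import numpy as np; print(np.arange(5).sum())")
        defer C.free(unsafe.Pointer(code))

        // runPython returns 0 on success and -1 if the Python code raised an exception.
        if C.runPython(code) != 0 {
            fmt.Println("python code failed")
        }
    }

Embedding the interpreter means every call stays in-process, so there is no HTTP/2 round trip or protobuf encoding on the hot path.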

I’m not going to cover all of the code in detail in order to keep this post a reasonable size. You can find all the code on GitHub, and I did my best to provide proper documentation. Feel free to reach out and ask me questions if you don’t understand something.

And finally, if you want to follow along, you’ll need to install the following (apart from Go):

  • Python 3.8
  • numpy
  • A C compiler (such as gcc)…

https://www.ardanlabs.com/blog/2020/09/using-python-memory.html

How CockroachDB Wrote a Massive & Complex Go Application

Garbage Collection in Go

In this talk Ben Darnell, the CTO and Co-Founder of Cockroach Labs, discusses the decision to utilize Go in CockroachDB. Ben shares how CockroachDB optimized its memory usage to mitigate issues related to garbage collection and improved its use of channels to avoid deadlocks. Ben also shares creative techniques to integrate non-Go dependencies into the Go build process.

Garbage collection in Go can cause an application to pause, which is a concerning issue, but Go also makes a lot of manual tweaks available that allow control over what actually ends up on the garbage-collected heap. Here are two of the optimizations CockroachDB made to mitigate garbage collection issues:

  • Combining Allocations
  • sync.Pool

By virtue of these two practices (which you can see examples of in the video), CockroachDB sees in Go’s benchmarking tools that no new allocations are made per iteration: everything is allocated up front and cached.
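
These are not the examples from the talk, but a generic Go sketch of the same two techniques: embedding a field by value so it shares one allocation with its parent, and reusing temporary buffers through sync.Pool:

    package main

    import (
        "bytes"
        "sync"
    )

    // Combining allocations: meta is embedded by value, so a single allocation
    // covers the node and its metadata instead of two separate heap objects.
    type metadata struct {
        createdAt int64
        flags     uint32
    }

    type node struct {
        key   string
        value []byte
        meta  metadata
    }

    // sync.Pool: reuse scratch buffers across calls instead of allocating a new
    // one every time and leaving the old one for the garbage collector.
    var bufPool = sync.Pool{
        New: func() interface{} { return new(bytes.Buffer) },
    }

    func encode(n *node) []byte {
        buf := bufPool.Get().(*bytes.Buffer)
        defer func() {
            buf.Reset()
            bufPool.Put(buf)
        }()
        buf.WriteString(n.key)
        buf.Write(n.value)
        // Copy the bytes out before the buffer goes back into the pool.
        return append([]byte(nil), buf.Bytes()...)
    }

    func main() {
        _ = encode(&node{key: "k", value: []byte("v"), meta: metadata{flags: 1}})
    }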

https://www.cockroachlabs.com/community/tech-talks/challenges-writing-massive-complex-go-application/

Lessons learned scaling PostgreSQL database to 1.2bn records/month

“This isn’t my first rodeo with large datasets. The authentication and product management database that I designed for the largest UK public Wi-Fi provider had impressive volumes too. We were tracking authentication for millions of devices daily. However, that project had funding that allowed us to pick any hardware, any supporting services, and hire any DBAs to assist with replication/data warehousing/troubleshooting. Furthermore, all analytics queries/reporting were done off logical replicas, and there were multiple sysadmins looking after the supporting infrastructure. Whereas this was a venture of my own, with limited funding and 20x the volume.

Others’ mistakes

This is not to say that if we did have loadsamoney we would have spent it on purchasing top-of-the-line hardware, flashy monitoring systems or DBAs (Okay, maybe having a dedicated DBA would have been nice). Over many years of consulting I have developed the view that the root of all evil lies in the unnecessarily complex data processing pipeline. You don’t need a message queue for ETL and you don’t need an application-layer cache for database queries. More often than not, these are workarounds for underlying database issues (e.g. latency, a poor indexing strategy) that create more issues down the line. In the ideal scenario, you want all data contained within a single database and all data loading operations abstracted into atomic transactions. My goal was not to repeat these mistakes.

Our goals

As you have already guessed, our PostgreSQL database became the central piece of the business (aptly called ‘mother’, although my co-founder insists that my calling various infrastructure components ‘mother’, ‘mothership’, ‘motherland’, etc. is worrying). We don’t have a standalone message queue service, cache service or replicas for data warehousing. Instead of maintaining the supporting infrastructure, I have dedicated my efforts to eliminating any bottlenecks by minimizing latency, provisioning the most suitable hardware, and carefully planning the database schema. What we have is an easy-to-scale infrastructure with a single database and many data processing agents. I love the simplicity of it — if something breaks, we can pinpoint and fix the issue within minutes. However, a lot of mistakes were made along the way — this article summarizes some of them…”
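
Not from the article, but as a concrete illustration of “all data loading operations abstracted into atomic transactions,” here is a small Go database/sql sketch; the driver choice, table, columns, and connection string are made up for the example:

    package main

    import (
        "context"
        "database/sql"
        "log"

        _ "github.com/lib/pq" // any PostgreSQL driver works; this one is just an example
    )

    // loadBatch writes a batch of rows inside one transaction, so a failed load
    // rolls back completely instead of leaving half-written data behind.
    func loadBatch(ctx context.Context, db *sql.DB, rows [][2]string) error {
        tx, err := db.BeginTx(ctx, nil)
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op once Commit has succeeded

        for _, r := range rows {
            if _, err := tx.ExecContext(ctx,
                `INSERT INTO events (name, payload) VALUES ($1, $2)`, r[0], r[1]); err != nil {
                return err
            }
        }
        return tx.Commit()
    }

    func main() {
        db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        rows := [][2]string{{"signup", `{"user":1}`}, {"login", `{"user":1}`}}
        if err := loadBatch(context.Background(), db, rows); err != nil {
            log.Fatal(err)
        }
    }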

https://medium.com/@gajus/lessons-learned-scaling-postgresql-database-to-1-2bn-records-month-edc5449b3067