“Large-scale internet services aim to remain highly available and responsive in the presence of unexpected failures. Providing this service often requires monitoring and analyzing tens of millions of measurements per second across a large number of systems, and one particularly eective solution is to store and query such measurements in a time series database (TSDB). A key challenge in the design of TSDBs is how to strike the right balance between eciency, scalability, and reliability. In this paper we introduce Gorilla, Facebook’s in-memory TSDB. Our insight is that users of monitoring systems do not place much emphasis on individual data points but rather on aggregate analysis, and recent data points are of much higher value than older points to quickly detect and diagnose the root cause of an ongoing problem. Gorilla optimizes for remaining highly available for writes and reads, even in the face of failures, at the expense of possibly dropping small amounts of data on the write path. To improve query eciency, we aggressively leverage compression techniques such as delta-of-delta timestamps and XOR’d oating point values to reduce Gorilla’s storage footprint by 10x. This allows us to store Gorilla’s data in memory, reducing query latency by 73x and improving query throughput by 14x when compared to a traditional database (HBase)-backed time series data. This performance improvement has unlocked new monitoring and debugging tools, such as time series correlation search and more dense visualization tools. Gorilla also gracefully handles failures from a single-node toentire regions with little to no operational overhead…”
“Analyzing large graphs provides valuable insights for social networking and web companies in content ranking and recommendations. While numerous graph processing systems have been developed and evaluated on available benchmark graphs of up to 6.6B edges, they often face significant dif- ficulties in scaling to much larger graphs. Industry graphs can be two orders of magnitude larger – hundreds of billions or up to one trillion edges. In addition to scalability challenges, real world applications often require much more complex graph processing workflows than previously evaluated. In this paper, we describe the usability, performance, and scalability improvements we made to Apache Giraph, an open-source graph processing system, in order to use it on Facebook-scale graphs of up to one trillion edges. We also describe several key extensions to the original Pregel model that make it possible to develop a broader range of production graph applications and workflows as well as improve code reuse. Finally, we report on real-world operations as well as performance characteristics of several large-scale production applications…”
“On Facebook, people can keep up with their family and friends through reading status updates and viewing photos. In our backend, we store all the data that makes up the social graph of these connections. On mobile clients, we can’t download the entire graph, so we download a node and some of its connections as a local tree structure.
The image below illustrates how this works for a story with picture attachments. In this example, John creates the story, and then his friends like it and comment on it. On the left-hand side of the image is the social graph, to describe relations in the Facebook backend. When the Android app queries for the story, we get a tree structure starting with the story, including information about actor, feedback, and attachments (shown on the right-hand side of the image)…”
“If you’ve been watching our GitHub wiki, following us on Twitter, or reading the wikitech-l mailinglist, you’ve probably known for a while that Wikipedia has been transitioning to HHVM. This has been a long process involving lots of work from many different people, and as of a few weeks ago, all non-cached API and web traffic is being served by HHVM. This blog post from the Wikimedia Foundation contains some details about the switch, as does their page about HHVM.
I spent four weeks in July and August of 2014 working at the Wikimedia Foundation office in San Francisco to help them out with some final migration issues. While the primary goal was to assist in their switch to HHVM, it was also a great opportunity to experience HHVM as our open source contributors see it. I tried to do most of my work on WMF servers, using HHVM from GitHub rather than our internal repository. In addition to the work I did on HHVM itself, I also gave a talk about what the switch to HHVM means for Wikimedia developers…”
“In this video, Aravind Narayanan from Facebook first introduces Tupperware and explains how it leverages sandboxes and container technologies. He later talks about how Tupperware integrates with other Facebook services and highlights the lessons they learned while building it…”
“Since any client that wants to talk to memcached can already speak the standard ASCII memcached protocol, we use that as the common API and enter the picture silently. To a client, mcrouter looks like a memcached server. To a server, mcrouter looks like a normal memcached client. But mcrouter’s feature-rich configurability makes it more than a simple proxy…”
“Since Facebook’s first line of PHP, and its first MySQL INSERT statement, open source has been a huge part of our engineering philosophy.
Nowadays, we use, maintain, and contribute to a significant number of major projects – in areas as diverse as native mobile tools, big data systems, client-side web libraries, backend runtimes and infrastructure, and, through the Open Compute Project, server and storage hardware.
2013 has been a great year for our open source program, with a significant number of new projects that we’re really proud of, a renewed commitment to run and maintain them actively, and a desire to work with the vibrant communities that have built up around them. On our GitHub account alone, we now have more than 90 repos comprising over 40,000 commits and that have collectively been forked 15,000 times.
The end of the year is a great opportunity to look back at some of our major areas of investment, and recap a (non-exhaustive!) list of the projects we’ve been working on…”
“Facebook has one of the largest MySQL database clusters in the world. This cluster comprises many thousands of servers across multiple data centers on two continents.
Operating a cluster of this size with a small team is achieved by automating nearly everything a conventional MySQL Database Administrator (DBA) might do so that the cluster can almost run itself. One of the core components of this automation is a system we call MPS, short for “MySQL Pool Scanner.”
MPS is a sophisticated state machine written mostly in Python. It replaces a DBA for many routine tasks and enables us to perform maintenance operations in bulk with little or no human intervention…”