How Streak built a graph database on Cloud Spanner to wrangle billions of emails

Streak makes a CRM add-on for Gmail, and recently adopted Cloud Spanner to take advantage of its scalability and SQL capabilities to implement a graph data model. Read on to learn about their decision, what they love about the system, and the ways in which it still needs work.


List of Recommender Systems

Recommender systems (or recommendation engines) are useful and interesting pieces of software. I wanted to compare recommender systems to each other but could not find a decent list, so here is the one I created. Please help me keep this post up-to-date by submitting corrections and additions via pull-request, or tweet me @grahamjenson.

AWS Lambda Programming Language Comparison


Now that AWS Lambda has added PowerShell to its growing list of supported languages, let’s take a moment to compare and contrast the different languages available to us.

In this post, we’ll take a look at these languages from a number of angles:

  • Cold start performance: performance during a cold start
  • Warm performance: performance after the initial cold start
  • Cost: does it cost you more to run functions in one language over another? If so, why?
  • Ecosystem: libraries, deployment tooling, etc.
  • Platform support: is the language supported by other function-as-a-service (FaaS) platforms?

We will also talk about specialized use cases such as Machine Learning (ML) as well as paying attention to the special needs of the enterprise. Finally, we’ll round off the discussion by looking at a few languages that are not officially supported but that you can use with Lambda via shims.

I should stress that the goal of this post is to consider the relative strengths and weaknesses of each language within the specific context of AWS Lambda. This is not a general purpose language comparison!

190 universities just launched 600 free online courses. Here’s the full list

If you haven’t heard, universities around the world are offering their courses online for free (or at least partially free). These courses are collectively called MOOCs or Massive Open Online Courses.

In the past six years or so, over 800 universities have created more than 10,000 of these MOOCs. And I’ve been keeping track of these MOOCs the entire time over at Class Central, ever since they rose to prominence.

In the past four months alone, 190 universities have announced 600 such free online courses. I’ve compiled a list of them and categorized them according to the following subjects: Computer Science, Mathematics, Programming, Data Science, Humanities, Social Sciences, Education & Teaching, Health & Medicine, Business, Personal Development, Engineering, Art & Design, and finally Science.

If you have trouble figuring out how to sign up for Coursera courses for free, don’t worry: here’s an article on how to do that, too.

Many of these are completely self-paced, so you can start taking them at your convenience.

Practical Text Classification With Python and Keras

Imagine you could know the mood of the people on the Internet. Maybe you are not interested in the Internet in its entirety, but only in whether people are happy today on your favorite social media platform. After this tutorial, you’ll be equipped to find out. Along the way, you will get a grasp of current advancements in (deep) neural networks and how they can be applied to text.

Reading the mood from text with machine learning is called sentiment analysis, and it is one of the prominent use cases in text classification. This falls into the very active research field of natural language processing (NLP). Other common use cases of text classification include detection of spam, auto tagging of customer queries, and categorization of text into defined topics. So how can you do this?
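
To make the idea concrete, here is a minimal sketch of sentiment classification. It deliberately swaps in a bag-of-words naive Bayes classifier written with the standard library only, not the Keras neural networks the tutorial covers, and the function names (`tokenize`, `train`, `predict`) and the four training sentences are made up purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(samples):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero out a label.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, hand-written training data (an assumption for illustration only).
SAMPLES = [
    ("i love this great movie", "pos"),
    ("what a wonderful happy day", "pos"),
    ("i hate this terrible movie", "neg"),
    ("what a sad awful day", "neg"),
]

word_counts, label_counts = train(SAMPLES)
print(predict("a happy great day", word_counts, label_counts))
print(predict("terrible sad movie", word_counts, label_counts))
```

A real system would need far more data and a proper tokenizer; the point here is only the shape of the task: labeled text goes in, a model is fitted, and new text is mapped to a label.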

Replacing the cache replacement algorithm in memcached

In this post we delve into a reworking of memcached’s Least Recently Used (LRU) algorithm, which became the default when 1.5.0 was released. Most of these features have been available via the “-o modern” switch for years. The 1.5.x series has enabled them all to work in concert to reduce RAM requirements.

When memcached was first deployed, it was typically co-located on backend web servers, using spare RAM and CPU cycles. It was important that it stay light on CPU usage while being fast; otherwise it would affect the performance of the application it was attempting to improve.

Over time, the deployment style has changed. There are frequently fewer dedicated nodes with more RAM, but spare CPU. On top of this, web requests can fetch dozens to hundreds of objects at once, with the request latency having a greater overall impact.

This post is focused on the efforts to reduce the number of expired items wasting cache space, general LRU improvements, as well as latency consistency.
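
To illustrate the problem of expired items wasting cache space, here is a minimal sketch of a textbook LRU cache with per-item expiry. It is not memcached's implementation; the class name `TTLLRUCache`, its methods, and the injectable `clock` parameter are hypothetical and exist only for this example.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Toy LRU cache with per-item expiry (illustrative, not memcached's code)."""

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock  # injectable so the example can control time
        self._items = OrderedDict()  # key -> (value, expires_at), oldest first

    def set(self, key, value, ttl):
        self._items.pop(key, None)
        self._items[key] = (value, self.clock() + ttl)
        while len(self._items) > self.capacity:
            # Evict the least recently used item, expired or not.
            self._items.popitem(last=False)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Lazy expiry: space is reclaimed only when the key is touched.
            del self._items[key]
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def expired_count(self):
        """Number of expired items still occupying a slot."""
        now = self.clock()
        return sum(1 for _, expires_at in self._items.values() if now >= expires_at)


now = [0.0]
cache = TTLLRUCache(capacity=2, clock=lambda: now[0])
cache.set("a", 1, ttl=10)
cache.set("b", 2, ttl=100)
cache.get("a")             # "a" becomes most recently used
now[0] = 50                # "a" has expired but still holds a slot
cache.set("c", 3, ttl=100)
print(cache.get("b"), cache.expired_count())
```

In this run the live item "b" is evicted while the expired item "a" keeps its slot, because a plain LRU only looks at recency. That is the kind of waste the post describes trying to reduce.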

Postcards from Lambda @ the Edge

I first saw Lambda@Edge at re:Invent a couple of years back and wasn’t sure what to make of it. The demo showed how you could manipulate HTTP headers in-flight but the audience was in a post-lunch sugar-crash full of ‘so what?’. It wasn’t the presenter’s fault; some next-level concepts just weren’t landing their punches with the crowd. I mean, who needs to interfere with the CDN, right?

Recently we started a CloudFront-heavy project that performs all sorts of optimization voodoo for webpages. That’s when I thought back to the Lambda@Edge presentation and a few lightbulbs started to flicker. To test this, we quickly moved some code into functions at The Edge and saw some blistering performance gains. While there was nothing particularly amazing about the code we wrote, getting it working has been a trial like never before. Let’s cover the gotchas before the gold.