From SimpleDB to Cassandra: Data Migration for a High Volume Web Application at Netflix

“There will come a time in the life of most systems serving data, when there is a need to migrate data to a more reliable, scalable and high performance data store while maintaining or improving data consistency, latency and efficiency. This document explains the data migration technique we used at Netflix to migrate the user’s queue data between two different distributed NoSQL storage systems…”

How do you tell a non-technical person that they can’t understand?

“The doctor says you have a rare liver disease and you’ll need to treat it with a dangerous cocktail of drugs. How do you evaluate whether the doctor is right? For almost all of us, the answer is: You can’t…”

“In the end there’s nothing you can do, because this is a field that takes years to master, in both education and real-world experience, the complexities and context of which cannot be satisfactorily transmitted to even an intelligent layman. You’re going to have to trust your doctor…”

“That’s what it’s like talking to the architect of a software product containing a million lines of code. It’s not that the customer is “stupid,” nor that given enough time, training, and explanation, couldn’t eventually understand it all fully. But sometimes the customer just has to trust the vendor…”

REPL? A bit more (and less) than that

“The Erlang shell is a funny thing. I think a lot of people who used the language for a short while quickly got annoyed by the lack of support for features that are often considered very basic, such as history or history search (now supported since R16A), or lack of full support for Emacs shortcuts, or the fact that it doesn’t use readline, but only emulates it (wrapping Erlang’s shell in rlwrap is often recommended). Users of more advanced REPLs such as the one provided with Factor, or Dr. Racket, are likely disappointed with the visual support that’s available in Erlang. Not being able to declare inline modules is a bit annoying as modules are only accepted in files, not in the shell.

In this post, I want to explain how the Erlang shell works, why such features can be somewhat difficult or easy to add in, and also showcase some of the really neat features it has that few other shells provide…”

Up and running with Cassandra

“Cassandra is a hybrid non-relational database in the same class as Google’s BigTable. It is more featureful than a key/value store like Riak, but supports fewer query types than a document store like MongoDB.

Cassandra was started by Facebook and later transferred to the open-source community. It is an ideal runtime database for web-scale domains like social networks.

This post is both a tutorial and a “getting started” overview. You will learn about Cassandra’s features, data model, API, and operational requirements—everything you need to know to deploy a Cassandra-backed service…”


Playing around with Lua/NginX, Python, MongoDB, Tornado and JQuery…

stock-labs is a “stock visualization tool” built using different backends and technologies (only for training and study purposes):
1. LuaOpenRestyHighstock with JQuery and MySQL (main focus on Lua/Nginx integration)
2. TornadoHighstock with JQuery and MongoDB (main focus on Tornado (async) and MongoDB’s async and sync drivers)

Reaching 200K events/sec

“Riemann’s TCP protocol is really simple. Send a Msg to the server, receive a response Msg. Messages might include some new events for the server, or a query; and a response might include a boolean acknowledgement or a list of events matching the query. The protocol is ordered; messages on a connection are processed in-order and responses sent in-order. Each Message is serialized using Protocol Buffers. To figure out how large each message is, you read a four-byte length header, then read length bytes, and parse that as a Msg…”
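
The length-prefixed framing described in that excerpt can be sketched in a few lines of Python. This is only an illustration, not Riemann's own code: the helper names (frame, read_exactly, read_frame) are made up for this example, the header is assumed to be an unsigned big-endian (network order) integer, and the payload is treated as opaque bytes rather than being parsed as a Protocol Buffers Msg.

```python
import io
import struct


def frame(payload: bytes) -> bytes:
    """Prefix a payload with a four-byte length header.

    Assumption: the header is an unsigned big-endian integer.
    """
    return struct.pack(">I", len(payload)) + payload


def read_exactly(stream, n: int) -> bytes:
    """Read exactly n bytes, looping because a read may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed before a full frame was read")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(stream) -> bytes:
    """Read one frame: the four-byte header first, then that many payload bytes."""
    (length,) = struct.unpack(">I", read_exactly(stream, 4))
    return read_exactly(stream, length)


if __name__ == "__main__":
    # Two frames back to back on one in-memory stream come out in order.
    stream = io.BytesIO(frame(b"first") + frame(b"second"))
    print(read_frame(stream))
    print(read_frame(stream))
```

With a real connection, the stream could be the file-like object returned by a socket's makefile("rb"), and the bytes returned by read_frame would then be handed to a Protocol Buffers parser.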