Lua Metatables Tutorial

“In this tutorial I’ll be covering a very important concept in Lua: metatables. Knowledge of how to use metatables will allow you to be much more powerful in your use of Lua. Every table can have a metatable attached to it. A metatable is a table which, with certain keys set, can change the behaviour of the table it’s attached to…”


Statistical Data Mining Tutorials

“The following links point to a set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms.

These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and case-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning…”

Bloom Filters by Example

“A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set. The base data structure of a Bloom filter is a Bit Vector…”
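To make the bit-vector idea concrete, here is a minimal Python sketch. It is not taken from the linked article: the class name `BloomFilter`, the use of SHA-256, and the double-hashing scheme (deriving k positions as h1 + i·h2 mod m, after Kirsch and Mitzenmacher) are illustrative assumptions. The filter sizes in the usage are arbitrary.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit vector of num_bits bits and num_hashes hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # The bit vector, packed 8 bits per byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely not in the set. True: possibly in the set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add("apple")
print(bf.might_contain("apple"))
```

Because `might_contain` answers `True` only when all k bits are set, an item that was added is never reported absent; an item that was never added can still collide on all k bits, which is the "may be in the set" case the excerpt describes.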

Lisp for C++ programmers

“A good old friend of mine, whom I respect a lot and who is a very good C++ programmer, recently asked me to give him an example of how it’s possible to make new language features in Lisp. He’s aware of Lisp’s ability to invent new syntax, and he’s also excited about C++11. So he wonders how it is possible to introduce new syntax into your language all by yourself, without having to wait for the committee to adopt the new feature.

I decided to write this article for C++ programmers, explaining core Lisp ideas. It’s suicide; I’m sure as heck that I’ll fail. A great number of excellent publications on Lisp for beginners exist, and still there are people who cannot grasp what’s so special about it.

Nevertheless, I decided to try. Yet another introductory article on Lisp won’t harm anybody, nor will it make Lisp even less popular. Let’s be honest: nobody reads this blog, anyway. :)…”

Intro to pandas data structures

“What follows is a fairly thorough introduction to the library. I chose to break it into three parts as I felt it was too long and daunting as one.

Part 1: Intro to pandas data structures, covers the basics of the library’s two main data structures – Series and DataFrames.

Part 2: Working with DataFrames, dives a bit deeper into the functionality of DataFrames. It shows how to inspect, select, filter, merge, combine, and group your data.

Part 3: Using pandas with the MovieLens dataset, applies the learnings of the first two parts in order to answer a few basic analysis questions about the MovieLens ratings data…”
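As a quick taste of what the first two parts cover, the sketch below builds a Series and two DataFrames and then selects, filters, merges, and groups them. It assumes pandas is installed. The tiny `ratings` and `movies` tables are made-up illustrative data loosely shaped like ratings data; they are not the MovieLens dataset and not the series' own examples.

```python
import pandas as pd

# A Series: a one-dimensional array with an index of labels.
s = pd.Series([4.0, 3.5, 5.0], index=["a", "b", "c"])

# DataFrames: two-dimensional tables of labeled columns (hypothetical data).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "movie_id": [10, 20, 10, 30, 20],
    "rating": [4, 5, 3, 2, 5],
})
movies = pd.DataFrame({
    "movie_id": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
})

# Select a column and filter rows with a boolean mask.
high = ratings[ratings["rating"] >= 4]

# Merge the two tables on their shared key (an inner join by default).
merged = ratings.merge(movies, on="movie_id")

# Group by title and aggregate: mean rating per movie.
mean_by_title = merged.groupby("title")["rating"].mean()
print(mean_by_title)
```

Here `s["b"]` looks up a value by label, `high` keeps the three rows rated 4 or higher, and `mean_by_title` is itself a Series indexed by title.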

Lua: Good, bad, and ugly parts

“I have come across several detailed lists that mention good and not-so-good parts of Lua (for example, Lua benefits, why Lua, why Lua is not more widely used, advantages of Lua, Lua good/bad, Lua vs. JavaScript, and Lua Gotchas), but I found that some of the features that tripped me or that I cared about were not listed, so I put together my own list. It is far from being comprehensive and some aspects of the language are not covered (for example, math and string libraries), but it captures the gist of my experience with the language…”

Under the hood: MySQL Pool Scanner (MPS)

“Facebook has one of the largest MySQL database clusters in the world. This cluster comprises many thousands of servers across multiple data centers on two continents.

Operating a cluster of this size with a small team is achieved by automating nearly everything a conventional MySQL Database Administrator (DBA) might do so that the cluster can almost run itself. One of the core components of this automation is a system we call MPS, short for “MySQL Pool Scanner.”

MPS is a sophisticated state machine written mostly in Python. It replaces a DBA for many routine tasks and enables us to perform maintenance operations in bulk with little or no human intervention…”

Experience with ePaxos: Systems Research using Go

“Writing our to-appear SOSP’13 paper on Egalitarian Paxos (“There is More Consensus in Egalitarian Parliaments”) was a journey made more interesting because of our choice to use Go as the implementation language.

It rocked, and it let us do some things in the evaluation that we likely wouldn’t have in C++;
It had a few drawbacks that we had to deal with, mostly with performance variation and optimization;
Our community wasn’t used to it and we got yelled at once by a reviewer (!).

[Note: While I (Dave) am writing this post, please realize that the standard professorial disclaimer applies here: When I say “we”, I really mean, “the student who did all the work”, who in this case is Iulian Moraru, a CS Ph.D. student at Carnegie Mellon. If you think “woah, that’s cool work”, he’s the one who should be credited. But if you want to yell at someone for the strong opinions expressed here or the way they’re expressed, yell at me]…”

Service orchestration and management tool

“Serf is a decentralized solution for service discovery and orchestration that is lightweight, highly available, and fault tolerant…”

“Serf relies on an efficient and lightweight gossip protocol to communicate with nodes. The Serf agents periodically exchange messages with each other in much the same way that a zombie apocalypse would occur: it starts with one zombie but soon infects everyone. In practice, the gossip is very fast and extremely efficient…”
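The "one zombie infects everyone" picture can be seen in a small simulation. This is a generic push-gossip sketch, not Serf's actual protocol or code: the function name `simulate_gossip`, the `fanout` parameter (how many random peers each informed node contacts per round), and the round-based model are all illustrative assumptions.

```python
import random


def simulate_gossip(num_nodes: int, fanout: int, seed: int = 0) -> list[int]:
    """Return how many nodes know the message after each round of push gossip."""
    if not 1 <= fanout <= num_nodes - 1:
        raise ValueError("fanout must be between 1 and num_nodes - 1")
    rng = random.Random(seed)
    informed = {0}  # node 0 is "patient zero"
    history = [len(informed)]
    while len(informed) < num_nodes:
        newly_informed = set()
        for node in informed:
            # Pick `fanout` distinct peers other than this node:
            # sample from 0..n-2 and shift indices at or above `node` up by one.
            picks = rng.sample(range(num_nodes - 1), fanout)
            newly_informed.update(p if p < node else p + 1 for p in picks)
        informed |= newly_informed
        history.append(len(informed))
    return history


print(simulate_gossip(num_nodes=1000, fanout=3, seed=42))
```

The count roughly multiplies each round at first, so the number of rounds needed to reach every node grows only logarithmically with cluster size; that growth pattern is why gossip can be both fast and cheap per node.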