What is Deep Learning?



“Deep Learning is often understood as a new domain of machine learning research that deals with learning multiple levels of representation and abstraction that can be discovered in structured data as well as in unstructured data. However, many of the deep learning algorithms are rooted in the domain of artificial intelligence (AI). They only seem new because they take full advantage of new computational resources and recent developments. Deep learning algorithms include supervised as well as unsupervised learning algorithms. For a good introduction to Deep Learning from a computational perspective, the interested reader is referred to tutorials like [1] or [2]…”


mnesia + leveldb: liberating mnesia from the limitations of DETS



“Mnesia offers various database features, but restricts users to a few storage engines with substantial limitations. This talk describes mnesia_ext, an extension which allows arbitrary storage engines to be plugged into mnesia, and how Klarna used this to migrate parts of its database to LevelDB. We will also talk about our experiences with LevelDB, and some improvements we have made…”


How we use gevent to go fast



“Not too long ago we were tackling the challenge of fixing a legacy Python system and converting a two-year-old single-threaded codebase with hundreds of thousands of lines of code to a multi-threaded codebase. To save us from rewriting everything from scratch, we went with gevent to make the program greenlet-safe.

With the update, Pinners can spend less time waiting and more time collecting and discovering the things they love on the site.

Here’s a look at how it all went down…”


Google Brain’s Co-inventor Tells Why He’s Building Chinese Neural Networks



“To chat with Andrew Ng I almost have to tackle him. He was getting off stage at Re:Work’s Deep Learning Summit in San Francisco when a mob of adoring computer scientists descended on (clears throat) the Stanford deep learning professor, former “Google Brain” leader, Coursera founder and now chief scientist at Chinese web giant Baidu.

Deep learning has become one of computing’s hottest topics, in large part due to the last decade of work by Geoff Hinton, now a top Googler. The idea is that if you feed a computer lots of images of, say, dogs, the computer will eventually learn to recognize canines. And if we can teach machines that, technophiles and businesspeople alike hope, machines will soon — truly, in the human sense — understand language and images. This approach is being applied to aims as disparate as having computers spot tumors and travel guides that recognize the mood of a restaurant.

Ng and I chatted about the challenges he faces leading the efforts for “China’s Google” to understand our world through deep learning. Ng insists that Baidu is “only interested in tech that can influence 100 million users.” Despite the grand visions, he is very friendly and soft-spoken, the kind of person you’d feel really guilty interrupting…”


Scalable user load testing tool written in Python



“Locust is an easy-to-use, distributed, user load testing tool. It is intended for load testing web sites (or other systems) and figuring out how many concurrent users a system can handle.

The idea is that during a test, a swarm of locusts will attack your website. The behavior of each locust (or test user if you will) is defined by you and the swarming process is monitored from a web UI in real-time. This will help you battle test and identify bottlenecks in your code before letting real users in.

Locust is completely event based, and therefore it’s possible to support thousands of concurrent users on a single machine. In contrast to many other event-based apps it doesn’t use callbacks. Instead it uses light-weight processes, through gevent. Each locust swarming your site is actually running inside its own process (or greenlet, to be correct). This allows you to write very expressive scenarios in Python without complicating your code with callbacks…”
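Locust's own API is not shown here. As a minimal sketch of the underlying idea (many simulated users, each written as a plain sequential script, multiplexed inside one process), the code below swaps in Python's standard-library `asyncio` coroutines where Locust uses gevent greenlets. `fake_request`, `user_scenario` and `swarm` are hypothetical names, and the "request" only sleeps, so no server is needed.

```python
import asyncio
import random
import time


async def fake_request(path: str) -> int:
    """Hypothetical stand-in for an HTTP GET: sleeps briefly instead of using the network."""
    await asyncio.sleep(random.uniform(0.001, 0.005))
    return 200


async def user_scenario(user_id: int, timings: list) -> None:
    """One simulated user visiting two pages in order, written as straight-line code."""
    for path in ("/", "/about"):
        start = time.perf_counter()
        status = await fake_request(path)  # yields to the other users while "waiting"
        timings.append((user_id, path, status, time.perf_counter() - start))


async def swarm(n_users: int) -> list:
    """Run n_users scenarios concurrently in a single thread and collect their timings."""
    timings: list = []
    await asyncio.gather(*(user_scenario(i, timings) for i in range(n_users)))
    return timings


if __name__ == "__main__":
    results = asyncio.run(swarm(1000))
    slowest = max(r[3] for r in results)
    print(f"{len(results)} requests, slowest {slowest * 1000:.1f} ms")
```

Each scenario reads top to bottom with no callbacks; every `await` marks a point where one user yields while it waits. gevent gets the same effect without explicit `await`, by switching greenlets inside patched blocking I/O calls, which is why Locust scenarios can look like ordinary synchronous code.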


Thoughts on Time-series Databases



“Preetam “recently” blogged about catena, a time-series metric store. There was another blog post about benchmarking boltdb by a Fog Creek engineer, also looking to write a time series database. This is something of a pattern in the Go community, which already boasts seriesly, InfluxDB, and prometheus; there are almost certainly others.

Time series data has been de rigueur at least since Etsy’s seminal blog post on StatsD, though in reality that was just an inflection point. Time series modeling and graphing predate computer systems, but they have been a popular way of tracking and visualizing systems and networking data since at least the early 90s with MRTG. A few factors are converging now to make these kinds of systems more important: “Big Data” is getting much, much bigger; virtualization and containerization have increased the number of independent “nodes” for a typical distributed application; and the economies of the cloud have put the brakes on the types of performance increases typically attributed to “Moore’s Law.”

This topic is relevant to my work at Datadog, and I’ve been thinking about it for a long time now. I’ve wanted to collect my thoughts somewhere for a while, because some of them are identical to those expressed in other recent projects, and others are antithetical to them. I figured this would make my input at worst interesting.

For a primer on this subject, please read Baron’s Time-Series Database Requirements. There’s a reason that most other recent articles cite it; it contains a brief but complete description of the problem, the requirements for many large-scale time-series users, and some surprises for the uninitiated…”


Three Useful Python Libraries for Startups


“Whitenoise handles gzipping your content and setting far-future cache headers on it. With a trivial amount of work, you can configure your app to automatically append a hash to each of your static files every time you deploy changes, so that you can set the cache headers as so…”
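The cache-busting idea in this excerpt can be sketched with the standard library alone. This is not WhiteNoise's code or configuration: it assumes a one-year `max-age` and a 12-character SHA-256 prefix, and `hashed_name` and `prepare_static` are hypothetical helpers.

```python
import gzip
import hashlib
import os

# One year in seconds: an assumed "far-future" value, not a WhiteNoise setting.
FAR_FUTURE = "public, max-age=31536000"


def hashed_name(path: str, content: bytes, length: int = 12) -> str:
    """Insert a short content hash before the extension: css/app.css -> css/app.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    root, ext = os.path.splitext(path)
    return f"{root}.{digest}{ext}"


def prepare_static(path: str, content: bytes) -> dict:
    """Return the versioned name, gzipped body, and response headers for one static file.

    A real server should only send the gzipped body when the request's
    Accept-Encoding header allows gzip.
    """
    return {
        "name": hashed_name(path, content),
        "body": gzip.compress(content),
        "headers": {"Cache-Control": FAR_FUTURE, "Content-Encoding": "gzip"},
    }


if __name__ == "__main__":
    asset = prepare_static("css/app.css", b"body { color: #333; }")
    print(asset["name"], asset["headers"])
```

Because the file name changes whenever the content does, a browser holding an old copy is never served it for new content, which is what makes the long `max-age` safe.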

“Phonenumbers: Validating phone numbers is not easy, and there are tons of valid formats that make using regular expressions impossible. Moreover, even if a number passes a regular expression for formatting, it may still not be valid…”

“Pdfkit makes it simple to generate PDFs from HTML. Why would one use this? Let’s say you have an invoice page in your app: you can use the same code that renders that page to render a downloadable PDF for customers or your own records…”

“Python-dateutil: Numerous date utilities for calculating differences, etc. The most useful of these is a resilient date parser:…”
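A short sketch of the two features mentioned, the lenient parser and calendar-aware differences, assuming python-dateutil is installed (`pip install python-dateutil`). The dates are made-up sample inputs, not the article's elided example.

```python
from datetime import datetime

from dateutil import parser
from dateutil.relativedelta import relativedelta

# The same moment written three different ways; parser.parse copes with each format.
samples = ["2015-03-03T22:30:00", "March 3, 2015 10:30 PM", "3 Mar 2015 22:30"]
parsed = [parser.parse(s) for s in samples]

# relativedelta expresses a difference in calendar units rather than raw seconds.
age = relativedelta(datetime(2015, 3, 3), datetime(2012, 1, 1))

# Adding a month to January 31 clips to the last day of February instead of raising.
end_of_feb = datetime(2015, 1, 31) + relativedelta(months=1)

if __name__ == "__main__":
    print(parsed)
    print(age.years, age.months, age.days)
    print(end_of_feb.date())
```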


Running Lisp in Production


“At Grammarly, the foundation of our business, our core grammar engine, is written in Common Lisp. It currently processes more than a thousand sentences per second, is horizontally scalable, and has reliably served in production for almost 3 years.

We noticed that there are very few, if any, accounts of how to deploy Lisp software to modern cloud infrastructure, so we thought that it would be a good idea to share our experience. The Lisp runtime and programming environment provides several unique, albeit obscure, capabilities to support production systems (for the impatient, they are described in the final chapter)…”


“pip -t”: A simple and transparent alternative to virtualenv


“Often, virtualenv is overkill for the basic task of installing project dependencies and keeping them isolated. We present a simple alternative consisting of:

  1. adding ./.pip to your PYTHONPATH
  2. using pip install -t .pip to install modules locally
  3. executing python from your project’s root directory…”
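The three steps above rely on the shell setting `PYTHONPATH` and on running from the project root. A minimal sketch of doing step 1 from inside the program instead, so the script works from any working directory. `use_local_packages` is a hypothetical helper, and `.pip` is assumed to have been populated with `pip install -t .pip <package>` as in step 2.

```python
import os
import sys


def use_local_packages(project_root: str, dirname: str = ".pip") -> str:
    """Prepend <project_root>/<dirname> to sys.path.

    This is the in-process counterpart of running with PYTHONPATH=./.pip
    from the project root; calling it twice does not add a duplicate entry.
    """
    path = os.path.abspath(os.path.join(project_root, dirname))
    if path not in sys.path:
        sys.path.insert(0, path)  # searched before site-packages, like PYTHONPATH
    return path
```

Call `use_local_packages(os.path.dirname(os.path.abspath(__file__)))` at the top of the entry script, before importing any dependency that lives in `.pip`.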


Cache-friendly binary search



“High-speed memory caches present in modern computer architectures favor data structures with good locality of reference, i.e. the property by which elements accessed in sequence are located in memory addresses close to each other. This is the rationale behind classes such as Boost.Container flat associative containers, which emulate the functionality of standard C++ node-based associative containers while storing the elements contiguously (and in order). This is an example of how binary search works in a boost::container::flat_set with elements 0 through 30…”
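The excerpt is cut off before the article's own proposal, so the sketch below does not claim to reproduce it or Boost's implementation. It shows one widely used cache-friendly alternative, the Eytzinger (breadth-first) layout, next to ordinary binary search on a sorted array. Python cannot show the cache effect itself; the point is the index arithmetic, in which the first levels of every search touch the same few slots at the front of the array. All function names are hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def to_eytzinger(sorted_vals: Sequence) -> list:
    """Rearrange a sorted sequence into 1-indexed breadth-first (Eytzinger) order.

    Slot 0 is unused; the children of slot k live at 2k and 2k + 1, so the top
    levels of the implicit tree sit together at the start of the array.
    """
    n = len(sorted_vals)
    out: list = [None] * (n + 1)
    it = iter(sorted_vals)

    def fill(k: int) -> None:
        # An in-order walk of the implicit tree consumes the sorted input in order.
        if k <= n:
            fill(2 * k)
            out[k] = next(it)
            fill(2 * k + 1)

    fill(1)
    return out


def eytzinger_lower_bound(tree: list, x) -> Optional[object]:
    """Return the smallest element >= x, or None if every element is < x."""
    n = len(tree) - 1
    k = 1
    while k <= n:
        k = 2 * k + (tree[k] < x)  # go right when the node is too small
    # Undo the trailing right turns plus the last left turn: shift out the
    # trailing 1-bits of k and the 0-bit above them (ffs(~k) in C).
    k >>= (~k & (k + 1)).bit_length()
    return tree[k] if k else None


def flat_lower_bound(sorted_vals: Sequence, x) -> Optional[object]:
    """Reference: ordinary binary search on the sorted, contiguous array."""
    i = bisect_left(sorted_vals, x)
    return sorted_vals[i] if i < len(sorted_vals) else None


if __name__ == "__main__":
    vals = list(range(31))  # elements 0 through 30, as in the flat_set example
    tree = to_eytzinger(vals)
    print(tree[1:8])
    print(eytzinger_lower_bound(tree, 17), flat_lower_bound(vals, 17))
```

Both searches do about log2(n) comparisons; the layout only changes where those comparisons land in memory, which is what the cache rewards.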


