Diagnosing Memory “Leaks” in Python

“We wrote some new code in the form of celery tasks that we expected to run for up to five minutes, and use a few hundred megabytes of memory. Rinse and repeat for a thousand different data sets. We ran through a few data sets successfully, but once we started running though ALL of them, we noticed that the memory of the celery process was continuing to grow.

In celery, each task runs in one of a fixed number of processes that persist between tasks. We assumed we had a memory leak on our hands; somehow we were leaving references around to our data structures that were remaining in memory and not being garbage collected between tasks. But how do you go about investigating exactly what is happening?

Note: Stop everything, and make sure that you’re not in DEBUG mode, assuming you’re using Django. In that mode, every database query you make will be stored in memory, which looks a lot like a memory leak…”



Exploring the Android M Developer Preview

“In this tutorial, we’ll look at how you can set up your development environment and install this early release of Android M, before exploring some of the new features included in this developer preview. We’ll take a look at how you can use the newData Binding library to cut down on boilerplate code, how your app can take advantage of Android M’s new permissions model, and how Android’s built-in app linking is set to become more powerful in Android M…”


Predict Social Network Influence with R and H2O Ensemble Learning

“H2O is an awesome machine learning framework. It is really great for data scientists and business analysts “who need scalable and fast machine learning”. H2O is completely open source and what makes it important is that works right of the box. There seems to be no easier way to start with scalable machine learning. It hast support for R, Python, Scala, Java and also has a REST API and a own WebUI. So you can use it perfectly for research but also in production environments.

H2O is based on Apache Hadoop and Apache Spark which gives it enormous power with in-memory parallel processing…”


Iterate faster on Google Play with improved beta testing

“Currently, the Google Play Developer Console lets developers release early versions of their app to selected users as an alpha or beta test before pushing updates to full production. The select user group downloads the app on Google Play as normal, but can’t review or rate it on the store. This gives you time to address bugs and other issues without negatively impacting your app listing.

Based on your feedback, we’re launching new features to more effectively manage your beta tests, and enable users to join with one click.

  • NEW! Open beta – Use an open beta when you want any user who has the link to be able to join your beta with just one click. One of the advantages of an open beta is that it allows you to scale to a large number of testers. However, you can also limit the maximum number of users who can join.
  • NEW! Closed beta using email addresses – If you want to restrict which users can access your beta, you have a new option: you can now set up a closed beta using lists of individual email addresses which you can add individually or upload as a .csv file. These users will be able to join your beta via a one-click opt-in link.
  • Closed beta with Google+ community or Google Group – This is the option that you’ve been using today, and you can continue to use betas with Google+ communities or Google Groups. You will also be able to move to an open beta while maintaining your existing testers…”


Switching Eds: Face swapping with Python, dlib, and OpenCV

“In this post I’ll describe how I wrote a short (200 line) Python script to automatically replace facial features on an image of a face, with the facial features from a second image of a face.

The process breaks down into four steps:

  • Detecting facial landmarks.
  • Rotating, scaling, and translating the second image to fit over the first.
  • Adjusting the colour balance in the second image to match that of the first.
  • Blending features from the second image on top of the first.

The full source-code for the script can be found here…”


Take-home interviews

“Programming interviews are stressful. Fundamentally, the applicant is being judged. They have to understand the question, produce a working solution in limited time, while explaining everything they are doing with no time to stop and gather their thoughts. At its worst it’s adversarial.

Some programmers find that this stress pushes them to do their best in interviews. Others find it debilitating. There are programmers with track records of solving hard problems who simply freeze when subjected to the stress of an interview. They babble. They become unable to program.

This does not mean that they are bad programmers[1]. I gave the fellow in our interview a much harder problem to do on his own time. I assumed that he’d never get back to us. The project was a lot of work. Three days later, however, I had a complete solution in my inbox. We got him back on the phone, and he was able to talk in depth about what he had done, about the underlying algorithms, and about the design trade-offs he’d made.The code was clean. He was clearly a skilled programmer…”


Profiling in Python

“Here at HumanGeo we use a lot of Python, and it is tons of fun. Python is a great language for writing beautiful and functional code amazingly fast, and it is most definitely my favorite language to use both privately and professionally. However, even though it is a wonderful language, Python can be painfully slow. Luckily, there are some amazing tools to help profile your code so that you can keep your beautiful code fast.

When I started working here at HumanGeo, I was tasked with taking a program that took many hours to run, finding the bottlenecks, and then doing whatever I could to make it run faster. I used many tools, including cProfilePyCallGraph (source), and even PyPy (an alternate, fast, interpreter for Python), to determine the best ways to optimize the program. I will go through how I used all of these programs, except for PyPy (which I ruled out to maintain interpreter consistency in production), and how they can help even the most seasoned developers find ways to better optimize their code.

Disclaimer: do not prematurely optimize! I’ll just leave this here...”