Diagnosing Memory “Leaks” in Python

“We wrote some new code in the form of celery tasks that we expected to run for up to five minutes, and use a few hundred megabytes of memory. Rinse and repeat for a thousand different data sets. We ran through a few data sets successfully, but once we started running though ALL of them, we noticed that the memory of the celery process was continuing to grow.

In celery, each task runs in one of a fixed number of processes that persist between tasks. We assumed we had a memory leak on our hands; somehow we were leaving references around to our data structures that were remaining in memory and not being garbage collected between tasks. But how do you go about investigating exactly what is happening?

Note: Stop everything, and make sure that you’re not in DEBUG mode, assuming you’re using Django. In that mode, every database query you make will be stored in memory, which looks a lot like a memory leak…”