How to reduce the memory used by Pandas DataFrames

Thomas Dickson



This post describes a script for reducing the memory usage of a pandas DataFrame. Several blog posts and Stack Overflow questions already cover this area.

You might want to reduce the memory footprint of a dataframe so that you can work with more data or speed up your processing. A related problem is getting as much data as possible out of your database (or data warehouse); this article covers a few different ways you can use pandas to load lots of data.

I wanted to include this snippet here as it’s a working version of the functionality that I’m happy with. The function takes a dataframe and iterates over each column to check whether a data type with a smaller memory requirement can represent the existing data. It returns a modified version of the dataframe.
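As a rough illustration of the idea, here is a minimal sketch of that column-by-column approach. It is not the exact function from this post: the name `reduce_memory_usage` is made up for the example, and it leans on the `downcast` option of `pd.to_numeric` to pick the smaller type. It only touches integer and float columns and leaves everything else alone. Float downcasting goes to float32 only when the values survive the conversion within pandas' tolerance, so treat it as approximate rather than exact.

```python
import pandas as pd


def reduce_memory_usage(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns downcast to smaller dtypes.

    Hypothetical helper for illustration; assumes unique column names.
    """
    out = df.copy()
    for col in out.columns:
        kind = out[col].dtype.kind  # 'i' signed int, 'u' unsigned int, 'f' float
        if kind == "i":
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif kind == "u":
            out[col] = pd.to_numeric(out[col], downcast="unsigned")
        elif kind == "f":
            out[col] = pd.to_numeric(out[col], downcast="float")
        # Any other column (strings, datetimes, categoricals) is left unchanged.
    return out


# Example usage: compare memory before and after.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5], "c": ["x", "y", "z"]})
smaller = reduce_memory_usage(df)
print(df.memory_usage(deep=True).sum(), smaller.memory_usage(deep=True).sum())
print(smaller.dtypes)
```

Working on a copy keeps the original dataframe intact, which makes it easy to compare the two and check that nothing important was lost before switching over.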