How to Increase Speed of Pandas Code by 4X

Muhammad Saleh
2 min read · Jan 23, 2021

Photo by Marc-Olivier Jodoin on Unsplash

Pandas is the main library for processing data in Python. It’s easy to use and quite flexible when it comes to handling different sizes and types of data. It has hundreds of different functions that make working with data very easy.

The main issue with Pandas is that it is slow on large datasets. One way to cope with this is the modin.pandas library. Pandas is slow because it uses only one CPU core, while modin.pandas spreads the workload across the available cores. Let’s see how to use it.

First, install the Modin library. This is how to install it from a Jupyter notebook environment.

!pip install modin[ray]
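
Since the speedup comes from using more than one core, it can help to check how many cores your machine exposes before expecting much. The snippet below is a minimal sketch using only the standard library; `available_cores` is a hypothetical helper name, and the count is only an upper bound on parallelism, not a statement about how many workers Modin actually starts.

```python
import os


def available_cores() -> int:
    """Return the number of logical CPU cores visible to Python (at least 1)."""
    # os.cpu_count() can return None when the count cannot be determined,
    # so fall back to 1 in that case.
    return os.cpu_count() or 1


print(f"Logical cores available: {available_cores()}")
```

On a single-core machine there is little for a multi-core library to spread the work over, so the gains are likely to be small.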

Let’s see how much it improves on the performance of Pandas. For this demonstration I am using this Kaggle dataset. First, let’s import regular pandas.

import pandas as pd

Now, let’s check the data loading speed of normal pandas.

%%time
df = pd.read_csv('data.csv')
### output
Wall time: 7.36 s

Now, do the same with modin.pandas. First, import it.

import modin.pandas as pd

Now, repeat the same operation of loading data and check the time taken.

%%time
df = pd.read_csv('data.csv')
### output
Wall time: 2.8 s

Loading the data is about 2.6 times faster here (7.36 s versus 2.8 s), and you should see this difference grow as you process larger datasets. One thing I noticed about modin.pandas is that it speeds up data reading and writing operations significantly, but does not help much with statistical operations. Let’s see it in practice.

%%time
# using normal pandas (the %%time cell magic must be the first line of the cell)
df.groupby('county').count()
### output
Wall time: 1.76 s

Now, use modin.pandas and check its performance.

%%time
df.groupby('county').count()
### output
Wall time: 1.52 s
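
The `%%time` magic only works inside Jupyter. As a rough sketch of the same comparison in a plain Python script, the helpers below time a single call and turn two wall times into a speedup factor. The names `time_call`, `speedup`, and `compare_read_csv` are hypothetical, and `compare_read_csv` assumes both pandas and Modin are installed; it imports them under different aliases so that one import does not overwrite the other.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def compare_read_csv(path):
    """Time read_csv with pandas and with Modin on the same file.

    The imports are local so the helpers above still work on a machine
    where pandas or Modin is not installed.
    """
    import pandas as pd
    import modin.pandas as mpd

    _, pandas_seconds = time_call(pd.read_csv, path)
    _, modin_seconds = time_call(mpd.read_csv, path)
    return pandas_seconds, modin_seconds, speedup(pandas_seconds, modin_seconds)


# Speedups implied by the wall times reported in this article.
print(f"read_csv: {speedup(7.36, 2.8):.2f}x")        # prints "read_csv: 2.63x"
print(f"groupby count: {speedup(1.76, 1.52):.2f}x")  # prints "groupby count: 1.16x"
```

A single run is noisy, so treat these factors as rough indications rather than benchmarks. Calling `compare_read_csv('data.csv')` would repeat the loading test from this article on your own machine.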

As of this writing, 73% of pandas functionality is available in modin.pandas.

So, this is a very useful library, especially when you are dealing with large datasets. There is much more you can do with it, and I encourage you to practice and experiment as much as you can using the extensive information online. Best of luck!
