Filter a DataFrame in Pandas: Various Approaches


Pandas is one of the most widely used Python libraries for data analysis, and it is a common first step in Machine Learning and Deep Learning work. You can use it to read CSV files, manipulate data frames, and export them to formats such as CSV or HTML. After reading the data, there are often rows and columns you don’t want to keep, so you have to filter the dataframe. In this post, you will learn to filter a DataFrame in Pandas using .loc[] in various ways.

Step 1: Import the Libraries

I am using only the pandas library here, so import it with the import statement.

import pandas as pd

Step 2: Read the Dataset

For demonstration purposes, I am using the Turnover Stocks Dataset. Read it using the pd.read_csv() function and set “Security Code” as the index of the dataset.

data = pd.read_csv("data/TurnoverList.csv", index_col=["Security Code"])
data.head()
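If you don't have the CSV file at hand, the same step can be sketched with a small in-memory table. The rows below are made up for illustration and are not taken from the real dataset, and the name sample is my own placeholder. I pass the column name as a plain string to index_col here.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/TurnoverList.csv; the values are made up.
csv_text = """Security Code,Security Name,Open
500180,HDFC BANK,1450.0
500325,RELIANCE,2400.0
500112,SBI,95.5
532174,ICICI BANK,920.0
"""

# index_col turns the "Security Code" column into the row labels.
sample = pd.read_csv(io.StringIO(csv_text), index_col="Security Code")

print(sample.index.name)      # the column that became the index
print(list(sample.columns))   # the columns that remain
print(sample.head())
```

Once the code column is the index, it no longer appears among the regular columns, which is why the later examples look rows up by code directly.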


Step 3: Filter the Dataset

After reading the dataset, you can filter the data frame in several ways. The approaches below are the ones commonly used in data science work.

Filter Dataframe on Index Value (Labels)

Because the Security Code column is the index of the dataset, you can filter the dataset by its labels. For example, I want to get the details of the row with the code 500180.

data.loc[500180]


It outputs the values of all columns for that code. If I want only the Security Name, I pass “Security Name” as the second argument to .loc[].

data.loc[500180,"Security Name"]


You can also pass a list of labels to .loc[] to get several rows at once, for example a list of stock codes.

data.loc[[500180,500325,500112]]
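The three label lookups above return different kinds of objects, which is easy to miss. Here is a minimal sketch on a made-up four-row table (the prices and the name sample are placeholders of mine, not values from the real dataset).

```python
import pandas as pd

# Hypothetical sample data; the prices are made up for illustration.
sample = pd.DataFrame(
    {
        "Security Name": ["HDFC BANK", "RELIANCE", "SBI", "ICICI BANK"],
        "Open": [1450.0, 2400.0, 95.5, 920.0],
    },
    index=pd.Index([500180, 500325, 500112, 532174], name="Security Code"),
)

one_row = sample.loc[500180]                      # single label -> Series
one_value = sample.loc[500180, "Security Name"]   # label + column -> scalar
several = sample.loc[[500180, 500325, 500112]]    # list of labels -> DataFrame
as_frame = sample.loc[[500180]]                   # one-item list -> one-row DataFrame

print(type(one_row).__name__)
print(one_value)
print(several)
print(type(as_frame).__name__)
```

If you ask for a label that is not in the index, .loc[] raises a KeyError, so it is worth checking membership first (for example with 500180 in sample.index) when the codes come from user input.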


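Using Ranges

You can filter a pandas dataframe by a range of index labels. For example, I want the rows from code 500180 through 532174. Unlike ordinary Python slicing, a label slice in .loc[] includes both endpoints, and it follows the order of the rows in the index rather than the numeric order of the codes.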

data.loc[500180:532174]
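To see how a label range behaves, here is a small sketch on made-up data (the prices and the name sample are my own placeholders). It compares a slice on the index as stored with the same kind of slice after sort_index().

```python
import pandas as pd

# Hypothetical sample data; the prices are made up for illustration.
sample = pd.DataFrame(
    {
        "Security Name": ["HDFC BANK", "RELIANCE", "SBI", "ICICI BANK"],
        "Open": [1450.0, 2400.0, 95.5, 920.0],
    },
    index=pd.Index([500180, 500325, 500112, 532174], name="Security Code"),
)

# Unsorted index: the slice follows row order, and both end labels are included.
by_position = sample.loc[500325:500112]
print(by_position.index.tolist())

# Sorted index: the slice now behaves like a numeric range, still inclusive.
by_value = sample.sort_index().loc[500180:532174]
print(by_value.index.tolist())
```

If you want the range to mean "all codes between these two numbers", sorting the index first is the safer choice.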


Using Conditions or Boolean

Suppose I want to search for a value in a column and show the details of the matching rows. Boolean filtering does this: the comparison produces a True/False value for every row, and .loc[] keeps the rows where it is True. Let’s say I want the details for the “RELIANCE” stock; then I use the following boolean filtering.

# boolean filtering
data.loc[data["Security Name"] == "RELIANCE"]
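It can help to look at the boolean mask on its own before passing it to .loc[]. The sketch below uses made-up data (the prices and the names sample, mask and mid_priced are my own), and also shows one way to combine two conditions with &, where each condition needs its own parentheses.

```python
import pandas as pd

# Hypothetical sample data; the prices are made up for illustration.
sample = pd.DataFrame(
    {
        "Security Name": ["HDFC BANK", "RELIANCE", "SBI", "ICICI BANK"],
        "Open": [1450.0, 2400.0, 95.5, 920.0],
    },
    index=pd.Index([500180, 500325, 500112, 532174], name="Security Code"),
)

# The comparison returns a boolean Series aligned with the rows.
mask = sample["Security Name"] == "RELIANCE"
print(mask.tolist())

# .loc[] keeps only the rows where the mask is True.
matches = sample.loc[mask]
print(matches)

# Two conditions combined with & (element-wise "and").
mid_priced = sample.loc[(sample["Open"] > 900) & (sample["Open"] < 2000)]
print(mid_priced.index.tolist())
```

Note that the comparison is an exact, case-sensitive match, so "Reliance" would not match "RELIANCE".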


Using the Lambda Function

You can also pass a lambda function to .loc[]. Pandas calls it with the dataframe itself and expects it to return something valid for indexing, such as a boolean mask. For example, I want to filter all the stocks that opened at 100 Indian Rupees or less. I use the following code to achieve that.

# lambda function: df is the whole dataframe, not a single row
data.loc[lambda df: df["Open"] <= 100.0]
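A callable in .loc[] is most useful in the middle of a method chain, where the intermediate dataframe has no variable name to refer to. Here is a minimal sketch on made-up data (the prices and the names sample, chained and direct are my own placeholders).

```python
import pandas as pd

# Hypothetical sample data; the prices are made up for illustration.
sample = pd.DataFrame(
    {
        "Security Name": ["HDFC BANK", "RELIANCE", "SBI", "ICICI BANK"],
        "Open": [1450.0, 2400.0, 95.5, 920.0],
    },
    index=pd.Index([500180, 500325, 500112, 532174], name="Security Code"),
)

# The lambda receives the sorted dataframe produced by the previous step.
chained = (
    sample
    .sort_values("Open")
    .loc[lambda df: df["Open"] <= 100.0]
)
print(chained)

# On a named dataframe, the plain boolean form selects the same rows.
direct = sample.loc[sample["Open"] <= 100.0]
print(chained.equals(direct))
```

When the dataframe already has a name, as data does in this post, the plain boolean form is usually easier to read, and the lambda form is mainly a convenience for chains.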

End Notes

These are different approaches to filter a DataFrame in Pandas using .loc[]. I have worked through the examples above on stock data, and you can apply the same concepts to filter your own data. If you have any questions about this, you can contact us for more information. I will keep adding other approaches as well, so keep visiting the site.


Meet Sukesh ( Chief Editor ), a passionate and skilled Python programmer with a deep fascination for data science, NumPy, and Pandas. His journey in the world of coding began as a curious explorer and has evolved into a seasoned data enthusiast.
 