
How to Filter a Dataframe by Column Value: 3 Methods

A dataframe contains many columns, and each column holds many records or values. Let’s say you want to show only the rows that have a specific value in a particular column. How can you do that? In this post, you will learn three methods to filter a dataframe by column value.

The syntax for selecting a specific column

Before you can show the rows that match a value, you first select the column and then apply a condition to it.

You can select a specific column using the following syntax.

your_dataframe["your_column_name"]
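A minimal sketch of what this selection gives you, assuming a small hypothetical dataframe (the names scores, team, and points are illustrative only): selecting a column returns a pandas Series, and comparing that Series with a value returns a boolean Series with one True/False per row. That boolean Series is the kind of condition the methods below pass to the dataframe.

```python
import pandas as pd

# Hypothetical example data, not the sample dataframe used later in this post
scores = pd.DataFrame({"team": ["a", "b", "a", "c"], "points": [10, 7, 3, 7]})

# Selecting one column returns a Series
points = scores["points"]
print(type(points))  # <class 'pandas.core.series.Series'>

# Comparing the column with a value gives one True/False per row
mask = scores["points"] == 7
print(mask.tolist())  # [False, True, False, True]
```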

Steps to filter dataframe by column value

Let’s go through the steps you will follow to filter the dataframe.

Step 1: Import all the necessary libraries

The first step is to import the library. In this example, I am using only the numpy and pandas packages. The numpy package will be used for data creation and pandas for DataFrame creation and manipulation.

import numpy as np
import pandas as pd

Step 2: Create a sample dataframe

Now let’s create a sample dataframe in which all the methods will be implemented.

Run the below lines of code to create it.

np.random.seed(0)
df = pd.DataFrame({'col1': list('pqrstuv'), 'col2': np.random.choice(10, 7),
                   'col3':["01-01-2022","02-01-2022","03-01-2022","04-01-2022",
                           "05-01-2022","06-01-2022","07-01-2022"]})
df

Here, np.random.choice(10, 7) generates the random integers (from 0 to 9) that fill col2.
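A quick check of how that call behaves: with a single integer as its first argument, np.random.choice(10, 7) draws 7 values from range(10) with replacement, so col2 may contain repeated values. Calling np.random.seed(0) first makes the draw repeatable, so running the sample code again gives the same dataframe. The names first and second below are only for illustration.

```python
import numpy as np

np.random.seed(0)
first = np.random.choice(10, 7)   # 7 integers from 0 to 9, drawn with replacement

np.random.seed(0)
second = np.random.choice(10, 7)  # same seed, so the same 7 integers

print(first)
print((first == second).all())    # True
```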

Step 3: Filter the dataframe by column value using one of the methods below

Let’s go through each method.

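Method 1: Using loc[] with a condition

Put a condition on the column inside loc[], and pandas returns only the rows where that condition is True. The code below keeps the rows where col2 equals 3.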

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'col1': list('pqrstuv'), 'col2': np.random.choice(10, 7),
                   'col3':["01-01-2022","02-01-2022","03-01-2022","04-01-2022",
                           "05-01-2022","06-01-2022","07-01-2022"]})
df.loc[df["col2"]==3]

Output: [image: filter dataframe using simple loc]
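A few variations on the same idea, sketched on the sample dataframe from Step 2 (the variable names are illustrative): plain boolean indexing with df[mask] returns the same rows as df.loc[mask], loc can also restrict the columns after the row condition, the pattern works the same for a text column such as col1, and several conditions can be combined with & (and) or | (or), each wrapped in its own parentheses.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'col1': list('pqrstuv'), 'col2': np.random.choice(10, 7),
                   'col3': ["01-01-2022", "02-01-2022", "03-01-2022", "04-01-2022",
                            "05-01-2022", "06-01-2022", "07-01-2022"]})

mask = df["col2"] == 3

# Plain boolean indexing returns the same rows as df.loc[mask]
same_rows = df[mask]

# loc also accepts a column list after the row condition
only_col1 = df.loc[mask, ["col1"]]

# The same pattern works for a text column
row_r = df.loc[df["col1"] == "r"]

# Combine conditions with & or |; each condition needs its own parentheses
between = df.loc[(df["col2"] > 2) & (df["col2"] < 8)]

print(same_rows.equals(df.loc[mask]))  # True
print(only_col1)
print(row_r)
print(between)
```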

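Method 2: Using isin()

The isin() method lets you filter the dataframe on several values at once. It returns True for each row whose value appears in the list you pass, so the code below keeps the rows where col2 is 7 or 9.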

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'col1': list('pqrstuv'), 'col2': np.random.choice(10, 7),
                   'col3':["01-01-2022","02-01-2022","03-01-2022","04-01-2022",
                           "05-01-2022","06-01-2022","07-01-2022"]})
df.loc[df["col2"].isin([7,9])]

Output: [image: filter dataframe using the isin() method]
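A short sketch of two properties of isin() on the same sample dataframe: the mask it builds is the same as joining one equality test per value with |, and putting ~ in front of the mask inverts it, which keeps the rows whose col2 is not in the list. The names keep, matching, and others are only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'col1': list('pqrstuv'), 'col2': np.random.choice(10, 7),
                   'col3': ["01-01-2022", "02-01-2022", "03-01-2022", "04-01-2022",
                            "05-01-2022", "06-01-2022", "07-01-2022"]})

keep = df["col2"].isin([7, 9])

# Same mask written as one equality test per value, joined with |
print(keep.equals((df["col2"] == 7) | (df["col2"] == 9)))  # True

# Rows whose col2 is 7 or 9
matching = df.loc[keep]

# ~ inverts the mask: rows whose col2 is neither 7 nor 9
others = df.loc[~keep]

print(len(matching) + len(others) == len(df))  # True
```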

Method 3: Setting and using the index

In this method, you first set col2 as the index of the dataframe and then compare df.index with the value to get the matching rows.

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'col1': list('pqrstuv'), 'col2': np.random.choice(10, 7),
                   'col3':["01-01-2022","02-01-2022","03-01-2022","04-01-2022",
                           "05-01-2022","06-01-2022","07-01-2022"]})
df.set_index("col2",inplace=True)
df[df.index ==3]

Output: [image: filter dataframe after setting the index]
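A sketch of a variant of Method 3 that avoids inplace=True: set_index() returns a new dataframe, so the original df keeps col2 as a normal column. Once col2 is the index, loc[] can also look rows up by label; passing the label inside a list ([3]) returns a dataframe even when several rows share that label, as they can here because np.random.choice draws with replacement. This assumes, as in the example above, that the value 3 occurs in col2; loc[[3]] raises a KeyError if it does not. The name df_indexed is illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'col1': list('pqrstuv'), 'col2': np.random.choice(10, 7),
                   'col3': ["01-01-2022", "02-01-2022", "03-01-2022", "04-01-2022",
                            "05-01-2022", "06-01-2022", "07-01-2022"]})

# set_index returns a new dataframe; df itself still has col2 as a column
df_indexed = df.set_index("col2")

# Label-based lookup on the new index (a list key always returns a DataFrame)
by_label = df_indexed.loc[[3]]

# Boolean comparison on the index, as in Method 3
by_mask = df_indexed[df_indexed.index == 3]

print(by_label["col1"].tolist() == by_mask["col1"].tolist())  # True
```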

Conclusion

It’s easy to get rows based on a column value. In this tutorial, you used three methods: a simple condition inside loc[], the isin() function for matching several values, and filtering on the index after setting the column as the index. You can use any of them to filter a dataframe.
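As a quick check that the three approaches agree, here is a sketch that runs all of them on the sample dataframe for one value (value is just an illustrative variable) and compares the col1 entries each one returns.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'col1': list('pqrstuv'), 'col2': np.random.choice(10, 7),
                   'col3': ["01-01-2022", "02-01-2022", "03-01-2022", "04-01-2022",
                            "05-01-2022", "06-01-2022", "07-01-2022"]})

value = 3

# Method 1: loc with a boolean condition
m1 = df.loc[df["col2"] == value, "col1"].tolist()

# Method 2: isin with a one-element list
m2 = df.loc[df["col2"].isin([value]), "col1"].tolist()

# Method 3: col2 as the index, then compare the index with the value
indexed = df.set_index("col2")
m3 = indexed[indexed.index == value]["col1"].tolist()

print(m1 == m2 == m3)  # True
```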

I hope you found these methods useful. If you know other methods you would like to add, or you need more help, you can contact us.