Pandas Groupby: How to Use It in Python

Pandas Groupby Function

As you may already know, pandas is one of the most widely used Python packages for creating and manipulating dataframes. You can use it for many tasks, such as reading CSV files and Excel sheets, exporting files, and deleting data. But did you know that you can also group your data using the pandas groupby() function? In this tutorial, you will learn how to use the pandas groupby() function to group the rows of a dataframe.

Steps to Implement Pandas Groupby Function

In this section, you will learn the steps required to use the pandas groupby() function.

Step 1: Import the necessary libraries

In this example, I am using only the pandas package, so let’s import it using the import statement.

import pandas as pd

Step 2: Create a Sample Dataframe

For the implementation, let’s create a sample dataframe. I am keeping it small for the sake of simplicity. Use the following lines of code to do so.

import pandas as pd

data = {
    "column1":["A","B","C","A","B","D","E"],
    "column2":[1,2,5,1,4,6,8],
    "column3":[10,20,70,60,60,10,20]
}
df = pd.DataFrame(data)
print(df)

Output

Figure: Sample dataframe for the pandas groupby method

Step 3: Use the following methods

The last step is to apply the dataframe.groupby() method in one of the following ways.
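
Before looking at the individual methods, it may help to see what groupby() returns on its own. The sketch below reuses the sample dataframe from Step 2 and inspects the grouped object. The variable names grouped and group_sizes are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

data = {
    "column1": ["A", "B", "C", "A", "B", "D", "E"],
    "column2": [1, 2, 5, 1, 4, 6, 8],
    "column3": [10, 20, 70, 60, 60, 10, 20],
}
df = pd.DataFrame(data)

# groupby() does not return a dataframe; it returns a grouped object
# that waits for a follow-up call such as first(), sum() or size().
grouped = df.groupby("column1")
print(type(grouped).__name__)

# Number of distinct groups: one per unique value of column1 (A, B, C, D, E).
print(grouped.ngroups)

# Number of rows that fall into each group.
group_sizes = grouped.size()
print(group_sizes)
```

Checking ngroups and size() first is a quick way to confirm that the grouping key splits the rows the way you expect.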

Method 1: Pandas Groupby on Single Column

In this method, I will group the data by applying the dataframe.groupby() method to a single column. Using the same sample dataframe as above, execute the following lines of code to get the output.

import pandas as pd

data = {
    "column1":["A","B","C","A","B","D","E"],
    "column2":[1,2,5,1,4,6,8],
    "column3":[10,20,70,60,60,10,20]
}
df = pd.DataFrame(data)
df2 = df.groupby(["column1"])
print(df2.first())

Here you can see that I am grouping the dataframe on column1. Calling first() on the grouped object returns the first row of each group, that is, the first values of column2 and column3 for each unique value of column1.

Output

Figure: Pandas groupby on a single column
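
The first() call above only picks one row per group. In practice, grouping is often followed by an aggregation. The following is a minimal sketch, assuming the same sample dataframe, of a few common aggregations on a single grouping column. The result names sums, means and summary, and the output column names column2_total and column3_max, are illustrative choices and not part of the original example.

```python
import pandas as pd

data = {
    "column1": ["A", "B", "C", "A", "B", "D", "E"],
    "column2": [1, 2, 5, 1, 4, 6, 8],
    "column3": [10, 20, 70, 60, 60, 10, 20],
}
df = pd.DataFrame(data)

# Sum every numeric column within each group of column1.
sums = df.groupby("column1").sum()
print(sums)

# Select one column from the grouped object, then take its mean per group.
means = df.groupby("column1")["column3"].mean()
print(means)

# Named aggregation: a different function per column, with custom output names.
summary = df.groupby("column1").agg(
    column2_total=("column2", "sum"),
    column3_max=("column3", "max"),
)
print(summary)
```

For group A, for example, the two rows have column3 values 10 and 60, so the sum is 70 and the mean is 35.0.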

Method 2: Pandas Groupby on Multiple Columns

Just as you can group the dataframe on a single column, you can also group it on multiple columns. For example, I want to group my dataframe using two columns, column1 and column3. I will use the following lines of code to do so.

import pandas as pd

data = {
    "column1":["A","B","C","A","B","D","E"],
    "column2":[1,2,5,1,4,6,8],
    "column3":[10,20,70,60,60,10,20]
}
df = pd.DataFrame(data)
df2 = df.groupby(["column1","column3"])
print(df2.first())

You can see that I am passing two columns to the groupby() function. When you run the code, you will get the following output.

Output

Figure: Pandas groupby on multiple columns
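
When you group on more than one column, the result is indexed by a MultiIndex built from the grouping columns. The sketch below, again assuming the same sample dataframe, shows how to look up a single group and how to keep the grouping columns as ordinary columns with as_index=False. The names multi and flat are illustrative.

```python
import pandas as pd

data = {
    "column1": ["A", "B", "C", "A", "B", "D", "E"],
    "column2": [1, 2, 5, 1, 4, 6, 8],
    "column3": [10, 20, 70, 60, 60, 10, 20],
}
df = pd.DataFrame(data)

# Grouping on two columns produces one group per distinct
# (column1, column3) pair, indexed by a two-level MultiIndex.
multi = df.groupby(["column1", "column3"]).first()
print(list(multi.index.names))

# Look up one group by passing a tuple of key values.
print(multi.loc[("B", 60), "column2"])

# as_index=False keeps the grouping keys as regular columns,
# which is often easier to export or merge later.
flat = df.groupby(["column1", "column3"], as_index=False).first()
print(flat)
```

In this sample every (column1, column3) pair is unique, so each group holds exactly one row and the result has as many rows as the original dataframe.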

Conclusion

These are the methods for using the dataframe.groupby() method. You can also import a CSV dataset and then use the groupby() function to group on certain columns. I hope you liked this tutorial and that it has answered your queries. If you still have doubts, you can contact us for more help.
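
As a rough sketch of the CSV workflow mentioned above, the example below reads CSV text with pd.read_csv() and then groups it. To keep it self-contained, the CSV content is an in-memory string holding the same sample data. With a real file you would pass its path to pd.read_csv() instead. The names csv_text, df_csv and totals are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; this mirrors the sample dataframe.
csv_text = """column1,column2,column3
A,1,10
B,2,20
C,5,70
A,1,60
B,4,60
D,6,10
E,8,20
"""

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# Group the imported rows on column1 and total column3 per group.
totals = df_csv.groupby("column1")["column3"].sum()
print(totals)
```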

Source:

pandas.DataFrame.groupby

 


Meet Sukesh ( Chief Editor ), a passionate and skilled Python programmer with a deep fascination for data science, NumPy, and Pandas. His journey in the world of coding began as a curious explorer and has evolved into a seasoned data enthusiast.
 