Pandas is a popular Python library for loading and manipulating datasets, and it offers many functions for exploring them. If you want a quick analysis of a dataset, you can call dataframe.describe(), but it reports only basic summary statistics. For a more detailed analysis of a pandas dataframe, the pandas-profiling package provides ProfileReport(). In this tutorial, you will learn how to implement pandas profiling step by step.
Steps to implement Pandas Profiling
This section walks through the steps needed to run a detailed analysis of a pandas dataframe.
Step 1: Install the pandas-profiling package
If you have not installed pandas-profiling on your system, you can install it using the pip command. Note that newer releases of this package are published under the name ydata-profiling; this tutorial uses the original pandas-profiling name. Run the following command to install it.
pip install pandas-profiling
If you have already installed the pandas-profiling package, move on to the second step.
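If you are not sure whether the package is already present, a small check like the following can tell you. This is a minimal sketch: is_installed is a hypothetical helper name, and it assumes the pip package pandas-profiling is imported under the module name pandas_profiling.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the pip package "pandas-profiling" is imported as "pandas_profiling".
if is_installed("pandas_profiling"):
    print("pandas-profiling is installed; continue with Step 2.")
else:
    print("pandas-profiling is missing; run: pip install pandas-profiling")
```

The check only looks the module up and does not import it, so it is cheap to run before the rest of the tutorial.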
Step 2: Create a sample dataset
The second step is to create a dummy dataset on which I will show the detailed analysis. You can use your own dataset instead, but for simplicity I will create a small one. Execute the following lines of code to create it.
import pandas as pd
# Four people, each with a name, a gender, and an age
data = {"name": ["Sahi", "Abhishek", "Rahul", "Mani"], "gender": ["male", "male", "male", "female"], "age": [20, 27, 35, 16]}
df = pd.DataFrame(data=data)
print(df)
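For comparison, here is what the quick describe() summary mentioned in the introduction gives on this sample dataset. The sketch below rebuilds the same dataframe so that it runs on its own; the variable names summary and full are just illustrative.

```python
import pandas as pd

data = {
    "name": ["Sahi", "Abhishek", "Rahul", "Mani"],
    "gender": ["male", "male", "male", "female"],
    "age": [20, 27, 35, 16],
}
df = pd.DataFrame(data=data)

# By default, describe() summarises only the numeric columns (here: age).
summary = df.describe()
print(summary)
print(summary.loc["mean", "age"])  # 24.5

# include="all" also covers the text columns (count, unique, top, freq).
full = df.describe(include="all")
print(full)
```

This is useful for a first look, but it gives only a handful of statistics per column.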
Step 3: Use pandas profiling on the dataframe
Now you can create a profile report for the dataframe by passing it to ProfileReport().
Use the following lines of code to create it.
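from pandas_profiling import ProfileReport
profile = ProfileReport(df, title="Pandas Profiling Report")
profile  # as the last line of a Jupyter notebook cell, this displays the report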
The report covers your input dataframe in detail, including an overview, missing values, and information about each variable.
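To get a feel for the kind of information the overview and missing-values sections summarise, you can compute a few of the same figures by hand with plain pandas. This is only a rough, hand-rolled approximation and not how the package builds its report; the overview table and its column names are my own choice.

```python
import pandas as pd

data = {
    "name": ["Sahi", "Abhishek", "Rahul", "Mani"],
    "gender": ["male", "male", "male", "female"],
    "age": [20, 27, 35, 16],
}
df = pd.DataFrame(data=data)

# One row per variable: its type, how many values are missing,
# and how many distinct values it holds.
overview = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "distinct": df.nunique(),
})
print(overview)
print("rows:", len(df), "columns:", df.shape[1])
```

The profile report goes well beyond this, but the table shows the sort of per-variable detail to expect.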
The above report is generated in memory. To save it to an HTML file, use profile.to_file("your_html_file_name.html"). Let's save it as people.html by adding the following line of code.
profile.to_file("people.html")
This exports the detailed analysis of your dataframe to a file named "people.html".
When you open the exported HTML file in a browser, you will see the full report.
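If you export reports often, the two calls can be wrapped in a small helper. The following is a sketch under stated assumptions: export_profile is a hypothetical function name, not part of the package, and it assumes the package is importable as pandas_profiling.

```python
from pathlib import Path


def export_profile(df, path, title="Pandas Profiling Report"):
    """Write an HTML profile of df to path.

    Returns the output path, or None if the profiling package
    cannot be imported.
    """
    try:
        # Assumption: the package is importable under this name.
        from pandas_profiling import ProfileReport
    except ImportError:
        return None
    ProfileReport(df, title=title).to_file(path)
    return Path(path)
```

Calling export_profile(df, "people.html") would then produce the same file as the line above, or return None when the package is not installed.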
Conclusion
Profiling is an easy way to get a detailed view of your dataset. You can also use the dataframe.describe() function for a quick summary, but ProfileReport() gives far more detail. These are the steps to implement pandas profiling in Python. I hope you liked this tutorial. If you have any queries, you can contact us for more help.