Pandas Dropna: How to remove NaN rows in Python


When you receive a dataset, it may contain some NaN (missing) values. The pandas dropna() method lets you drop the rows or columns of a DataFrame that contain NaN values. In this article, I will show you various examples of dealing with NaN values using the pandas dropna() method.

What Are the Causes of Missing Data?

If your dataset contains missing data, these are some common causes.

Technical Glitches 

Sometimes systems fail to collect data points properly, which leads to missing entries.

Human Error

If data points are recorded manually, some values may be skipped accidentally or intentionally. This is human error.

Integration of Data

Suppose you are integrating or merging datasets. Not all datasets have the same fields or entries, so the combined result can contain gaps.

If you want to remove NaN rows, below is the syntax.

The syntax for the Pandas Dropna() method

your_dataframe.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters explanation

axis: 0 (or "index") drops rows that contain NaN values; 1 (or "columns") drops columns. The default is 0.

how: "any" or "all". With "any" (the default behavior), a row or column is removed if it contains at least one NaN value. With "all", it is removed only if all of its values are NaN.

thresh: keep only the rows (or columns) that have at least this many non-NA values.

subset: labels along the other axis to consider, for example the list of columns to check for NaN when dropping rows.

inplace: default is False, so dropna() returns a new DataFrame. If it is set to True, the original DataFrame is modified in place and the method returns None.
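The thresh parameter is not used in the examples below, so here is a minimal sketch of how it behaves. The three-column DataFrame with columns A, B and C is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has 3 non-NA values, row 1 has 2, row 2 has 1
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan],
    "B": [4.0, np.nan, np.nan],
    "C": [7.0, 8.0, 9.0],
})

# Keep rows with at least 2 non-NA values, so row 2 is dropped
at_least_two = df.dropna(thresh=2)
print(at_least_two)

# Keep columns with at least 2 non-NA values, so column "B" is dropped
cols_at_least_two = df.dropna(axis=1, thresh=2)
print(cols_at_least_two)
```

thresh counts the values that are present, not the missing ones, which is why a higher number makes the filter stricter.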

 

Steps to Remove NaN from Dataframe using pandas dropna

Step 1: Import all the necessary libraries

In our examples, we use NumPy to create NaN values and pandas to create the DataFrame. Let’s import them.

import numpy as np
import pandas as pd

Step 2: Create a Pandas Dataframe

In this step, I will create a pandas DataFrame that contains NaN values. NumPy provides the constant numpy.nan (np.nan) to represent a missing value. Execute the lines of code given below to create the DataFrame.

data = {"Date":["12/11/2020","13/11/2020","14/11/2020","15/11/2020","16/11/2020","17/11/2020"],
"Open":[1,2,np.nan,4,5,7],"Close":[5,6,7,8,9,np.nan],"Volume":[np.nan,200,300,400,500,600]}
df = pd.DataFrame(data=data)

Output

Sample Pandas Dataframe with NaN values
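Before dropping anything, it can help to see where the NaN values are. A minimal sketch, assuming the same data as Step 2, uses isna() to count missing values per column and per row:

```python
import numpy as np
import pandas as pd

# Same data as in Step 2
data = {"Date": ["12/11/2020", "13/11/2020", "14/11/2020", "15/11/2020", "16/11/2020", "17/11/2020"],
        "Open": [1, 2, np.nan, 4, 5, 7],
        "Close": [5, 6, 7, 8, 9, np.nan],
        "Volume": [np.nan, 200, 300, 400, 500, 600]}
df = pd.DataFrame(data=data)

# isna() marks each missing cell as True; sum() counts the Trues
nan_per_column = df.isna().sum()
print(nan_per_column)

# axis=1 sums across each row instead of down each column
nan_per_row = df.isna().sum(axis=1)
print(nan_per_row)
```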

Step 3: Remove the NaN values using dropna() method

Now the last step is to remove the NaN values from the DataFrame. This can be done in several ways, and the examples below explain the main options of dropna().

Example 1: Using Simple dropna() method.

If you want to remove all the rows that have at least one NaN value, simply call dropna() on your DataFrame without any arguments.

Run the code given below.

df.dropna()

Output

Remove all rows that have at least a single NaN value
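Notice that dropna() keeps the original index labels of the surviving rows. Here is a small sketch, using the Step 2 data, that checks this and renumbers the rows with reset_index(drop=True) in case you prefer a clean 0 to n-1 index:

```python
import numpy as np
import pandas as pd

data = {"Date": ["12/11/2020", "13/11/2020", "14/11/2020", "15/11/2020", "16/11/2020", "17/11/2020"],
        "Open": [1, 2, np.nan, 4, 5, 7],
        "Close": [5, 6, 7, 8, 9, np.nan],
        "Volume": [np.nan, 200, 300, 400, 500, 600]}
df = pd.DataFrame(data=data)

# Rows 0, 2 and 5 each contain a NaN, so they are dropped
cleaned = df.dropna()
print(cleaned.index.tolist())  # [1, 3, 4]

# Discard the old labels and number the rows from 0 again
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```

The original df still has all six rows, because dropna() returned a new DataFrame.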

Example 2: Removing columns with at least one NaN value.

You can also remove the columns that have at least one NaN value. To do so, pass axis=1 (or axis="columns"). In our DataFrame, the Open, Close and Volume columns each contain at least one NaN value, so they are removed and only the Date column remains.

df.dropna(axis=1)

Output

Remove all columns that have at least a single NaN value
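axis=1 and axis="columns" are two spellings of the same option. A quick sketch on the Step 2 data confirming that both leave only the Date column:

```python
import numpy as np
import pandas as pd

data = {"Date": ["12/11/2020", "13/11/2020", "14/11/2020", "15/11/2020", "16/11/2020", "17/11/2020"],
        "Open": [1, 2, np.nan, 4, 5, 7],
        "Close": [5, 6, 7, 8, 9, np.nan],
        "Volume": [np.nan, 200, 300, 400, 500, 600]}
df = pd.DataFrame(data=data)

by_number = df.dropna(axis=1)        # numeric axis
by_name = df.dropna(axis="columns")  # named axis

print(by_number.columns.tolist())  # ['Date']
print(by_name.equals(by_number))   # True
```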

Example 3: Remove rows where all values are NaN.

Sometimes every value in a row is NaN, and you want to remove only those rows. In that case, use the how parameter. To explain this example, I am modifying the original DataFrame. Copy the code given below to create it.

# Row 2 is NaN in every column, including Date
data = {"Date":["12/11/2020","13/11/2020",np.nan,"15/11/2020","16/11/2020","17/11/2020"],
"Open":[1,2,np.nan,4,5,7],"Close":[5,6,np.nan,8,9,10],"Volume":[np.nan,200,np.nan,400,500,600]}
df = pd.DataFrame(data=data)

Output

Sample Pandas DataFrame with NaN value in each column of a row

Now apply dropna() with how="all" to remove only the rows in which every value is NaN.

df.dropna(how="all")

Output

Applying dropna() on the row with all NaN values
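One caveat: how="all" drops a row only when every column is NaN, including non-numeric columns such as Date. In real data a key column like Date is often always filled, so such a row would be kept. A minimal sketch, assuming you want to treat a row as empty when all of the numeric columns Open, Close and Volume are NaN, combines how="all" with subset:

```python
import numpy as np
import pandas as pd

# Row 2 has NaN in every numeric column, but its Date is still present
data = {"Date": ["12/11/2020", "13/11/2020", "14/11/2020", "15/11/2020", "16/11/2020", "17/11/2020"],
        "Open": [1, 2, np.nan, 4, 5, 7],
        "Close": [5, 6, np.nan, 8, 9, 10],
        "Volume": [np.nan, 200, np.nan, 400, 500, 600]}
df = pd.DataFrame(data=data)

# how="all" alone keeps row 2, because its Date value is not NaN
print(len(df.dropna(how="all")))  # 6

# Only the numeric columns decide whether a row counts as empty
numeric_cols = ["Open", "Close", "Volume"]
cleaned = df.dropna(how="all", subset=numeric_cols)
print(cleaned.index.tolist())  # [0, 1, 3, 4, 5]
```

Row 0 survives because only its Volume is NaN, which is not enough for how="all".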

Example 4: Remove rows with NaN values in selected columns

Suppose you want to drop rows only when certain columns contain NaN. To do this, pass a list of those columns to the subset parameter. dropna() then removes only the rows that have NaN values in those columns. I will use the DataFrame created in Step 2, so re-run that code first, because Example 3 overwrote df.

Run the code below

df.dropna(subset=["Open","Volume"])

Output

Applying dropna() on Selected Columns
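With subset, NaN values in the other columns are ignored. A sketch on the Step 2 data showing that the row whose only NaN is in Close survives:

```python
import numpy as np
import pandas as pd

# Recreate the Step 2 DataFrame
data = {"Date": ["12/11/2020", "13/11/2020", "14/11/2020", "15/11/2020", "16/11/2020", "17/11/2020"],
        "Open": [1, 2, np.nan, 4, 5, 7],
        "Close": [5, 6, 7, 8, 9, np.nan],
        "Volume": [np.nan, 200, 300, 400, 500, 600]}
df = pd.DataFrame(data=data)

# Drop a row only if Open or Volume is NaN
result = df.dropna(subset=["Open", "Volume"])
print(result.index.tolist())  # [1, 3, 4, 5]

# Row 5 is kept even though Close is NaN, since Close is not in the subset
print(result["Close"].isna().sum())  # 1
```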

 

Note that dropna() returns a new DataFrame and leaves the original unchanged. To modify your DataFrame itself, either assign the result back (df = df.dropna()) or pass inplace=True to dropna(), as below.

df.dropna(inplace=True)
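Reassigning the result and using inplace=True end with the same cleaned DataFrame. The sketch below, again on the Step 2 data, compares the two and shows that dropna(inplace=True) itself returns None, so you should not assign its return value:

```python
import numpy as np
import pandas as pd

data = {"Date": ["12/11/2020", "13/11/2020", "14/11/2020", "15/11/2020", "16/11/2020", "17/11/2020"],
        "Open": [1, 2, np.nan, 4, 5, 7],
        "Close": [5, 6, 7, 8, 9, np.nan],
        "Volume": [np.nan, 200, 300, 400, 500, 600]}

# Option 1: keep the returned copy by reassigning
df = pd.DataFrame(data=data)
df = df.dropna()
print(len(df))  # 3

# Option 2: modify the DataFrame in place
df2 = pd.DataFrame(data=data)
returned = df2.dropna(inplace=True)
print(returned)   # None
print(len(df2))   # 3
```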

Conclusion

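That’s all for now. These examples cover the main ways to remove NaN values with dropna(): dropping rows, dropping columns, dropping only all-NaN rows, and restricting the check to selected columns. I hope you now understand how to remove NaN values from your dataset. If you have any queries, you can contact us.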

Source:

Pandas Dropna Official Documentation


Meet Sukesh (Chief Editor), a passionate and skilled Python programmer with a deep fascination for data science, NumPy, and Pandas. His journey in the world of coding began as a curious explorer and has evolved into a seasoned data enthusiast.
 