When you receive a dataset, it may contain NaN values. The pandas dropna() method lets you drop the rows or columns of a dataframe that contain NaN values. In this article, I will show you several examples of dealing with NaN values using the pandas dropna() method.
If your dataset contains missing data, it usually comes from one of the following causes.
Technical Glitches
Sometimes systems are unable to collect data points properly, which leads to missing entries.
Human Error
If the data points are recorded manually, entries might be skipped accidentally or intentionally.
Merging Datasets
When you integrate or merge datasets, not all of them might have the same fields or entries.
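As a small illustration of the merging case, here is a minimal sketch. The two frames and their names (prices, volumes) are hypothetical and are not part of this article's dataset.

```python
import pandas as pd

# Hypothetical sources: one reports opening prices, the other reports volumes
prices = pd.DataFrame({"Date": ["12/11/2020", "13/11/2020"], "Open": [1, 2]})
volumes = pd.DataFrame({"Date": ["13/11/2020", "14/11/2020"], "Volume": [200, 300]})

# An outer merge keeps every date from both sources;
# a field that one source lacks for a date is filled with NaN
merged = pd.merge(prices, volumes, on="Date", how="outer")
print(merged)
```

Only 13/11/2020 appears in both sources, so the other two dates each end up with one NaN value.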
If you want to remove the rows or columns that contain NaN values, the syntax is shown below.
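your_dataframe.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameters explanation
axis : Takes 0 (or "index") to drop rows, or 1 (or "columns") to drop columns. The default is 0.
how : Either "any" or "all". With "any", a row or column is dropped if it contains at least one NaN value. With "all", it is dropped only if every value in it is NaN.
thresh : The minimum number of non-NaN values a row or column must have to be kept. Recent pandas versions do not allow how and thresh to be set in the same call.
subset : Labels along the other axis to consider. For example, when dropping rows, this is the list of columns to check.
inplace : Default is False. If it is set to True, the operation modifies the dataframe in place and returns None.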
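The thresh parameter is easiest to understand with a tiny example. This is a minimal sketch on a hypothetical three-row frame (demo is not part of this article's dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has 3 values, row 1 has 2, row 2 has none
demo = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, np.nan],
})

# Keep only the rows that have at least 2 non-NaN values
kept = demo.dropna(thresh=2)
print(kept)
```

Rows 0 and 1 meet the threshold and are kept, while the all-NaN row 2 is dropped.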
In our examples, we use NumPy to place NaN values and pandas to create the dataframe. Let's import them.
import numpy as np
import pandas as pd
In this step, I will create a pandas dataframe with NaN values. NumPy provides numpy.nan to represent a NaN value. Execute the lines of code given below to create the dataframe.
data = {"Date":["12/11/2020","13/11/2020","14/11/2020","15/11/2020","16/11/2020","17/11/2020"],
"Open":[1,2,np.nan,4,5,7],"Close":[5,6,7,8,9,np.nan],"Volume":[np.nan,200,300,400,500,600]}
df = pd.DataFrame(data=data)
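Before dropping anything, it can help to check where the NaN values are. The sketch below rebuilds the same dataframe and counts the missing values per column with isna(), which is an addition for illustration and not part of the original steps.

```python
import numpy as np
import pandas as pd

data = {
    "Date": ["12/11/2020", "13/11/2020", "14/11/2020",
             "15/11/2020", "16/11/2020", "17/11/2020"],
    "Open": [1, 2, np.nan, 4, 5, 7],
    "Close": [5, 6, 7, 8, 9, np.nan],
    "Volume": [np.nan, 200, 300, 400, 500, 600],
}
df = pd.DataFrame(data=data)
print(df)

# Count the NaN values in each column
nan_counts = df.isna().sum()
print(nan_counts)
```

Date has no missing values, while Open, Close and Volume have one NaN value each.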
Now the last step is to remove the NaN values from the dataframe. This can be done in several ways, and the examples below cover the main options of dropna().
If you want to remove all the rows that have at least one NaN value, simply call the dropna() method on your dataframe without any arguments.
Run the code given below.
df.dropna()
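To make the result concrete, here is a self-contained sketch that rebuilds the dataframe from this article and checks which rows survive. The variable name rows_kept is only for illustration.

```python
import numpy as np
import pandas as pd

data = {
    "Date": ["12/11/2020", "13/11/2020", "14/11/2020",
             "15/11/2020", "16/11/2020", "17/11/2020"],
    "Open": [1, 2, np.nan, 4, 5, 7],
    "Close": [5, 6, 7, 8, 9, np.nan],
    "Volume": [np.nan, 200, 300, 400, 500, 600],
}
df = pd.DataFrame(data=data)

# Drop every row that contains at least one NaN value
rows_kept = df.dropna()
print(rows_kept)

# The original index labels are preserved; renumber them if needed
renumbered = rows_kept.reset_index(drop=True)
```

Rows 0, 2 and 5 each contain a NaN value, so only the rows with index 1, 3 and 4 remain.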
You can also remove the columns that have at least one NaN value. To do so, pass axis=1 or axis="columns". In our dataframe, the Open, Close and Volume columns each contain a NaN value, so all the columns except Date will be removed.
df.dropna(axis=1)
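The following sketch rebuilds the same dataframe and confirms which columns are left after dropping along axis 1. The name cols_kept is only for illustration.

```python
import numpy as np
import pandas as pd

data = {
    "Date": ["12/11/2020", "13/11/2020", "14/11/2020",
             "15/11/2020", "16/11/2020", "17/11/2020"],
    "Open": [1, 2, np.nan, 4, 5, 7],
    "Close": [5, 6, 7, 8, 9, np.nan],
    "Volume": [np.nan, 200, 300, 400, 500, 600],
}
df = pd.DataFrame(data=data)

# Drop every column that contains at least one NaN value
cols_kept = df.dropna(axis=1)
print(cols_kept)

# axis="columns" is an equivalent spelling of axis=1
same_result = df.dropna(axis="columns")
```

Only the Date column is left, and all six rows are still present.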
Sometimes all the values of a row are NaN, and you want to remove only those rows. In that case you can use the how parameter. To explain this example, I am modifying the original dataframe above. Copy the code given below.
data = {"Date":["12/11/2020","13/11/2020","14/11/2020","15/11/2020","16/11/2020","17/11/2020"],
"Open":[1,2,np.nan,4,5,7],"Close":[5,6,np.nan,8,9,10],"Volume":[np.nan,200,np.nan,400,500,600]}
df = pd.DataFrame(data=data)
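Now apply dropna() with how="all". Note that the row for 14/11/2020 still has a Date value, so it is not entirely NaN, and how="all" on its own would leave it in place. Restricting the check to the Open, Close and Volume columns with subset removes it.
df.dropna(how="all", subset=["Open","Close","Volume"])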
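To see what how="all" does with this dataframe, the sketch below compares the plain call with one that checks only the numeric columns. The names whole_row and numeric_only are only for illustration.

```python
import numpy as np
import pandas as pd

data = {
    "Date": ["12/11/2020", "13/11/2020", "14/11/2020",
             "15/11/2020", "16/11/2020", "17/11/2020"],
    "Open": [1, 2, np.nan, 4, 5, 7],
    "Close": [5, 6, np.nan, 8, 9, 10],
    "Volume": [np.nan, 200, np.nan, 400, 500, 600],
}
df = pd.DataFrame(data=data)

# Every row has a Date, so no row is NaN in all of its columns
whole_row = df.dropna(how="all")

# Look only at the numeric columns when deciding what to drop
numeric_only = df.dropna(how="all", subset=["Open", "Close", "Volume"])

print(len(whole_row), len(numeric_only))
```

The first call keeps all six rows. The second drops only the row with index 2, where Open, Close and Volume are all NaN. Row 0 stays because only its Volume is missing.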
Suppose you want to remove rows based on NaN values in one or more specific columns. To do this, pass the list of columns to the subset parameter. It removes the rows that have NaN values in those columns. I will use the same dataframe that was created above.
Run the code below.
df.dropna(subset=["Open","Volume"])
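Here is a self-contained sketch of the same call. It assumes the first dataframe from this article, the one with a NaN value in Close, and the name result is only for illustration.

```python
import numpy as np
import pandas as pd

data = {
    "Date": ["12/11/2020", "13/11/2020", "14/11/2020",
             "15/11/2020", "16/11/2020", "17/11/2020"],
    "Open": [1, 2, np.nan, 4, 5, 7],
    "Close": [5, 6, 7, 8, 9, np.nan],
    "Volume": [np.nan, 200, 300, 400, 500, 600],
}
df = pd.DataFrame(data=data)

# Drop rows only when Open or Volume is NaN; Close is not checked
result = df.dropna(subset=["Open", "Volume"])
print(result)
```

Rows 0 and 2 are removed. The last row is kept even though its Close value is NaN, because Close is not listed in subset.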
By default, dropna() returns a new dataframe and leaves the original unchanged. To modify the original dataframe itself, pass inplace=True to the dropna() method.
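The difference between the two modes can be checked with a short sketch. The three-row frame here is hypothetical and is used only to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only the first row is free of NaN values
df = pd.DataFrame({"Open": [1, np.nan, 3], "Close": [4, 5, np.nan]})

# Default call: a new dataframe is returned and df is left untouched
cleaned = df.dropna()
rows_before = len(df)

# In-place call: df itself is modified and the method returns None
returned = df.dropna(inplace=True)
rows_after = len(df)

print(rows_before, rows_after, returned)
```

Because the in-place call returns None, avoid writing df = df.dropna(inplace=True), which would replace the dataframe with None. Applied to the dataframe from this article, the call is: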
df.dropna(inplace=True)
That's all for now. These are the main examples of dropna() that I have coded for you. I hope you now understand how to remove NaN values from your dataset. If you have any queries, you can contact us.