Pandas interpolate: How to Fill NaN or Missing Values

Many data analysts remove the rows or columns that have missing values. Did you know that, rather than removing them, you can fill the gaps using a single function in pandas? That function is pandas interpolate. In this tutorial, I will show you how to use pandas interpolate step by step.

Steps to implement Pandas Interpolate

Step 1: Import all the necessary libraries

Let’s import the libraries we need. In my code, I am using only the NumPy, datetime, and pandas modules, so I will import them using the import statement. The numpy and datetime modules are used only for building the dataset.

import numpy as np
import pandas as pd
import datetime

Step 2: Create a Sample Pandas Dataframe

The next step is to create a sample dataframe on which to apply pandas interpolate. Here I am creating a time-series dataframe that has some NaN values. These values are created using np.nan. You will then interpolate these missing values using the function.

Execute the code below to create a dataframe.

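# Build a daily date index covering the 10 days before today
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date - datetime.timedelta(10), periods=10, freq='D')
columns = ['Price']
# Prices with three missing values (np.nan), including the last entry
data = np.array([10, 20, np.nan, 30, 50, 20, np.nan, 100, 30, np.nan])
df = pd.DataFrame(data, index=index, columns=columns)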

In the above code, I am creating 10 consecutive dates and assigning a price to each one, with some of the prices set to NaN. When you run the above code, you will get the output below.

Sample Time Series Dataframe with NaN values
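Before interpolating, it can help to confirm where the gaps are. The snippet below is a minimal sketch that rebuilds the same prices on a fixed start date (the date "2024-01-01" is an assumption made only so the result is reproducible; the tutorial itself uses today's date) and counts the missing entries.

```python
import numpy as np
import pandas as pd

# Fixed start date: an assumption for reproducibility, not part of the tutorial
index = pd.date_range("2024-01-01", periods=10, freq="D")
prices = [10, 20, np.nan, 30, 50, 20, np.nan, 100, 30, np.nan]
df = pd.DataFrame({"Price": prices}, index=index)

# Count the missing prices and list the dates on which they occur
missing_count = int(df["Price"].isna().sum())
missing_dates = df.index[df["Price"].isna()]

print(missing_count)   # 3
print(missing_dates)
```

With this fixed index, the missing prices fall on the 3rd, 7th, and 10th day of the range.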

Step 3: Apply the pandas interpolate on the dataframe

The last step is to apply the interpolate() method to the dataframe created above. The method returns a new dataframe in which the NaN values are replaced by values estimated from the neighboring data points.

Execute the code below.

df.interpolate()

Output

Filling the NaN values using pandas interpolate

How does interpolate work? In the above figure, you can see that each NaN value is replaced by the mean of the previous and next values. The exception is the last one: it has no next value, so it is filled with the last valid value instead. This behavior is controlled by the method argument, whose default value is method="linear". Other values are also available, and one of them is polynomial.
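To see the arithmetic behind the default linear method, here is a small sketch using the same ten prices in a plain Series (the integer index is an assumption made to keep the example short). For evenly spaced points, a single gap is filled with the midpoint of its neighbors.

```python
import numpy as np
import pandas as pd

prices = pd.Series([10, 20, np.nan, 30, 50, 20, np.nan, 100, 30, np.nan])

# interpolate() returns a new Series; the original is left unchanged
filled = prices.interpolate()

print(filled.tolist())
# Position 2: (20 + 30) / 2 = 25
# Position 6: (20 + 100) / 2 = 60
# Position 9: no next value, so the last valid value (30) is carried forward
```

Because the result is a new object, assign it back (for example, prices = prices.interpolate()) if you want to keep the filled values.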

You will get the same result as above if you pass method="linear" explicitly.

df.interpolate(method="linear")

Output

Filling the NaN values using pandas interpolate using method=linear

If you use method="polynomial", you will get a different output. This method also requires the order argument, and it needs SciPy to be installed.

df.interpolate(method="polynomial",order=2)

Output

Filling the NaN values using pandas interpolate using method=polynomial
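The difference between the two methods is easier to see on data that follows a curve. The sketch below uses an illustrative series (not the tutorial's prices) whose known values lie on y = x², with the value at x = 2 missing. The SciPy check is there because method="polynomial" depends on SciPy; if it is not installed, the polynomial result is skipped.

```python
import numpy as np
import pandas as pd

# Illustrative data: y = x**2 at x = 0..4, with the value at x = 2 missing
curve = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0])

# Linear: midpoint of the neighbors 1 and 9
linear = curve.interpolate(method="linear")

# Polynomial of order 2: requires SciPy, so guard the call
try:
    import scipy  # noqa: F401
    poly = curve.interpolate(method="polynomial", order=2)
except ImportError:
    poly = None

print(linear.iloc[2])
if poly is not None:
    print(poly.iloc[2])
```

The linear fill gives 5.0, while the order-2 fill should land close to 4, the value on the underlying parabola. On real data the better choice depends on how smoothly the series actually behaves.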

Conclusion

Pandas interpolate is a very useful method for filling NaN or missing values. In machine learning, removing rows that have missing values discards data and can lead to a less accurate predictive model, so filling the gaps with interpolate may help improve your model. I hope you have understood how to use the interpolate method. If you have any queries, you can contact us for more information.

Source:

Pandas Documentation