Many data analysts remove rows or columns that contain missing values. Did you know that, instead of removing them, you can fill in the missing values with a single pandas function? That function is pandas interpolate(). In this tutorial, I will show you how to use pandas interpolate step by step.
Steps to implement Pandas Interpolate
Step 1: Import all the necessary libraries
Let’s import the required libraries. In my code, I am using only the NumPy, datetime, and pandas modules, so I will import them with import statements. The NumPy and datetime modules are used to build the sample dataset.
import numpy as np
import pandas as pd
import datetime
Step 2: Create a Sample Pandas Dataframe
The next step is to create a sample dataframe to apply pandas interpolate to. Here I am creating a time-series dataframe that contains some NaN values, created with np.nan. You will then fill these missing values with the interpolate() function.
Execute the code below to create a dataframe.
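# Build 10 daily dates, from 10 days ago up to yesterday
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date - datetime.timedelta(10), periods=10, freq='D')
columns = ['Price']
# One price per date; three of them are missing (np.nan)
data = np.array([10, 20, np.nan, 30, 50, 20, np.nan, 100, 30, np.nan])
df = pd.DataFrame(data, index=index, columns=columns)
print(df)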
In the above code, I create 10 consecutive daily dates, ending yesterday, and assign a price to each date; three of the prices are NaN. When you run the code and print df, you get a single Price column indexed by date, with NaN in the 3rd, 7th, and 10th rows.
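Before interpolating, it can help to check exactly which values are missing. The sketch below rebuilds the same Price data and counts the NaNs per column. It assumes a fixed start date ("2023-01-01") instead of today's date, purely so that the index is reproducible.

```python
import numpy as np
import pandas as pd

# Same prices as the tutorial; the fixed start date "2023-01-01" is an
# assumption so that the index does not depend on when the code runs.
index = pd.date_range("2023-01-01", periods=10, freq="D")
data = np.array([10, 20, np.nan, 30, 50, 20, np.nan, 100, 30, np.nan])
df = pd.DataFrame(data, index=index, columns=["Price"])

print(df)

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Dates whose price is missing.
missing_dates = df.index[df["Price"].isna()]
print(missing_dates)
```

Running the same check again after interpolation is a quick way to confirm that the gaps were filled.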
Step 3: Apply the pandas interpolate on the dataframe
The last step is to apply the interpolate() method to the dataframe created above. It replaces the NaN values in the Price column with estimates computed from the neighbouring values.
Execute the code below.
df.interpolate()
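Output (the dates are omitted because they depend on the day you run the code): the Price column becomes 10.0, 20.0, 25.0, 30.0, 50.0, 20.0, 60.0, 100.0, 30.0, 30.0.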
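How does interpolate() fill the gaps? In the output above, each NaN is replaced by the mean of the value just before it and the value just after it; for example, the missing price between 20 and 30 becomes 25. More precisely, linear interpolation draws a straight line across each gap, which for a single missing value gives the mean of its two neighbours. The last NaN is the exception: there is no later value to interpolate toward, so pandas repeats the last valid value, 30. The technique used is controlled by the method argument, whose default is method="linear". Other values are available as well, one of which is "polynomial".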
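You can verify this arithmetic yourself. The sketch below puts the tutorial's prices in a plain Series, compares the interpolated values with the hand-computed means, and shows two related details: interpolate() returns a new object instead of modifying the original, and the pandas limit_area="inside" option fills only gaps that have valid values on both sides, so the trailing NaN is left alone.

```python
import numpy as np
import pandas as pd

# The tutorial's prices as a plain Series (default 0..9 integer index).
prices = pd.Series([10, 20, np.nan, 30, 50, 20, np.nan, 100, 30, np.nan])

filled = prices.interpolate()  # same as method="linear"

# Single-value gaps become the mean of their two neighbours.
print(filled[2], (20 + 30) / 2)   # 25.0 25.0
print(filled[6], (20 + 100) / 2)  # 60.0 60.0
# The trailing NaN has no later neighbour, so the last valid value is repeated.
print(filled[9])                  # 30.0

# interpolate() returns a new Series; the original still has its NaNs.
print(prices.isna().sum(), filled.isna().sum())  # 3 0

# limit_area="inside" fills only NaNs surrounded by valid values,
# so the trailing NaN stays missing.
inside_only = prices.interpolate(limit_area="inside")
print(inside_only.isna().sum())   # 1
```

To keep the filled values, assign the result, for example df = df.interpolate().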
You will get the same result as above if you pass method="linear" explicitly.
df.interpolate(method="linear")
Output: the same values as with the default call.
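One detail worth knowing about method="linear": according to the pandas documentation, it ignores the index and treats the rows as equally spaced. In this tutorial the dates are exactly one day apart, so that makes no difference, but with irregular dates the index-aware method="time" gives a different answer. The sketch below uses hypothetical, unevenly spaced dates to compare the two.

```python
import numpy as np
import pandas as pd

# Hypothetical, unevenly spaced dates: 1 day before the missing value,
# 3 days after it.
index = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05"])
prices = pd.Series([10.0, np.nan, 40.0], index=index)

# "linear" uses row positions only: halfway between 10 and 40.
by_position = prices.interpolate(method="linear")
print(by_position.iloc[1])  # 25.0

# "time" weights by the real time gaps: 10 + (40 - 10) * 1/4 = 17.5
by_time = prices.interpolate(method="time")
print(by_time.iloc[1])
```

For evenly spaced data like the tutorial's, both methods produce the same values.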
If you use method="polynomial" instead, you must also pass an order (here 2), and you will get a different output. This method is built on SciPy, so SciPy must be installed.
df.interpolate(method="polynomial",order=2)
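Output: the filled values now come from a second-order curve passing through the known prices, so they differ from the linear result.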
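The difference between the two methods is easiest to see on data that follows a curve. The sketch below uses hypothetical values lying on y = x², so the true missing value is 4. Because pandas hands method="polynomial" over to SciPy's interp1d, the sketch checks for SciPy with importlib (an addition of this example, not part of the tutorial) and only runs the polynomial part when SciPy is available.

```python
import importlib.util

import numpy as np
import pandas as pd

# Hypothetical values on y = x**2; the missing value at x = 2 should be 4.
squares = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0])

# A straight line between the neighbours 1 and 9 overshoots: (1 + 9) / 2.
linear = squares.interpolate(method="linear")
print(linear[2])  # 5.0

# method="polynomial" needs SciPy, so only run it when SciPy is installed.
if importlib.util.find_spec("scipy") is not None:
    # A second-order fit follows the curve and recovers (approximately) 4.
    quadratic = squares.interpolate(method="polynomial", order=2)
    print(quadratic[2])
else:
    print("SciPy is not installed; method='polynomial' is unavailable.")
```

Linear interpolation is a safe default for roughly straight stretches of data, while a polynomial order can track curved trends more closely.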
Conclusion
Pandas interpolate() is a very useful method for filling NaN or missing values. In machine learning, dropping rows that contain missing values throws away data and can leave you with a less accurate or biased model, so filling the gaps by interpolation is often a better option. I hope you now understand how to use the interpolate method. If you have any queries, you can contact us for more information.