Suppose you have two datasets, and each dataset has an index column. Now you want to merge them with pandas on that index. How can you achieve this? In this tutorial, you will learn three methods to merge pandas dataframes on the index.
Steps to implement Pandas Merge on Index
Step 1: Import the required libraries
Here I am using only the NumPy, datetime, and pandas libraries to create and merge the dataframes. Let's import all of them.
import numpy as np
import pandas as pd
import datetime
Step 2: Create Dataframes
For the implementation, we need two dataframes. Execute the following lines of code to create them.
Dataframe 1
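# Today's date, used as the reference point for the index
todays_date = datetime.datetime.now().date()
# Ten consecutive days, starting ten days ago
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')
columns = ['A','B']
# Two identical columns holding 0 to 9
data = np.array([np.arange(10)]*2).T
df1 = pd.DataFrame(data, index=index, columns=columns)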
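Output: ten rows, one per day from ten days ago through yesterday, with columns A and B both holding 0 to 9.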
Dataframe 2
todays_date = datetime.datetime.now().date()
# Same start date as df1 but only five days, so these are df1's first five dates
index = pd.date_range(todays_date-datetime.timedelta(10), periods=5, freq='D')
columns = ['C']
# One column holding 0 to 4
data = np.array([np.arange(5)]).T
df2 = pd.DataFrame(data, index=index, columns=columns)
Output: five rows covering the first five of those dates, with column C holding 0 to 4.
Both dataframes are time-series data with the date as the index. I won't explain every line of the code; otherwise, this post would become long. The code simply creates the two dataframes.
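Because both indexes start on the same day, df2's five dates are exactly the first five dates of df1. This overlap decides which rows each merge method keeps, so it is worth checking directly. The sketch below rebuilds the two dataframes from a fixed start date (2024-01-01 is an arbitrary assumption, used so the result does not depend on when you run it) and compares the indexes with Index.intersection(), Index.difference() and Index.isin().

```python
import numpy as np
import pandas as pd

# Fixed start date (an assumption) instead of datetime.now(), so the result is reproducible
start = pd.Timestamp("2024-01-01")
df1 = pd.DataFrame(np.array([np.arange(10)] * 2).T,
                   index=pd.date_range(start, periods=10, freq="D"), columns=["A", "B"])
df2 = pd.DataFrame(np.array([np.arange(5)]).T,
                   index=pd.date_range(start, periods=5, freq="D"), columns=["C"])

# Index labels present in both dataframes
shared = df1.index.intersection(df2.index)
# Index labels present only in df1
only_in_df1 = df1.index.difference(df2.index)

print(len(shared), len(only_in_df1))    # 5 5
print(df2.index.isin(df1.index).all())  # True: every date in df2 also appears in df1
```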
In the next step, you will look at various examples to implement pandas merge on the index.
Step 3: Follow the various examples to do Pandas Merge on Index
EXAMPLE 1: Using the Pandas Merge Method
In pandas, there is a function pandas.merge() that allows you to merge two dataframes on the index. Execute the following code to merge both dataframes df1 and df2.
pd.merge(df1, df2, left_index=True, right_index=True)
Here I am passing four arguments. The first two are the dataframes to merge, and the third and fourth are left_index and right_index. Setting left_index=True uses the index of the left dataframe (df1) as the join key, and right_index=True does the same for the right dataframe (df2). You can read more about the parameters in the pandas.merge() documentation.
Output: five rows, one for each date present in both indexes, with columns A, B and C.
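pd.merge() performs an inner join by default (how='inner'), which is why only the dates shared by both indexes appear. Passing how='left' keeps every row of df1 instead. A minimal sketch, assuming a fixed start date of 2024-01-01 in place of today's date so the shapes are reproducible:

```python
import numpy as np
import pandas as pd

# Same two dataframes, built from a fixed start date (an assumption) for reproducibility
start = pd.Timestamp("2024-01-01")
df1 = pd.DataFrame(np.array([np.arange(10)] * 2).T,
                   index=pd.date_range(start, periods=10, freq="D"), columns=["A", "B"])
df2 = pd.DataFrame(np.array([np.arange(5)]).T,
                   index=pd.date_range(start, periods=5, freq="D"), columns=["C"])

# Default how="inner": only dates present in both indexes are kept
inner = pd.merge(df1, df2, left_index=True, right_index=True)
# how="left": every date in df1 is kept, and C is NaN where df2 has no matching date
left = pd.merge(df1, df2, left_index=True, right_index=True, how="left")

print(inner.shape)             # (5, 3)
print(left.shape)              # (10, 3)
print(left["C"].isna().sum())  # 5
```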
EXAMPLE 2: Using the Pandas Join Method
The second method is the pandas.DataFrame.join() method. Call it on one dataframe and pass the other dataframe as the argument, like below.
join_df = df1.join(df2)
join_df
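Whether the merged dataframe contains NaN values depends on which dataframe you call join() on. join() performs a left join by default, so it keeps every row of the calling dataframe. With the code above, the last five dates of df1 have no match in df2, so column C is NaN for them. If you use df2.join(df1) instead, you get the same five rows as Example 1 with no NaN values, although the columns come out in the order C, A, B.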
Output: ten rows, one for each date in df1, with columns A, B and C.
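The sketch below compares the two call orders (join() defaults to how='left') and shows that passing how='inner' reproduces Example 1 exactly, column order included. It rebuilds the dataframes from a fixed start date, an assumption made so the output does not depend on the current date.

```python
import numpy as np
import pandas as pd

# Same two dataframes, built from a fixed start date (an assumption) for reproducibility
start = pd.Timestamp("2024-01-01")
df1 = pd.DataFrame(np.array([np.arange(10)] * 2).T,
                   index=pd.date_range(start, periods=10, freq="D"), columns=["A", "B"])
df2 = pd.DataFrame(np.array([np.arange(5)]).T,
                   index=pd.date_range(start, periods=5, freq="D"), columns=["C"])

# Left join from df1: all ten dates are kept, C is NaN for the five unmatched ones
left_join = df1.join(df2)
# Left join from df2: only df2's five dates are kept, with columns ordered C, A, B
swapped = df2.join(df1)
# Inner join from df1: the same result as pd.merge() in Example 1
inner_join = df1.join(df2, how="inner")
merged = pd.merge(df1, df2, left_index=True, right_index=True)

print(left_join.shape)            # (10, 3)
print(list(swapped.columns))      # ['C', 'A', 'B']
print(inner_join.equals(merged))  # True
```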
EXAMPLE 3: Pandas Merge on Index using concat() method
Another way to merge on the index is the pandas.concat() method. Pass both dataframes in a list together with the axis argument.
pd.concat([df1, df2], axis=1)
The axis value tells concat() which direction to combine the dataframes in. To place the columns side by side, aligned on the index, I set axis=1. To stack the rows instead, you would use axis=0, which is the default.
Output: ten rows, one for each date in either dataframe, with columns A, B and C.
In the output above, column C contains NaN values for the five dates that exist only in df1, because concat() performs an outer join by default. To remove those rows, use the dropna() method. Run the code below.
concat_df = pd.concat([df1, df2], axis=1)
concat_df.dropna()
Output: only the five rows with a value in every column remain.
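Two variations may help here. With axis=0, concat() stacks the rows instead of aligning them on the index. And instead of creating NaN values and dropping them afterwards, concat() accepts join='inner', which keeps only the index labels found in every dataframe. The sketch below compares both routes, rebuilding the dataframes from a fixed start date (an assumption, for reproducibility).

```python
import numpy as np
import pandas as pd

# Same two dataframes, built from a fixed start date (an assumption) for reproducibility
start = pd.Timestamp("2024-01-01")
df1 = pd.DataFrame(np.array([np.arange(10)] * 2).T,
                   index=pd.date_range(start, periods=10, freq="D"), columns=["A", "B"])
df2 = pd.DataFrame(np.array([np.arange(5)]).T,
                   index=pd.date_range(start, periods=5, freq="D"), columns=["C"])

# axis=0 stacks rows instead of aligning them: 10 + 5 rows, with the shared dates repeated
stacked = pd.concat([df1, df2], axis=0)
print(stacked.shape)  # (15, 3)

# Approach shown above: outer concat, then drop every row that contains a NaN
via_dropna = pd.concat([df1, df2], axis=1).dropna()
# Alternative: join="inner" keeps only index labels found in every dataframe, so no NaN is created
via_inner = pd.concat([df1, df2], axis=1, join="inner")
print(via_dropna.shape, via_inner.shape)  # (5, 3) (5, 3)

# The NaN step turned C into float64; the inner concat keeps df2's integer dtype
print(via_dropna["C"].dtype, via_inner["C"].dtype)
```

One difference worth knowing: dropna() removes every row that has a missing value in any column, including NaN values that were already in your original data, while join='inner' decides only by index labels.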
Conclusion
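These are the ways to do a pandas merge on index. The first example is the simplest: pd.merge() performs an inner join by default, so the result contains only the dates found in both dataframes. In the second and third examples, join() (a left join by default) and concat() (an outer join by default) can leave NaN values in the merged dataframe. You can remove those rows with the dropna() method.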
I hope you have understood all the above examples. If you have any questions, you can contact us on our Official Facebook Page.