Merging two columns in Pandas can be tedious if you are not familiar with how Pandas combines data. Merging two different dataframes is straightforward, but merging two or more columns of the same dataframe is a different concept. In this post, you will learn how to merge two columns in Pandas using several approaches.
Step 1: Import the Necessary Packages
Only the NumPy and Pandas packages are required for this tutorial, so I am importing them.
import pandas as pd
import numpy as np
Step 2: Create a Dataframe
For demonstration purposes, I am creating a dataframe manually. You can apply the same concept to your own dataframe.
missing = np.nan
actors_name = ["Tom Cruise","Hugh Jackman","Brad Pitt","Johnny Depp","Leonardo DiCaprio"]
actor_age = [57,missing,51,missing,44]
actor_age_revised = [missing,55,missing,56,missing]
df = pd.DataFrame({"name":actors_name,"age1":actor_age,"revised_age":actor_age_revised})
Here the dataframe contains the “name”, “age1” and “revised_age” columns, and some rows have missing values. I created it this way to show the merge process on the columns.
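Before merging, it can help to confirm where the gaps are. The following is a minimal sketch that rebuilds the same dataframe and counts the missing values; the variable names missing_per_column and filled_per_row are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in Step 2, rebuilt so this snippet runs on its own.
df = pd.DataFrame({
    "name": ["Tom Cruise", "Hugh Jackman", "Brad Pitt", "Johnny Depp", "Leonardo DiCaprio"],
    "age1": [57, np.nan, 51, np.nan, 44],
    "revised_age": [np.nan, 55, np.nan, 56, np.nan],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Count how many of the two age columns are filled in for each row.
filled_per_row = df[["age1", "revised_age"]].notna().sum(axis=1)
print(filled_per_row.tolist())  # [1, 1, 1, 1, 1]
```

Every row has exactly one of the two ages filled in, which is the situation the approaches below are meant for.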
Step 3: Apply the approaches
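In this step, apply any of the following methods to complete the merge. Each approach starts from the original dataframe created in Step 2, so re-create df before trying the next one.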
Approach 1: Using the “+” Operator
If both columns are numeric and each row has a value in only one of them, you can fill the missing values with zero and simply add the two columns, like this.
df = df.fillna(0)
df["age"] = (df["age1"] + df["revised_age"]).astype("int")
df = df[["name","age"]]
df
First, you fill the missing values with 0, then add the values of the two columns and store the result in the “age” column.
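One caveat with addition is worth seeing in code. The sketch below uses hypothetical data (not the actors dataframe) in which one row has a value in both columns, so the two values are summed instead of merged.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has a value in BOTH columns.
overlap = pd.DataFrame({
    "age1": [57, 54, np.nan],
    "revised_age": [np.nan, 55, 56],
})

# Same idea as Approach 1: replace NaN with 0, then add.
summed = (overlap["age1"].fillna(0) + overlap["revised_age"].fillna(0)).astype(int)
print(summed.tolist())  # [57, 109, 56]
```

The second row becomes 109, which is not a valid age, so this approach is only safe when the two columns never overlap.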
Approach 2: Using the pop() method
df["age"] = df.pop("age1").fillna(df.pop("revised_age")).astype(int)
df
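You can also merge the columns using the pop() method. Here, pop() removes the “age1” column from the dataframe and returns it, and its missing values are then filled with the popped values of the other column, “revised_age”. Both original columns are removed from the dataframe, leaving only “name” and the new “age” column.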
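To make the side effect of pop() visible, here is a small self-contained sketch that rebuilds the dataframe from Step 2 and checks which columns remain after the merge.

```python
import numpy as np
import pandas as pd

# Same data as in Step 2, rebuilt so this snippet runs on its own.
df = pd.DataFrame({
    "name": ["Tom Cruise", "Hugh Jackman", "Brad Pitt", "Johnny Depp", "Leonardo DiCaprio"],
    "age1": [57, np.nan, 51, np.nan, 44],
    "revised_age": [np.nan, 55, np.nan, 56, np.nan],
})

# pop() returns the column as a Series and deletes it from df in place.
df["age"] = df.pop("age1").fillna(df.pop("revised_age")).astype(int)

print(df.columns.tolist())  # ['name', 'age']
print(df["age"].tolist())   # [57, 55, 51, 56, 44]
```

Because pop() modifies the dataframe in place, no separate column selection or drop() call is needed afterwards.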
Approach 3: Using the combine_first() method
Another way to merge the columns is the combine_first() method, called here on the “age1” column. Use the following code.
df["age"] = df["age1"].combine_first(df["revised_age"]).astype(int)
df = df[["name","age"]]
df
The above code fills the missing values of the “age1” column with the values from “revised_age” and assigns the result to the df[“age”] column.
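The following sketch illustrates how combine_first() behaves when the columns overlap or when both are missing. The two Series are hypothetical and are not part of the actors dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has both values, row 2 has neither.
first = pd.Series([57, np.nan, np.nan])
second = pd.Series([60, 55, np.nan])

# The calling Series wins wherever it has a value.
merged = first.combine_first(second)
print(merged.tolist())  # [57.0, 55.0, nan]

# astype(int) would fail here because a NaN remains;
# the nullable "Int64" dtype keeps the gap as <NA>.
merged_int = merged.astype("Int64")
print(merged_int.isna().sum())  # 1
```

In other words, the column you call combine_first() on takes priority, and a plain integer conversion only works once every row has a value.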
Approach 4: Using Numpy
You can also merge into the “age1” column directly using the numpy.where() method. Wherever “age1” is NaN, the value is taken from the “revised_age” column; the “revised_age” column is then dropped. Use the code below.
df["age1"] = np.where(df["age1"].isna(),df["revised_age"],df["age1"]).astype("int")
df = df.drop("revised_age",axis=1)
df
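As a quick check of what numpy.where() returns, the sketch below rebuilds the dataframe from Step 2 and inspects the result before assigning it. Writing to a new “age” column instead of overwriting “age1” is only a choice made for this illustration.

```python
import numpy as np
import pandas as pd

# Same data as in Step 2, rebuilt so this snippet runs on its own.
df = pd.DataFrame({
    "name": ["Tom Cruise", "Hugh Jackman", "Brad Pitt", "Johnny Depp", "Leonardo DiCaprio"],
    "age1": [57, np.nan, 51, np.nan, 44],
    "revised_age": [np.nan, 55, np.nan, 56, np.nan],
})

# np.where(condition, value_if_true, value_if_false) works element by element
# and returns a plain NumPy array, not a pandas Series.
merged = np.where(df["age1"].isna(), df["revised_age"], df["age1"])
print(type(merged).__name__)  # ndarray

# Assigning the array back to the dataframe lines it up by position.
df["age"] = merged.astype(int)
print(df["age"].tolist())  # [57, 55, 51, 56, 44]
```

The result matches the other approaches on this data.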
These are some approaches to merging two columns in a dataframe. You can apply the simple addition approach if the data is numeric and no row has a value in both columns. Otherwise, use one of the other approaches. I hope this was easy to follow. If you have any questions, please contact us for more help.