Merging two columns in Pandas can be tedious if you don’t know the Pandas merging concepts. Merging two different DataFrames is straightforward, but merging two or more columns within the same DataFrame is a different task. In this post, you will learn how to merge two columns in Pandas using several different approaches.
Only the NumPy and Pandas packages are required for this tutorial, so import them first.
import pandas as pd
import numpy as np
For demonstration purposes, I am creating a DataFrame manually. You can apply the same concept to your own DataFrame.
missing = np.nan
actors_name = ["Tom Cruise","Hugh Jackman","Brad Pitt","Johnny Depp","Leonardo DiCaprio"]
actor_age = [57,missing,51,missing,44]
actor_age_revised = [missing,55,missing,56,missing]
df = pd.DataFrame({"name":actors_name,"age1":actor_age,"revised_age":actor_age_revised})
Here the DataFrame contains the “name”, “age1” and “revised_age” columns, and some rows have missing values. I created it this way to show how the two age columns can be merged.
Now apply any of the following methods to merge the “age1” and “revised_age” columns into a single age column.
If both columns are numeric, you can merge them with simple arithmetic: fill the missing values with 0 and add the two columns, like this.
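Every method in this post assumes that each row has an age in exactly one of the two columns. A minimal sketch of a quick check on the same demo data is shown below; the variable name `values_per_row` is just an illustrative choice.

```python
import numpy as np
import pandas as pd

missing = np.nan
df = pd.DataFrame({
    "name": ["Tom Cruise", "Hugh Jackman", "Brad Pitt", "Johnny Depp", "Leonardo DiCaprio"],
    "age1": [57, missing, 51, missing, 44],
    "revised_age": [missing, 55, missing, 56, missing],
})

# Count the non-missing values in the two age columns for every row
values_per_row = df[["age1", "revised_age"]].notna().sum(axis=1)
print(values_per_row.tolist())  # [1, 1, 1, 1, 1]

# True only if every row has exactly one age to keep
print((values_per_row == 1).all())  # True
```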
df = df.fillna(0)
df["age"] = (df["age1"] + df["revised_age"]).astype("int")
df = df[["name","age"]]
df
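This first fills the missing values with 0, then adds the two columns and stores the result in the new “age” column; the last line keeps only the “name” and “age” columns. It works because every row has a value in exactly one of the two columns; if both were filled, the two ages would be summed. Because df is reassigned here, recreate the original DataFrame before trying the next method.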
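The sketch below shows why the fillna(0) step matters for the addition approach: adding a number to NaN gives NaN. It also shows an alternative that is not in the original method, a row-wise sum, which skips NaN by default. Note that, like fillna(0), this turns a row that is missing in both columns into 0; passing `min_count=1` to sum() would keep such a row as NaN instead.

```python
import numpy as np
import pandas as pd

missing = np.nan
df = pd.DataFrame({
    "name": ["Tom Cruise", "Hugh Jackman", "Brad Pitt", "Johnny Depp", "Leonardo DiCaprio"],
    "age1": [57, missing, 51, missing, 44],
    "revised_age": [missing, 55, missing, 56, missing],
})

# Without filling, NaN + number is NaN, so every row ends up missing
raw_sum = df["age1"] + df["revised_age"]
print(raw_sum.isna().all())  # True

# A row-wise sum skips NaN by default, so no separate fillna(0) step is needed
df["age"] = df[["age1", "revised_age"]].sum(axis=1).astype(int)
print(df["age"].tolist())  # [57, 55, 51, 56, 44]
```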
You can also merge the columns using the pop() method. Here, pop() removes the “age1” column from the DataFrame and returns it, and its missing values are filled with the values returned by popping the “revised_age” column. The merged values are stored in a new “age” column, so only the “name” and “age” columns remain.
df["age"] = df.pop("age1").fillna(df.pop("revised_age")).astype(int)
df
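Because pop() removes columns from the DataFrame it is called on, the original “age1” and “revised_age” columns are gone afterwards. A minimal sketch, assuming you still want the untouched data, is to run the same line on a copy; the names `original` and `df` are illustrative.

```python
import numpy as np
import pandas as pd

missing = np.nan
original = pd.DataFrame({
    "name": ["Tom Cruise", "Hugh Jackman", "Brad Pitt", "Johnny Depp", "Leonardo DiCaprio"],
    "age1": [57, missing, 51, missing, 44],
    "revised_age": [missing, 55, missing, 56, missing],
})

# pop() modifies the DataFrame in place, so work on a copy
df = original.copy()
df["age"] = df.pop("age1").fillna(df.pop("revised_age")).astype(int)

print(df.columns.tolist())        # ['name', 'age']
print(original.columns.tolist())  # ['name', 'age1', 'revised_age']
print(df["age"].tolist())         # [57, 55, 51, 56, 44]
```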
Another way to merge the columns is the pandas combine_first() method, called here on the “age1” column. Use the following code.
df["age"] = df["age1"].combine_first(df["revised_age"]).astype(int)
df = df[["name","age"]]
df
The code above keeps the values of “age1”, fills its missing entries with the corresponding values from “revised_age”, and assigns the result to the df[“age”] column.
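combine_first() does more than fill gaps: when both columns hold a value in the same row, the column you call it on takes priority. The sketch below uses small hypothetical Series to show this; the conflicting value 60 in the first row is invented purely for illustration and is not part of the demo data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has a value in both Series
age1 = pd.Series([57, np.nan, 51])
revised_age = pd.Series([60, 55, np.nan])

# Values from the calling Series (age1) win; revised_age only fills its gaps
age1_first = age1.combine_first(revised_age)
print(age1_first.tolist())  # [57.0, 55.0, 51.0]

# Swapping the order gives revised_age priority instead
revised_first = revised_age.combine_first(age1)
print(revised_first.tolist())  # [60.0, 55.0, 51.0]
```

The pop() with fillna() and numpy.where() approaches in this post behave the same way, keeping “age1” when it is present, whereas the addition approach would sum the two values.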
You can also merge the values directly into the “age1” column using the numpy.where() method. Wherever “age1” is NaN, the value from “revised_age” is used; otherwise the existing “age1” value is kept. The “revised_age” column is then dropped. Use the code below.
df["age1"] = np.where(df["age1"].isna(),df["revised_age"],df["age1"]).astype("int")
df = df.drop("revised_age",axis=1)
df
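One caveat with the numpy.where() approach: if a row is missing in both columns, np.where() leaves NaN there, and calling .astype("int") on the resulting NumPy array typically produces a meaningless integer rather than raising an error. A minimal sketch of one workaround, assuming a pandas version with the nullable "Int64" dtype, is to wrap the result in a Series before casting. The “Actor X” row is hypothetical data added only to show a row with no age at all.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has no age in either column
df = pd.DataFrame({
    "name": ["Tom Cruise", "Hugh Jackman", "Actor X"],
    "age1": [57, np.nan, np.nan],
    "revised_age": [np.nan, 55, np.nan],
})

merged = np.where(df["age1"].isna(), df["revised_age"], df["age1"])

# The nullable "Int64" dtype keeps a fully missing row as <NA>
# instead of silently turning NaN into an arbitrary integer
df["age1"] = pd.Series(merged, index=df.index).astype("Int64")
df = df.drop(columns="revised_age")
print(df)
```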
These are some approaches to merging two columns in a DataFrame. The simple addition approach works only when both columns are numeric and each row has a value in exactly one of them; otherwise, use one of the other approaches. I hope this was easy to follow. If you have any questions, please contact us for more help.
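As a cross-check, the sketch below runs all four approaches on a freshly rebuilt copy of the demo DataFrame and confirms they agree. The helper name `make_df()` is hypothetical and not part of pandas. Rebuilding the data sidesteps the fact that several snippets in this post reassign or modify df in place.

```python
import numpy as np
import pandas as pd


def make_df():
    """Rebuild the demo DataFrame so each method starts from the same data."""
    missing = np.nan
    return pd.DataFrame({
        "name": ["Tom Cruise", "Hugh Jackman", "Brad Pitt", "Johnny Depp", "Leonardo DiCaprio"],
        "age1": [57, missing, 51, missing, 44],
        "revised_age": [missing, 55, missing, 56, missing],
    })


# 1. Addition after filling missing values with 0
df = make_df().fillna(0)
by_addition = (df["age1"] + df["revised_age"]).astype(int)

# 2. pop() combined with fillna()
df = make_df()
by_pop = df.pop("age1").fillna(df.pop("revised_age")).astype(int)

# 3. combine_first()
df = make_df()
by_combine = df["age1"].combine_first(df["revised_age"]).astype(int)

# 4. numpy.where()
df = make_df()
by_where = pd.Series(
    np.where(df["age1"].isna(), df["revised_age"], df["age1"]), index=df.index
).astype(int)

results = [by_addition, by_pop, by_combine, by_where]
print(all(r.tolist() == [57, 55, 51, 56, 44] for r in results))  # True
```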