PySpark rename column : Implementation tricks

Renaming a column in PySpark is straightforward with the withColumnRenamed() function. All we need to pass is the existing column name and the new one. In this article, we will walk through an example: first we create a dummy PySpark DataFrame, then we pick a column and rename it. Renaming matters most in a mapping layer, where two or more fields holding similar data need consistent names.

Let’s start coding-

Pyspark rename column : ( Syntax ) –

The call takes the form withColumnRenamed(existing, new). To try it out, let’s first create a dummy DataFrame-

Step 1 – ( Prerequisites ) –

Copy the code below and run it in a Python interpreter.

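from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession; the pyspark shell already provides one as `spark`
spark = SparkSession.builder.appName("rename-column-example").getOrCreate()

# Two sample employee records
records = [
    (4, "Charlee", "2005", "60", 35000),
    (5, "Guo", "2010", "40", 38000)]
record_Columns = ["seq", "Name", "joining_year", "specialization_id", "salary"]

# Build the DataFrame and display it
sampleDF = spark.createDataFrame(data=records, schema=record_Columns)
sampleDF.show(truncate=False)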

Step 2 –

In this step, we will use the withColumnRenamed() function to rename the “salary” column to “Income”.

sampleDF.withColumnRenamed("salary","Income").show(truncate=False)

As the output shows, the “salary” column now appears as “Income”.

Renaming columns is a very common operation in data engineering and data science tasks. There are other ways to achieve the same result, but they are not as simple as the one above, so I recommend using withColumnRenamed().

I hope you liked this article. Please share your suggestions on how we can improve it. You may also request an article on any topic of your choice, either in the comments below or by writing to us by email. Please subscribe for more articles on PySpark and data science technology.