Renaming a column in PySpark is easy with the withColumnRenamed() function. All we need to pass is the existing column name and the new one, and the function returns a new DataFrame with the column renamed. In this article, we will explore this with an example. First we will create a dummy PySpark DataFrame, then choose a column and rename it. Renaming is especially important in a mapping layer, where we map two or more fields that hold similar data.
Let’s get started with the code.
Pyspark rename column : ( Syntax ) –
Let’s first create a dummy DataFrame. Here is the code for it:
Step 1 – ( Prerequisites ) –
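Copy the code below and run it in a Python interpreter.
import pyspark
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session; the pyspark shell already provides one as `spark`.
spark = SparkSession.builder.appName("rename-column-example").getOrCreate()

# Sample rows: (seq, Name, joining_year, specialization_id, salary)
records = [
    (4, "Charlee", "2005", "60", 35000),
    (5, "Guo", "2010", "40", 38000),
]
record_Columns = ["seq", "Name", "joining_year", "specialization_id", "salary"]

# Build the DataFrame and print it without truncating cell values.
sampleDF = spark.createDataFrame(data=records, schema=record_Columns)
sampleDF.show(truncate=False)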
Step 2 –
In this step, we will use the withColumnRenamed() function to rename the “salary” column to “income”.
As you can see in the output, we renamed the “salary” column to “income”.
Renaming a column is a very common operation in data engineering and data science tasks. There are other ways to achieve the same result, such as select() with alias() or toDF(), but they are not as simple as the one above. So I recommend using withColumnRenamed().
I hope you liked this article. Please share your suggestions on how we can improve it. You may also request an article on any topic of your choice, either in the comments below or by writing to us by email.
We have started a series on PySpark and data engineering. In it we try to keep the syntax as beginner-friendly as possible, so it is a good place to start if you are new to the topic. The series will cover most of the basics.
Data Science Learner Team