PySpark add new row to dataframe: With Syntax and Example


PySpark lets you add a new row to a dataframe through the union operation: create a new dataframe that contains the row, then union it with the original one. In this article, we will first create a dataframe, then create a second dataframe with the same schema (structure), and finally union the two.

PySpark add new row to dataframe (Steps)

First, we will create a dataframe; let's call it the master PySpark dataframe. Here are the steps.

Step 1: (Prerequisite)

We first create a SparkSession object, then define the columns and generate the dataframe. Here is the code:

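from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Column names and row values for the master dataframe
columns = ['Identifier', 'Value', 'Extra Discount']
vals = [(1, 150, 0), (2, 160, 12)]
df = spark.createDataFrame(vals, columns)
df.show()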

Step 2:

In the second step, we will generate a second dataframe with one row. Here is the code:

newRow = spark.createDataFrame([(3,205,7)], columns)

Step 3:

This is the final step, where we union both dataframes. Run the code below:

new_df = df.union(newRow)
new_df.show()

Once you run the above code, you will get the output below.

[Image: final output after adding the new row to the PySpark dataframe]

Conclusion

In real scenarios, especially data mocking or synthetic data generation, we need to perform this step: we generate new data and then union it into the original data. Although this article used only a single row, you can union multiple rows in the same way.

I hope you liked the article. If you need any further explanation on a similar topic, please feel free to get in touch. You may comment below or write us an email as well. Please subscribe for similar articles on PySpark, Python, Machine Learning, and Deep Learning topics.


Data Science Learner Team


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages, where he and his team share knowledge and help others learn more about data science.