
PySpark Add New Row to DataFrame: With Syntax and Example

PySpark lets you add a new row to a DataFrame through the union operation. Spark DataFrames are immutable, so instead of appending in place, we create a new DataFrame from the row and union it with the original. In this article, we first create a DataFrame, then create a second DataFrame with the same schema that holds the new row, and finally union the two.

PySpark add new row to DataFrame (Steps)

First, we will create a DataFrame, which we will call the master PySpark DataFrame.

Step 1: (Prerequisite)

We first create a SparkSession object, then define the columns and generate the DataFrame. Here is the code.

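from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession
spark = SparkSession.builder.getOrCreate()
# Column names and initial rows of the master DataFrame
columns = ['Identifier', 'Value', 'Extra Discount']
vals = [(1, 150, 0), (2, 160, 12)]
df = spark.createDataFrame(vals, columns)
df.show()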
[Figure: the master PySpark DataFrame before the new row is added]

Step 2:

In the second step, we create a second DataFrame that holds the single new row and uses the same column list. Here is the code.

newRow = spark.createDataFrame([(3,205,7)], columns)

Step 3:

This is the final step. Here we union the two DataFrames. Run the code below:

new_df = df.union(newRow)
new_df.show()

Once you run the above code, the output shows the two original rows together with the new row (3, 205, 7).

[Figure: final output after adding the row to the PySpark DataFrame]

Conclusion

In real scenarios, especially data mocking or synthetic data generation, we often need to perform this step: after generating new data, we union it into the original data. Although this article used only a single row, we can union multiple rows in the same way.

I hope you liked the article. If you need any further explanation on a similar topic, please feel free to reach out to us. You may comment below or write us an email. Please subscribe for similar articles on PySpark, Python, Machine Learning, and Deep Learning topics.

Thanks 

Data Science Learner Team