Adding a new row to a PySpark dataframe is possible with the union operation. We create a new dataframe from the row and union it with the existing one. In this article, we will first create a dataframe, then create a second dataframe with the same schema/structure, and finally union the two.
Pyspark add new row to dataframe – ( Steps )-
First, we will create a dataframe; let's call it the master PySpark dataframe.
Step 1 : (Prerequisite)
We first have to create a SparkSession object. Then we define the columns and generate the dataframe. Here is the code for the same.
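from pyspark.sql import SparkSession

# Create (or reuse) the SparkSession
spark = SparkSession.builder.getOrCreate()

# Column names and the initial rows of the master dataframe
columns = ['Identifier', 'Value', 'Extra Discount']
vals = [(1, 150, 0), (2, 160, 12)]
df = spark.createDataFrame(vals, columns)
df.show()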
Step 2 :
In the second step, we will create a second dataframe with a single row, using the same columns. Here is the code for the same.
newRow = spark.createDataFrame([(3,205,7)], columns)
Step 3 :
This is the final step. Here we will union the two dataframes. Please run the code below.
new_df = df.union(newRow)
new_df.show()
Once we run the above code, the resulting dataframe contains the two original rows plus the new row (3, 205, 7).
In real scenarios, especially data mocking or synthetic data generation, we often need to perform this step: after generating data, we union it into the original data. Although this article used only a single row, we can union multiple rows in the same way.
I hope you liked the article. If you need any further explanation on a similar topic, please feel free to get back to us. You may comment below or write an email to us as well. Please subscribe to us for similar articles on PySpark, Python, Machine Learning, and Deep Learning topics.
Data Science Learner Team