Saving a PySpark DataFrame as Parquet simply means writing it to disk in Parquet format with the pyspark_df.write.parquet() function. In this article, we will first create a sample PySpark DataFrame and then write it to disk in Parquet format. The article covers the complete code for converting a PySpark DataFrame to Parquet format.
pyspark save as parquet ( Steps ) –
Before running any of the steps here, make sure the pyspark Python library is installed. You can use the command below. The leading ! runs it from a notebook cell; omit it in a terminal.
!pip install pyspark
Step 1 : ( Importing packages ) –
You need to import these packages. Please avoid using ( * ) in the import statements, because a wildcard import pulls every name into the namespace and is not memory efficient.
import pyspark
from pyspark.sql import SparkSession
Step 2: Dummy pyspark dataframe –
In this step, we will create a simple pyspark dataframe with minimal data. Here is the code for the same.
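# Create (or reuse) a SparkSession; the code below refers to it as spark.
spark = SparkSession.builder.appName("pyspark-save-as-parquet").getOrCreate()
# Each tuple is one row of the DataFrame.
records = [
    (4, "Charlee", "2005", "60", 35000),
    (5, "Guo", "2010", "40", 38000)]
record_Columns = ["seq", "Name", "joining_year", "specialization_id", "salary"]
sampleDF = spark.createDataFrame(data=records, schema=record_Columns)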
Here the data is a list of tuples, one tuple per row. We pass it together with the column names to spark.createDataFrame(), which builds the Spark DataFrame and infers the column types from the values.
Step 3: Pyspark dataframe to parquet –
Finally, we write the PySpark DataFrame to disk in Parquet format. It is a single-line statement.
sampleDF.write.parquet("data.parquet")
Overall code –
Here is the whole code put together.
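import pyspark
from pyspark.sql import SparkSession
# Create (or reuse) a SparkSession; the code below refers to it as spark.
spark = SparkSession.builder.appName("pyspark-save-as-parquet").getOrCreate()
# Each tuple is one row of the DataFrame.
records = [
    (4, "Charlee", "2005", "60", 35000),
    (5, "Guo", "2010", "40", 38000)]
record_Columns = ["seq", "Name", "joining_year", "specialization_id", "salary"]
sampleDF = spark.createDataFrame(data=records, schema=record_Columns)
# Write the DataFrame to disk in Parquet format.
sampleDF.write.parquet("data.parquet")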
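One detail about the result is worth knowing: Spark creates data.parquet as a directory, not as a single file. Inside it are one or more part files that hold the data, along with marker and checksum files. The helper below is a small sketch for listing only the data files. The function name list_parquet_parts is hypothetical, and the check on the part- prefix assumes Spark's usual file naming.

```python
import os


def list_parquet_parts(output_dir):
    """Return the sorted names of the Parquet part files in a Spark output directory."""
    return sorted(
        name
        for name in os.listdir(output_dir)
        # Skip marker files such as _SUCCESS and hidden .crc checksum files.
        if name.startswith("part-") and name.endswith(".parquet")
    )


# Only list the files if the output from the code above exists in the working directory.
if os.path.isdir("data.parquet"):
    print(list_parquet_parts("data.parquet"))
```

To read the data, point the reader at the directory itself rather than at an individual part file.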
Similar article –
Just as we saved the Parquet file, we can also read it back. Please go through this article.
Pyspark read parquet : Get Syntax with Implementation (datasciencelearner.com)
We hope you liked this article. If you want to share something related to this topic, please comment below in the comment box. You may also reach out to us via email, and please subscribe for more articles like this.
Thanks
Data Science Learner Team