pyspark save as parquet : Syntax with Example

Saving a PySpark dataframe as parquet simply means writing it to disk in parquet format with the pyspark_df.write.parquet() function. In this article, we will first create a sample PySpark dataframe and then write it to disk in parquet format. The article covers the complete code for converting a PySpark dataframe to parquet.

pyspark save as parquet ( Steps ) –

Before we start, make sure the pyspark Python library is installed. You may use the command below (the leading ! is for notebook cells; drop it when running in a regular shell).

!pip install pyspark

Step 1 : ( Importing packages ) –

You need to import these packages. Please avoid using ( * ) in import statements, since wildcard imports clutter the namespace and make it unclear where a name comes from.

import pyspark
from pyspark.sql import SparkSession

Step 2: Dummy pyspark dataframe –

In this step, we will create a simple pyspark dataframe with minimal data. Here is the code for the same.

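# Create (or reuse) a Spark session; the app name here is only an illustrative choice.
spark = SparkSession.builder.appName("pyspark_to_parquet").getOrCreate()
# Sample rows as a list of tuples.
records = [
    (4,"Charlee","2005","60",35000),
    (5,"Guo","2010","40",38000)]
record_Columns = ["seq","Name","joining_year", "specialization_id","salary"]
sampleDF = spark.createDataFrame(data=records, schema = record_Columns)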

Here we define the data as a list of tuples, pass the column names as the schema, and create the Spark dataframe from them.

Step 3: Pyspark dataframe to parquet –

Finally, we write the PySpark dataframe to parquet format. It is a one-line statement.

sampleDF.write.parquet("data.parquet")

Overall code –

Let’s see the output of the whole code when we run it together.

import pyspark
from pyspark.sql import SparkSession
# Create (or reuse) a Spark session; the app name here is only an illustrative choice.
spark = SparkSession.builder.appName("pyspark_to_parquet").getOrCreate()
records = [
    (4,"Charlee","2005","60",35000),
    (5,"Guo","2010","40",38000)]
record_Columns = ["seq","Name","joining_year", "specialization_id","salary"]
sampleDF = spark.createDataFrame(data=records, schema = record_Columns)
sampleDF.write.parquet("data.parquet")
Figure: pyspark to parquet

Similar article –

Just as we save a parquet file, we can also read one. Please go through this article.

Pyspark read parquet : Get Syntax with Implementation (datasciencelearner.com)

We hope you liked this article. If you want to share something related to this topic, please comment in the box below. You may also reach out to us via email, and please subscribe for more similar articles.

Thanks 

Data Science Learner Team

Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.