JSON to Parquet: How to Convert in Python, with an Example


JSON can be converted to Parquet in several ways, but I prefer going through a dataframe: first convert the JSON to a dataframe, then write the dataframe to a Parquet file. In this article, we will walk through the complete process with an easy example.

JSON to Parquet (Conversion) –

Let’s break this into steps.

Step 1: Prerequisite JSON object creation –

Here is the code that creates the dummy JSON which we will convert to Parquet.

record = '''
          {
              "0":{
                  "Identity": "Sam",
                  "Age": "19"
              },
              "1":{
                  "Identity": "Tom",
                  "Age": "14"
              },
              "2":{
                  "Identity": "Mac",
                  "Age":"11"
              }
          }
    '''
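
Before handing the string to pandas, it can help to confirm that it is valid JSON and has the shape we expect. The following is a minimal sketch using only the standard json module; the compact layout of the record is just for brevity and holds the same data as above.

```python
import json

# Same dummy record as above, written compactly
record = '''
{
    "0": {"Identity": "Sam", "Age": "19"},
    "1": {"Identity": "Tom", "Age": "14"},
    "2": {"Identity": "Mac", "Age": "11"}
}
'''

# json.loads raises json.JSONDecodeError if the text is not valid JSON
data = json.loads(record)

# Each top-level key is a row label; each value is one row of fields
for row_label, fields in data.items():
    print(row_label, fields["Identity"], fields["Age"])
```

Note that the ages are stored as JSON strings here, not numbers.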

Step 2: Converting JSON to dataframe –

We will first import pandas and then load the JSON, converting it into a pandas dataframe.

import pandas as pd
record = '''
          {
              "0":{
                  "Identity": "Sam",
                  "Age": "19"
              },
              "1":{
                  "Identity": "Tom",
                  "Age": "14"
              },
              "2":{
                  "Identity": "Mac",
                  "Age":"11"
              }
          }
    '''

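from io import StringIO

# Wrap the text in StringIO: recent pandas versions deprecate passing a literal JSON string
df = pd.read_json(StringIO(record), orient='index')
print(df)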

The read_json() function from pandas converts the JSON into a pandas dataframe. Because each top-level key ("0", "1", "2") holds one row, we pass orient='index'.
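
As an alternative route (not the one used in this article), the same dataframe can be sketched by parsing the text with json.loads and passing the resulting dictionary to DataFrame.from_dict. On this path the values stay exactly as they appear in the JSON, so the Age column is converted to integers explicitly.

```python
import json

import pandas as pd

record = '''
{
    "0": {"Identity": "Sam", "Age": "19"},
    "1": {"Identity": "Tom", "Age": "14"},
    "2": {"Identity": "Mac", "Age": "11"}
}
'''

# Parse the text into a plain dict of dicts
data = json.loads(record)

# orient="index" treats the outer keys as row labels
df = pd.DataFrame.from_dict(data, orient="index")

# Ages arrive as strings in this record, so cast them to integers
df["Age"] = df["Age"].astype(int)

print(df)
print(df.dtypes)
```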

[Figure: JSON dataframe]

Step 3: Dataframe to parquet file –

This is the last step: creating the Parquet file from the dataframe. We can use the to_parquet() function for this. Note that it needs a Parquet engine such as pyarrow or fastparquet to be installed. Here is the code.

df.to_parquet("out.parquet")

When we add this line to the end of the code from Step 2 and run it, we get the Parquet file.
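
Putting the steps together, the whole conversion might look like the sketch below. The function name json_to_parquet and the temporary output path are illustrative choices, not part of pandas. The script reads the file back with read_parquet as a quick check, and it assumes pyarrow or fastparquet is available.

```python
from io import StringIO
import os
import tempfile

import pandas as pd

RECORD = '''
{
    "0": {"Identity": "Sam", "Age": "19"},
    "1": {"Identity": "Tom", "Age": "14"},
    "2": {"Identity": "Mac", "Age": "11"}
}
'''


def json_to_parquet(json_text, parquet_path):
    """Load index-oriented JSON text into a dataframe and write it as Parquet."""
    df = pd.read_json(StringIO(json_text), orient="index")
    # to_parquet raises ImportError if no Parquet engine is installed
    df.to_parquet(parquet_path)
    return df


if __name__ == "__main__":
    try:
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, "out.parquet")
            json_to_parquet(RECORD, path)
            # Read the file back to confirm it holds the same table
            print(pd.read_parquet(path))
    except ImportError:
        print("Install pyarrow or fastparquet to write Parquet files.")
```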

[Figure: JSON to parquet file]

Limitations –

Not all JSON follows a structure that can be converted to a dataframe. Hence this approach is only applicable to JSON that is convertible to a dataframe, that is, JSON that maps cleanly onto rows and columns.
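
Some nested JSON can still be brought into tabular form by flattening it first. The sketch below uses pandas.json_normalize, which is a different technique from the read_json() call used above. The records and the "address" field are made up for illustration.

```python
import pandas as pd

# Hypothetical records where "address" is itself a nested object
records = [
    {"Identity": "Sam", "Age": 19, "address": {"city": "Pune", "zip": "411001"}},
    {"Identity": "Tom", "Age": 14, "address": {"city": "Delhi", "zip": "110001"}},
]

# json_normalize expands nested objects into columns joined by the separator
flat = pd.json_normalize(records, sep="_")

print(flat.columns.tolist())
print(flat)
```

The flattened dataframe can then be written with to_parquet() like any other. Deeply irregular JSON, where records share few fields, may still not fit this approach.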

JSON to Parquet in PySpark –

Just like with pandas, we can first create a PySpark dataframe from JSON. Here is the code, assuming an active SparkSession named spark:

df = spark.read.json("sample.json")

Once we have the PySpark dataframe in place, we can write it out as Parquet in the following way.

df.write.parquet("data.parquet")
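
For completeness, here is a minimal sketch of the two PySpark calls wrapped in a function that also creates the session. The function name, the app name and the multiline flag are illustrative assumptions. By default Spark expects JSON Lines input (one object per line), so a JSON document that spans several lines needs the multiLine option.

```python
def json_file_to_parquet(json_path, parquet_path, multiline=False):
    """Read a JSON file with Spark and write it out in Parquet format."""
    # Imported here so the sketch loads even where PySpark is not installed
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()
    try:
        # multiLine=True reads one JSON document spread over several lines;
        # the default expects one JSON object per line
        df = spark.read.option("multiLine", multiline).json(json_path)
        # "overwrite" replaces any existing output at parquet_path
        df.write.mode("overwrite").parquet(parquet_path)
    finally:
        spark.stop()


if __name__ == "__main__":
    # Assumes a sample.json file exists in the working directory
    json_file_to_parquet("sample.json", "data.parquet")
```

Unlike pandas, Spark writes the output as a directory named data.parquet containing one or more part files, not as a single file.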

Conclusion –

Parquet is a popular columnar file format for table-like data. It usually offers faster data processing than the CSV file format, and because it stores data in a compressed binary form, it typically handles large volumes of data better than CSV. We hope this article helps our readers. If you have any questions on this topic, please comment below or write to us by email.

Thanks 

Data Science Learner Team


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.
 