PySpark Drop Column: How to Do It

PySpark drop column

You can drop a column from a PySpark dataframe with the drop() function. drop() accepts either a column name as a string or a Column object. If you want to pass a Column object, import the col function, which is part of the pyspark.sql.functions package. In this article, we will explore the syntax of the drop function with an example.

PySpark drop column (example):

It will make more sense to see the drop() function used in practice. To do that, we first create a dummy PySpark dataframe and then drop a column from it. Here we go:

Prerequisites:

Use the code below to create a dummy PySpark dataframe.

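from pyspark.sql import SparkSession

# Create (or reuse) a Spark session; in the pyspark shell, `spark` already exists
spark = SparkSession.builder.appName("drop-column-example").getOrCreate()

# Two sample employee records
records = [
    (4, "Charlee", "2005", "60", 35000),
    (5, "Guo", "2010", "40", 38000),
]
record_Columns = ["seq", "Name", "joining_year", "specialization_id", "salary"]
sampleDF = spark.createDataFrame(data=records, schema=record_Columns)
sampleDF.show(truncate=False)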
drop() function – dummy dataframe

Use the drop() function in PySpark:

Suppose we want to drop the “specialization_id” column from the above dataframe. You can use the code below. drop() returns a new dataframe, so we assign the result back to sampleDF.

from pyspark.sql.functions import col
sampleDF=sampleDF.drop(col("specialization_id"))
sampleDF.show(truncate=False)
pyspark drop column

In the section above, we have seen how easy it is to drop a column from a dataframe.
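One thing to keep in mind: drop() does nothing when the name you pass is not in the dataframe, so a misspelled column name goes unnoticed. The following is a minimal sketch of a stricter helper. It is not part of PySpark: drop_required is a hypothetical name, and the helper relies only on the dataframe's columns attribute and its drop() method.

```python
def drop_required(df, name):
    """Drop column `name` from `df`, failing loudly if it does not exist.

    `drop_required` is a hypothetical helper, not a PySpark API. It only
    uses `df.columns` (the list of column names) and `df.drop(name)`.
    """
    # Exact, case-sensitive match against the dataframe's column names
    if name not in df.columns:
        raise ValueError(f"Column {name!r} not found; available: {df.columns}")
    # drop() returns a new dataframe; the input dataframe is left unchanged
    return df.drop(name)
```

For example, drop_required(sampleDF, "salary") would return a new dataframe without the salary column, while drop_required(sampleDF, "salry") would raise a ValueError instead of silently returning the dataframe unchanged.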

Dropping multiple columns:

Dropping several columns is just as simple: instead of a single column, pass multiple column names to drop(). Continuing the example above, suppose we want to drop the “salary” column along with the “specialization_id” column. If “specialization_id” has already been dropped in the previous step, drop() simply ignores that name. Here is the code:

# Column names are passed as plain strings, so col is not needed here
sampleDF=sampleDF.drop("specialization_id","salary")
sampleDF.show(truncate=False)
multi column drop
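When the names to drop are held in a list, or are picked by a rule, they can be unpacked into drop() with the * operator. The sketch below shows one way to do that. The helper name drop_matching and the idea of selecting columns with a predicate are illustrative assumptions, not something PySpark provides.

```python
def drop_matching(df, predicate):
    """Drop every column whose name satisfies `predicate`.

    Hypothetical helper: it uses only `df.columns` and `df.drop(*names)`.
    Returns the new dataframe and the list of names that were dropped.
    """
    names = [c for c in df.columns if predicate(c)]
    if not names:
        # Nothing matched, so hand back the dataframe untouched
        return df, names
    # Unpack the list so each name is passed as a separate string argument
    return df.drop(*names), names
```

With the original five-column dataframe from the Prerequisites section, drop_matching(sampleDF, lambda c: c.endswith("_id")) would drop specialization_id and report it in the returned list of dropped names.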

We hope you found this article on dropping columns informative. We try to keep the content simple and to include the output for clarity. If you still have any doubts, please feel free to contact us: comment below or write to us. Keep reading our articles, and subscribe for more updates on PySpark and data engineering.

Thanks
Data Science Learner Team

Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.
 