You can drop a column from a pyspark dataframe with the drop() function. drop() accepts either a column name as a plain string or a Column object built with col(), a function in the pyspark.sql.functions package, so importing col is optional. In this article, we will explore the syntax of the drop function with an example.
pyspark drop column (Example) –
The drop() function makes more sense once you see it in practice. As a prerequisite, we will first create a dummy pyspark dataframe, and then drop a column from it.
Prerequisites :
Use the code below to create a dummy pyspark dataframe.
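from pyspark.sql import SparkSession

# Create (or reuse) a Spark session; the app name is arbitrary
spark = SparkSession.builder.appName("drop-column-example").getOrCreate()

# Two sample employee records
records = [
    (4, "Charlee", "2005", "60", 35000),
    (5, "Guo", "2010", "40", 38000),
]
record_Columns = ["seq", "Name", "joining_year", "specialization_id", "salary"]
sampleDF = spark.createDataFrame(data=records, schema=record_Columns)
sampleDF.show(truncate=False)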
Use drop() function in pyspark –
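Let’s say we want to drop “specialization_id” from the above dataframe. The code below passes a Column object built with col(); passing the plain string "specialization_id" works as well. Note that drop() returns a new dataframe, so we assign the result back to sampleDF.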
from pyspark.sql.functions import col
sampleDF=sampleDF.drop(col("specialization_id"))
sampleDF.show(truncate=False)
In the above section, we have seen how easy it is to drop a column from a dataframe.
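drop() is not limited to col() objects: it also takes a plain string name, and it returns a new dataframe instead of changing the one it is called on. Here is a minimal sketch of that behaviour. The names columns_before_and_after and demo are our own illustrative helpers, not part of pyspark, and demo() assumes a local Spark installation.

```python
def columns_before_and_after(df, column_name):
    """Return (original column names, column names after dropping column_name).

    Works with any DataFrame-like object exposing .columns and .drop(name).
    Illustrative helper only; it is not part of pyspark.
    """
    trimmed = df.drop(column_name)  # plain string name; no col() import needed
    return list(df.columns), list(trimmed.columns)


def demo():
    # pyspark is imported here so the helper above can be imported without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("drop-by-name").getOrCreate()
    df = spark.createDataFrame(
        [(4, "Charlee", "2005", "60", 35000), (5, "Guo", "2010", "40", 38000)],
        ["seq", "Name", "joining_year", "specialization_id", "salary"],
    )
    before, after = columns_before_and_after(df, "specialization_id")
    print(before)  # df itself is unchanged, so all five names are still listed
    print(after)   # the returned dataframe no longer has specialization_id
    spark.stop()
```

Calling demo() prints the two column lists one after the other, which makes it easy to confirm that the original dataframe was left untouched.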
Dropping multiple columns-
Hey! It’s just as simple. In place of a single column, we can pass multiple column names. Let’s continue with the above example. Suppose we want to drop the “salary” column along with the “specialization_id” column. Since sampleDF no longer has “specialization_id” after the previous step, drop() simply ignores that name and removes “salary”. Check out the code –
# Column names are passed as plain strings, so col() is not needed here
sampleDF=sampleDF.drop("specialization_id","salary")
sampleDF.show(truncate=False)
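Because drop() silently ignores string names that are not in the dataframe, a typo in a column name goes unnoticed. If you keep the names in a list, you can unpack it with * and check the names first. The sketch below is one possible way to do that. drop_columns_strict and demo are hypothetical helper names of ours, not pyspark functions, and the name check is an exact, case-sensitive comparison, which may be stricter than how Spark itself resolves column names.

```python
def drop_columns_strict(df, names):
    """Drop every column in names, raising ValueError if any name is absent.

    Illustrative helper only; the membership test is exact and case-sensitive.
    """
    missing = [name for name in names if name not in df.columns]
    if missing:
        raise ValueError(f"Columns not found: {missing}")
    return df.drop(*names)  # unpack the list into separate string arguments


def demo():
    # pyspark is imported here so the helper above can be imported without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("drop-many").getOrCreate()
    df = spark.createDataFrame(
        [(4, "Charlee", "2005", "60", 35000), (5, "Guo", "2010", "40", 38000)],
        ["seq", "Name", "joining_year", "specialization_id", "salary"],
    )
    to_drop = ["specialization_id", "salary"]
    trimmed = drop_columns_strict(df, to_drop)
    trimmed.show(truncate=False)

    try:
        # salary is already gone from trimmed, so the strict helper complains
        drop_columns_strict(trimmed, ["salary"])
    except ValueError as err:
        print(err)
    spark.stop()
```

Whether you want this strict behaviour depends on the job: the built-in lenient drop() is handy when the same code runs over dataframes with slightly different schemas.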
We hope you have found this article on dropping columns informative. We always try to keep the content simple. If you still have any doubts about this topic, please feel free to contact us. You may comment below or write back to us.
Thanks
Data Science Learner Team