Pyspark lit function example : Must for You

Pyspark lit function example

The PySpark lit() function adds a constant (literal) value as a new column, either unconditionally or based on a condition.

Pyspark lit function example : ( Steps )

There are three steps. The first is to import the required modules, which is a prerequisite. The second is to create a dummy PySpark dataframe, and the third is to add a column to it. So here we go.

Step 1: import lit in pyspark –

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col,lit

 

Step 2: Pyspark Dataframe creation for demo –

Here is the code that converts the dummy data into a PySpark dataframe.

spark = SparkSession.builder.appName('DataScienceLearner.com').getOrCreate()
data = [("1",50000),("2",60000),("3",40000)]
columns= ["ID","Revenue"]
df = spark.createDataFrame(data = data, schema = columns)

 

Step 3:  Adding constant Column using lit function –

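Here we will use the lit function inside select(), which returns a new dataframe with an extra column. If you run the code below, it will add a new column that holds the constant value "1" in every row. Note that lit("1") produces a string column because the literal is passed as a string.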

df2 = df.select(col("ID"),col("Revenue"),lit("1").alias("New_Column"))
df2.show(truncate=False)

Let's put the code together and run it. After that we will see the output.

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col,lit
## Data Creation
spark = SparkSession.builder.appName('DataScienceLearner.com').getOrCreate()
data = [("1",50000),("2",60000),("3",40000)]
columns= ["ID","Revenue"]
df = spark.createDataFrame(data = data, schema = columns)
## Adding new column with lit()
df2 = df.select(col("ID"),col("Revenue"),lit("1").alias("New_Column"))
df2.show(truncate=False)

Figure: Pyspark lit function example (output of the code above)

Conditionally Adding new column using lit() –

We can add a column conditionally by combining the when and lit functions from pyspark.sql.functions. Here is an example.

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col,lit
from pyspark.sql.functions import when

## Data Creation
spark = SparkSession.builder.appName('DataScienceLearner.com').getOrCreate()
data = [("1",50000),("2",60000),("3",40000)]
columns= ["ID","Revenue"]
df = spark.createDataFrame(data = data, schema = columns)
## Adding new column with lit()
df2 = df.select(col("ID"),col("Revenue"),lit("1").alias("New_Column"))
## Conditional column: each comparison is wrapped in parentheses before combining with &
df3 = df2.withColumn("lit_value2", when((col("Revenue") >= 40000) & (col("Revenue") <= 50000), lit("100")).otherwise(lit("200")))
df3.show(truncate=False)
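
When comparisons are combined with & inside a when() condition, each comparison needs its own parentheses, because Python's & binds more tightly than >= and <=. The plain-Python sketch below uses ordinary integers (no Spark needed) to show how the unparenthesized expression is grouped. The variable names are illustrative only.

```python
revenue = 30000  # deliberately outside the 40000..50000 range

# Without parentheses, Python evaluates 40000 & revenue first,
# then chains the comparisons: revenue >= (40000 & revenue) <= 50000
unparenthesized = revenue >= 40000 & revenue <= 50000

# With parentheses, each comparison is evaluated on its own and then combined
intended = (revenue >= 40000) & (revenue <= 50000)

print(unparenthesized)
print(intended)
```

The two results differ for this input, so the grouping matters. With PySpark Column objects the unparenthesized form should fail rather than return a wrong answer, since the chained comparison tries to convert a Column to a bool, but the fix is the same: parenthesize each comparison.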

The one thing I will emphasize is importing when. Make sure you import it from pyspark.sql.functions, as in the above example.

We have used withColumn() in the above implementation. If you need more information on the withColumn function, please refer to the detailed article on the same.

End Notes :

I hope you found this article informative on the usage of the lit() function in PySpark. If you still have any doubts, please comment in the comment box below. You may also write to us via email.

Thanks 

Data Science Learner Team

Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.
 