The PySpark lit() function adds a constant (literal) value as a new column, either unconditionally or based on a condition.
There are three steps. The first is to import the required modules, the second is to create a dummy PySpark dataframe, and the third is to add a column to it. So here we go.
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col,lit
Here is the code to convert the dummy data into a PySpark dataframe.
spark = SparkSession.builder.appName('DataScienceLearner.com').getOrCreate()
data = [("1",50000),("2",60000),("3",40000)]
columns= ["ID","Revenue"]
df = spark.createDataFrame(data = data, schema = columns)
Here we will use the lit function inside select(), which returns a new dataframe with an extra column. If you run the code below, it will add a new column that holds the constant value "1" (a string, because lit is given "1" in quotes) in every row.
df2 = df.select(col("ID"),col("Revenue"),lit("1").alias("New_Column"))
df2.show(truncate=False)
Let’s put the code together and run it to see the output.
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col,lit
## Data Creation
spark = SparkSession.builder.appName('DataScienceLearner.com').getOrCreate()
data = [("1",50000),("2",60000),("3",40000)]
columns= ["ID","Revenue"]
df = spark.createDataFrame(data = data, schema = columns)
## Adding new column with lit()
df2 = df.select(col("ID"),col("Revenue"),lit("1").alias("New_Column"))
df2.show(truncate=False)
We can also add a constant based on a condition by combining the when and lit functions from pyspark.sql.functions. Here is an example.
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col,lit
from pyspark.sql.functions import when
## Data Creation
spark = SparkSession.builder.appName('DataScienceLearner.com').getOrCreate()
data = [("1",50000),("2",60000),("3",40000)]
columns= ["ID","Revenue"]
df = spark.createDataFrame(data = data, schema = columns)
## Adding new column with lit()
df2 = df.select(col("ID"),col("Revenue"),lit("1").alias("New_Column"))
## Each comparison needs its own parentheses because & binds tighter than >= and <=
df3 = df2.withColumn("lit_value2", when((col("Revenue") >= 40000) & (col("Revenue") <= 50000), lit("100")).otherwise(lit("200")))
df3.show(truncate=False)
The one point I will emphasize is the import of when. Make sure it is imported from pyspark.sql.functions, as in the above example.
We have used withColumn() in the above implementation. If you need more information related to the withColumn function, please refer to the detailed article on the same.
I hope you found this article informative on the usage of the lit() function in PySpark. If you still have any doubts, please comment below in the comment box. You may write back to us via email too.
Thanks
Data Science Learner Team