The to_timestamp() function in PySpark is part of the "pyspark.sql.functions" package. It converts a string column into a timestamp column. In this article, we will walk through a complete implementation using a dummy dataframe with minimal rows and data. Step by step, we will first create the dataframe and then apply the to_timestamp() function to the required column.
As mentioned, the first step is to create a dummy PySpark dataframe as a prerequisite for this explanation. We will include the import statements along with the dummy data creation. Here is the code; let's run it:
from pyspark.sql.functions import *

df = spark.createDataFrame(
    data=[("100", "2021-12-01 10:01:19"),
          ("101", "2021-11-02 11:01:19"),
          ("102", "2021-10-24 12:08:19")],
    schema=["Seq", "string_timestamp"])
df.printSchema()
In this step, we will create a new column in the above PySpark dataframe using the withColumn() function.
df_modified=df.withColumn("converted_timestamp",to_timestamp("string_timestamp"))
The above code generates a new column named "converted_timestamp" whose data type is timestamp, whereas in the mocked dataframe the original column was a string. Let's put all the code together and run it.
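Before running the full example, it may help to see what this conversion amounts to for a single value. When called without a format argument, to_timestamp() parses strings in the default "yyyy-MM-dd HH:mm:ss" layout and yields null for values it cannot parse. Here is a rough plain-Python analogy using datetime.strptime (the helper name is made up for illustration; this is not Spark code):

```python
from datetime import datetime

# Spark's default pattern "yyyy-MM-dd HH:mm:ss" roughly corresponds to
# "%Y-%m-%d %H:%M:%S" in Python's strptime directives.
def parse_like_to_timestamp(s):
    """Plain-Python sketch of what to_timestamp() does to one value."""
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Spark returns null for unparseable strings instead of raising.
        return None

print(parse_like_to_timestamp("2021-12-01 10:01:19"))  # prints 2021-12-01 10:01:19
print(parse_like_to_timestamp("not-a-timestamp"))      # prints None
```

The null-on-failure behavior is worth remembering: a silently empty "converted_timestamp" column usually means the source strings did not match the expected pattern.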
from pyspark.sql.functions import *

df = spark.createDataFrame(
    data=[("100", "2021-12-01 10:01:19"),
          ("101", "2021-11-02 11:01:19"),
          ("102", "2021-10-24 12:08:19")],
    schema=["Seq", "string_timestamp"])
df.printSchema()
df_modified = df.withColumn("converted_timestamp", to_timestamp("string_timestamp"))
df_modified.show(truncate=False)
df_modified.printSchema()
Here is the output-
Here we can see the data types of the "converted_timestamp" column (the derived one) and the "string_timestamp" column (the initial/original one). This is exactly what we wanted to achieve.
Thanks
Data Science Learner Team