to_timestamp pyspark function : String to Timestamp Conversion


to_timestamp is part of the “pyspark.sql.functions” package. The to_timestamp() function converts a string column into a timestamp object. In this article, we will walk through the complete implementation using a dummy dataframe with only a few rows of data. Step by step, we will first create this dataframe and then apply the to_timestamp() function to the required column.

 

to_timestamp pyspark function : ( Implementation ) –

As I said, the first step is to create a dummy Pyspark dataframe as a prerequisite for this implementation walkthrough.

Step 1: (Prerequisite)-

We will also include the import statements along with this dummy data creation. Here is the code, let's run it-

# Assumes an active SparkSession is already available as `spark`
# (for example in Databricks, a Jupyter notebook, or the pyspark shell).
from pyspark.sql.functions import *

df = spark.createDataFrame(
        data=[("100", "2021-12-01 10:01:19"),
              ("101", "2021-11-02 11:01:19"),
              ("102", "2021-10-24 12:08:19")],
        schema=["Seq", "string_timestamp"])
df.printSchema()
  1. As I explained, the to_timestamp() function lives in the “pyspark.sql.functions” package, hence we need to import it first. In the lines above we used a wildcard (*) import, which pulls in every function from this package. However, a (“*”) import is not a best practice; I recommend importing only the functions you actually call in the code, as shown in the small sketch after this list.
  2. We are only mocking three rows with two columns named [“Seq”, “string_timestamp”]. Here string_timestamp is the column which we will convert into timestamp format. Let's go to our second and final step.
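For reference, here is a minimal sketch of the explicit-import style mentioned in point 1; it imports only the single function this article actually uses.

# Import only what you call, instead of a wildcard import
from pyspark.sql.functions import to_timestamp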

 

Step 2 : Converting String column to Timestamp format in Pyspark –

In this step, we will create a new column in the above Pyspark dataframe using the withColumn() function.

df_modified=df.withColumn("converted_timestamp",to_timestamp("string_timestamp"))

The above code will generate a new column named “converted_timestamp” whose data type is timestamp, while the original column stays in string format, just as it was when we mocked the Pyspark dataframe. Let's put all the code together and run it.
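As a side note, to_timestamp() also accepts an optional second argument with a datetime pattern. The sketch below reuses the df from Step 1 and simply passes the default pattern explicitly (df_custom is just an illustrative variable name); you would only need this form when your strings do not follow the default “yyyy-MM-dd HH:mm:ss” layout.

from pyspark.sql.functions import to_timestamp

# Same conversion, but with an explicit datetime pattern.
# Change the pattern string to match your own source data.
df_custom = df.withColumn(
    "converted_timestamp",
    to_timestamp("string_timestamp", "yyyy-MM-dd HH:mm:ss")
)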

Complete Code –

from pyspark.sql.functions import *
df=spark.createDataFrame(
        data = [ ("100","2021-12-01 10:01:19"),
                ("101","2021-11-02 11:01:19"),
                ("102","2021-10-24 12:08:19")],
        schema=["Seq","string_timestamp"])
df.printSchema()

df_modified=df.withColumn("converted_timestamp",to_timestamp("string_timestamp"))
df_modified.show(truncate=False)
df_modified.printSchema()

Here is the output-

[Screenshot: to_timestamp() pyspark – output of df_modified.show(truncate=False) and df_modified.printSchema()]
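For reference, if you run the complete code above, the final printSchema() call should print something close to the following, with the derived column now carrying a timestamp type:

root
 |-- Seq: string (nullable = true)
 |-- string_timestamp: string (nullable = true)
 |-- converted_timestamp: timestamp (nullable = true)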

Here we can see the data types of the “converted_timestamp” column (the derived one) and the “string_timestamp” column (the initial/original one): the derived column is now a timestamp while the original remains a string. This is what we basically want to achieve.
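Once the column is a real timestamp, the usual datetime helpers become available on it. As a small illustrative sketch (the alias names are arbitrary), you could extract the year and month from the converted column like this:

from pyspark.sql.functions import year, month

# Pull date parts out of the converted timestamp column
df_modified.select(
    "converted_timestamp",
    year("converted_timestamp").alias("event_year"),
    month("converted_timestamp").alias("event_month")
).show(truncate=False)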

Thanks

Data Science Learner Team

