How to add a current timestamp column to a PySpark DataFrame
The current timestamp can be added as a new column to a Spark DataFrame using the current_timestamp() function from PySpark's pyspark.sql.functions module.
When the DataFrame is displayed, the timestamp appears in the yyyy-MM-dd HH:mm:ss.SSS format (date, 24-hour time, and fractional seconds).
Syntax
pyspark.sql.functions.current_timestamp()
Parameters
This method has no parameters.
Return value
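This method returns a Column of timestamp type that holds the current timestamp at the start of query evaluation. All calls within the same query return the same value.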
Code example
Let’s see the code below:
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp
spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James","Smith","USA","CA"),
    ("Michael","Rose","USA","NY"),
    ("Robert","Williams","USA","CA"),
    ("Maria","Jones","USA","FL")
]

columns = ["firstname","lastname","country","state"]
df = spark.createDataFrame(data = data, schema = columns)

df_with_ts = df.withColumn("curr_timestamp", current_timestamp())

df_with_ts.show(truncate=False)
Code explanation
- Line 4: We create a Spark session with the app name edpresso.
- Lines 6–10: We define data for the DataFrame.
- Line 12: We define the columns of the DataFrame.
- Line 13: We create a DataFrame using the createDataFrame() method.
- Line 15: We add a new column to the DataFrame using the withColumn() method, passing the new column name curr_timestamp and, as its value, the timestamp returned by current_timestamp().
- Line 17: We print the DataFrame.