How to add a prefix to all Spark DataFrame column names
There are multiple ways to add a prefix to all DataFrame column names in PySpark. Here, we’ll discuss two methods:
- The withColumnRenamed() method
- The toDF() method
The withColumnRenamed() method
The withColumnRenamed() method returns a new DataFrame in which one existing column has been renamed. To add a prefix to every column, we call it once per column.
To learn more about this method, refer to How to rename multiple columns in PySpark.
Code example
Let’s look at the code below:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James","Smith","USA","CA"),
    ("Michael","Rose","USA","NY"),
    ("Robert","Williams","USA","CA"),
    ("Maria","Jones","USA","FL")
  ]

columns = ["firstname","lastname","country","state"]
df = spark.createDataFrame(data = data, schema = columns)
print("Original dataframe:")
df.show(truncate=False)

prefix = "educative-"  # string to prepend to every column name
for column in df.columns:
    df = df.withColumnRenamed(column, prefix + column)  # returns a new DataFrame, so reassign df

print("-" * 8)
print("Renamed dataframe:")
df.show(truncate=False)
Code explanation
- Line 4: We create a Spark session with the application name edpresso.
- Lines 6–10: We define data for the DataFrame.
- Line 12: The columns of the DataFrame are defined.
- Line 13: A DataFrame is created using the createDataFrame() method.
- Line 15: The original DataFrame is printed.
- Line 17: The prefix to be added is defined.
- Lines 18–19: The list of the DataFrame's column names is obtained using df.columns. Each column in that list is renamed to prefix + column using the withColumnRenamed() method. Because the method returns a new DataFrame, the result is assigned back to df.
- Line 23: The new DataFrame with the new column names is printed.
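The same loop can also be written as a fold. The sketch below is a hypothetical helper, add_prefix_renamed() (not part of PySpark), that uses Python's functools.reduce() to chain one withColumnRenamed() call per column, starting from the original DataFrame. It expects a PySpark DataFrame but needs no PySpark imports of its own.

```python
from functools import reduce


def add_prefix_renamed(df, prefix):
    """Return a new DataFrame with `prefix` prepended to every column name.

    Hypothetical helper (not a PySpark API): it applies
    DataFrame.withColumnRenamed() once per column, like the for loop in the
    example, but folds the calls with reduce() instead of reassigning df.
    """
    # df.columns is read once, so the fold iterates over the ORIGINAL names
    # even though each step returns a DataFrame with one more renamed column.
    return reduce(
        lambda acc, name: acc.withColumnRenamed(name, prefix + name),
        df.columns,
        df,
    )


# Usage with the DataFrame from the example:
# df = add_prefix_renamed(df, "educative-")
```

Because df.columns is evaluated before any renaming happens, the helper walks the original names, just as the for loop does.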
The toDF() method
The toDF() method returns a new DataFrame with the specified column names.
Syntax
DataFrame.toDF(*cols)
Parameter
cols: These are the new column names, one for each existing column of the DataFrame, in order.
Return value
This method returns a new DataFrame.
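The cols parameter is variadic (*cols), so toDF() expects each new name as its own positional argument. The sketch below uses a hypothetical stand-in, to_df_stub(), that has the same parameter shape but only returns what it receives. It shows why a list of names is unpacked with * before it is passed in.

```python
def to_df_stub(*cols):
    """Hypothetical stand-in with the same parameter shape as DataFrame.toDF(*cols).

    It does no Spark work; it only returns the tuple of arguments it received.
    """
    return cols


columns = ["firstname", "lastname", "country", "state"]
new_cols = ["educative-" + column for column in columns]

# With *, each name becomes its own positional argument (4 arguments).
unpacked = to_df_stub(*new_cols)
# Without *, the whole list arrives as one single argument.
packed = to_df_stub(new_cols)

print(len(unpacked), len(packed))  # prints: 4 1
```

This is why the code example that follows calls df.toDF(*new_cols) rather than df.toDF(new_cols).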
Code example
Let’s look at the code below:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James","Smith","USA","CA"),
    ("Michael","Rose","USA","NY"),
    ("Robert","Williams","USA","CA"),
    ("Maria","Jones","USA","FL")
  ]

columns = ["firstname","lastname","country","state"]
df = spark.createDataFrame(data = data, schema = columns)
print("Original dataframe:")
df.show(truncate=False)

prefix = "educative-"  # string to prepend to every column name
new_cols = [prefix + column for column in df.columns]  # prefixed names, same order as df.columns

new_df = df.toDF(*new_cols)  # * unpacks the list into one argument per column
print("-" * 8)
print("Renamed dataframe:")
new_df.show(truncate=False)
Code explanation
- Line 4: We create a Spark session with the application name edpresso.
- Lines 6–10: We define data for the DataFrame.
- Line 12: The columns of the DataFrame are defined.
- Line 13: A DataFrame is created using the createDataFrame() method.
- Line 15: The original DataFrame is printed.
- Line 17: The prefix to be added is defined.
- Line 18: A new list of column names is created by prepending prefix to each existing column name.
- Line 20: A new DataFrame in which every column is prefixed is obtained using the toDF() method, passing the new names as separate arguments with *new_cols.
- Line 23: The new DataFrame with the new column names is printed.
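If you add prefixes in more than one place, the toDF() steps can be wrapped in a small function. add_prefix_todf() below is a hypothetical helper name, not a PySpark API; it simply combines the list comprehension and the toDF(*new_cols) call from the example and expects a PySpark DataFrame as input.

```python
def add_prefix_todf(df, prefix):
    """Return a new DataFrame with `prefix` prepended to every column name.

    Hypothetical helper (not a PySpark API) wrapping the toDF() approach:
    all new names are built first, in the same order as df.columns, and then
    assigned in a single toDF() call, one name per existing column.
    """
    new_cols = [prefix + column for column in df.columns]
    return df.toDF(*new_cols)


# Usage with the DataFrame from the example:
# new_df = add_prefix_todf(df, "educative-")
```

One possible reason to prefer this version over the withColumnRenamed() loop is that all names are assigned in one call. As far as we know, withColumnRenamed() renames every column that matches the given name. So if a prefixed name already exists in the DataFrame (for example, columns a and educative-a with the prefix educative-), the loop may end up renaming two columns at once. toDF() simply assigns the new names by position.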
Copyright ©2026 Educative, Inc. All rights reserved