How to rename multiple columns in PySpark
The DataFrame.withColumnRenamed() method renames an existing column. It does not modify the original DataFrame. Instead, it returns a new DataFrame in which the column has its new name. To rename multiple columns, chain one withColumnRenamed() call per column.
Syntax
DataFrame.withColumnRenamed(existing, new)
Parameters
existing: The name of the existing column to rename.
new: The new name to give the existing column.
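PySpark documents withColumnRenamed() as a no-op when existing is not a column of the DataFrame, so a misspelled name is silently ignored. If you would rather get an error, one option is sketched below: it checks every old name against df.columns and then renames all columns in a single DataFrame.toDF(*cols) call. The helper name rename_strict and its KeyError behavior are assumptions for illustration, not part of PySpark.

```python
def rename_strict(df, mapping):
    """Return a new DataFrame with columns renamed per `mapping` (old -> new).

    Hypothetical helper, not part of PySpark. Unlike withColumnRenamed(),
    it raises KeyError when an old name is not a column of `df`.
    """
    missing = [old for old in mapping if old not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    # Keep the original column order; unmapped columns keep their names.
    new_names = [mapping.get(name, name) for name in df.columns]
    # toDF(*cols) returns a new DataFrame with every column renamed at once.
    return df.toDF(*new_names)
```

For a DataFrame with columns firstname, lastname, country, and state, calling rename_strict(df, {"lastnme": "Last-Name"}) raises KeyError instead of returning the DataFrame unchanged.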
Return value
A new DataFrame in which the specified column has been renamed.
Code example
Let’s look at the code below:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James", "Smith", "USA", "CA"),
        ("Michael", "Rose", "USA", "NY"),
        ("Robert", "Williams", "USA", "CA"),
        ("Maria", "Jones", "USA", "FL"),
]

columns = ["firstname", "lastname", "country", "state"]
df = spark.createDataFrame(data=data, schema=columns)
print("Original dataframe:")
df.show(truncate=False)

new_df = df.withColumnRenamed("firstname", "First-Name") \
    .withColumnRenamed("lastname", "Last-Name") \
    .withColumnRenamed("country", "Country")

# "state" is not renamed, so it keeps its original name
print("Renamed dataframe:")
new_df.show(truncate=False)
Code explanation
- Lines 1–2: We import pyspark and the SparkSession class.
- Line 4: A SparkSession with the application name edpresso is created (getOrCreate() reuses an existing session if one is already running).
- Lines 6–10: We define the rows of data for the DataFrame.
- Line 12: The names of the DataFrame's columns are defined.
- Line 13: A DataFrame is created using the createDataFrame() method.
- Line 15: The original DataFrame is printed.
- Lines 17–19: Three of the DataFrame's columns are renamed by chaining the withColumnRenamed() method.
- Line 23: The new DataFrame with the new column names is printed.
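Writing out the chain works well for a fixed handful of columns. When the renames are kept in a dictionary, the same chain can be built in a loop. The sketch below assumes a hypothetical helper named rename_columns (not part of PySpark); it uses DataFrame.withColumnsRenamed(), which takes a dictionary and is available in PySpark 3.4 and later, and otherwise falls back to one withColumnRenamed() call per pair.

```python
def rename_columns(df, mapping):
    """Return a new DataFrame with each column in `mapping` renamed (old -> new).

    Hypothetical helper, not part of PySpark.
    """
    # DataFrame.withColumnsRenamed(colsMap) exists in PySpark 3.4 and later.
    if hasattr(df, "withColumnsRenamed"):
        return df.withColumnsRenamed(mapping)
    # Older versions: apply one withColumnRenamed() call per pair, which
    # builds the same chain as the example above, driven by the dictionary.
    for old, new in mapping.items():
        df = df.withColumnRenamed(old, new)
    return df


renames = {"firstname": "First-Name", "lastname": "Last-Name", "country": "Country"}
# With the DataFrame `df` from the example above:
# new_df = rename_columns(df, renames)
```

The fallback loop applies the renames one after another in dictionary order, so it matches the hand-written chain as long as no new name collides with an old one.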
Copyright ©2026 Educative, Inc. All rights reserved