Trusted answers to developer questions

Related Tags

pyspark
python

How to rename multiple columns in PySpark

Abhilash

The withColumnRenamed() method renames an existing column. It returns a new DataFrame with the renamed column and leaves the original DataFrame unchanged. To rename multiple columns, chain one withColumnRenamed() call per column.

Syntax

DataFrame.withColumnRenamed(existing, new)

Parameters

  • existing: This is the name of the column to rename. If the DataFrame has no column with this name, the call has no effect.
  • new: This is the new name to be given to the existing column.

Return value

The method returns a new DataFrame in which the given column is renamed.
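The return behavior can be checked with a minimal sketch like the one below. The helper name compare_columns and the sample column names are assumptions made for illustration and are not part of the PySpark API. The PySpark import is kept inside demo() so the helper itself can be loaded without a Spark installation.

```python
def compare_columns(df, existing, new):
    """Return (column names before, column names after) for one rename."""
    # withColumnRenamed() returns a new DataFrame; df itself is not modified.
    renamed = df.withColumnRenamed(existing, new)
    return list(df.columns), list(renamed.columns)


def demo():
    # Requires a local PySpark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("edpresso").getOrCreate()
    df = spark.createDataFrame([("James", "Smith")], ["firstname", "lastname"])

    # The first list keeps the original names; only the second list changes.
    print(compare_columns(df, "firstname", "First-Name"))

    # A name that is not in the schema leaves the column names as they were.
    print(compare_columns(df, "middlename", "Middle-Name"))
```

Calling demo() prints the column names before and after each rename, which makes it easy to confirm that the original DataFrame is untouched.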

Code example

Let’s look at the code below:

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James","Smith","USA","CA"),
    ("Michael","Rose","USA","NY"),
    ("Robert","Williams","USA","CA"),
    ("Maria","Jones","USA","FL")
  ]

columns = ["firstname","lastname","country","state"]
df = spark.createDataFrame(data = data, schema = columns)
print("Original dataframe:")
df.show(truncate=False)

new_df = df.withColumnRenamed("firstname", "First-Name") \
          .withColumnRenamed("lastname", "Last-Name") \
          .withColumnRenamed("country", "Country")

print("Renamed dataframe:")
new_df.show(truncate=False)

Code explanation

  • Lines 1–2: We import pyspark and SparkSession.
  • Line 4: We create a Spark session with the application name edpresso.
  • Lines 6–10: We define data for the DataFrame.
  • Line 12: We define the names of the DataFrame’s columns.
  • Line 13: We create a DataFrame using the createDataFrame() method.
  • Lines 14–15: We print the original DataFrame.
  • Lines 17–19: We rename three columns of the DataFrame by chaining the withColumnRenamed() method. The state column keeps its name.
  • Lines 21–22: We print the new DataFrame with the new column names.
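When the list of renames is long or built at runtime, the chain can be generated from a dictionary instead of being written out by hand. The following is a minimal sketch of that idea, not the article's own method: the helper name rename_columns and the mapping are assumptions made for illustration. It applies withColumnRenamed() once per entry, so it is equivalent to the chained calls above.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Apply withColumnRenamed() once for each old -> new pair in mapping."""
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,  # start from the original DataFrame
    )


def demo():
    # Requires a local PySpark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("edpresso").getOrCreate()
    data = [("James", "Smith", "USA", "CA"), ("Maria", "Jones", "USA", "FL")]
    columns = ["firstname", "lastname", "country", "state"]
    df = spark.createDataFrame(data=data, schema=columns)

    # Hypothetical mapping of old column names to new ones.
    mapping = {
        "firstname": "First-Name",
        "lastname": "Last-Name",
        "country": "Country",
    }
    rename_columns(df, mapping).show(truncate=False)
```

Calling demo() shows the renamed DataFrame. On Spark 3.4 and later, DataFrame.withColumnsRenamed() accepts such a dictionary directly, so the helper is mainly useful on older versions.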

Copyright ©2022 Educative, Inc. All rights reserved
