How to save a PySpark DataFrame to a CSV file
The df.write.csv() method writes a DataFrame to a CSV file. Options for the write operation can be specified via the df.write.option() method.
Syntax
df.write.option("option_name", "option_value").csv(file_path)
Parameter
file_path: The path where the CSV file is to be created.
Example
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('answer').getOrCreate()
data = [("James","Educative","Engg","USA"),
("Michael","Google",None,"Asia"),
("Robert",None,"Marketing","Russia"),
("Maria","Netflix","Finance","Ukraine"),
(None, None, None, None)
]
columns = ["emp name","company","department","country"]
df = spark.createDataFrame(data=data, schema=columns)
csv_file_path = "data.csv"
df.write.option("header", True).option("delimiter",",").csv(csv_file_path)
Follow the instructions below to inspect the generated CSV file:
- Use the ls command to view the data.csv directory.
- Use the cd data.csv command to move into the directory.
- Use the ls command to view the generated .csv file.
- To inspect the data contained in the generated file, use the cat command.
- Use the cat *.csv syntax. The * sign stands in for the filename with a .csv extension. We may also copy and paste the exact filename here.
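Note that Spark writes the output as a directory named data.csv containing one part file per partition, which is why the steps above treat data.csv as a directory. The shell sketch below mimics that layout with a mock part file (the part filename is illustrative; Spark generates its own names) so the ls and cat commands can be tried without running Spark:

```shell
# Create a mock "data.csv" directory like the one Spark produces.
# The part filename below is illustrative; Spark generates its own names.
mkdir -p data.csv
printf 'emp name,company,department,country\nJames,Educative,Engg,USA\n' \
  > data.csv/part-00000-example.csv

ls data.csv          # lists the part file(s) inside the directory
cat data.csv/*.csv   # prints the CSV header and rows
```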
Explanation
- Lines 1–2: The pyspark module and SparkSession are imported.
- Line 4: We create a SparkSession with the application name answer.
- Lines 6–11: We define the dummy data for the DataFrame.
- Line 13: We define the column names for the dummy data.
- Line 14: We create a Spark DataFrame from the dummy data defined above.
- Line 16: We define the path where the CSV file is to be generated.
- Line 17: The DataFrame is written to a CSV file by invoking the write.csv() method on the DataFrame object.
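Because write.csv() produces a directory of part files rather than a single file, a common follow-up is to combine the parts. Within Spark itself, df.coalesce(1).write.csv(...) yields a single part file. Alternatively, the parts can be merged after the fact; below is a minimal standard-library sketch of that approach (the merge_part_files helper and the mock part filenames are our own for illustration, not part of PySpark):

```python
import csv
import glob
import os

def merge_part_files(part_dir, out_path):
    """Concatenate all part-*.csv files in part_dir into one CSV,
    keeping the header from the first part only."""
    header = None
    rows = []
    for path in sorted(glob.glob(os.path.join(part_dir, "part-*.csv"))):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            part_header = next(reader)
            if header is None:
                header = part_header
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

# Demo with mock part files (PySpark would normally create these):
os.makedirs("data.csv", exist_ok=True)
with open("data.csv/part-00000.csv", "w", newline="") as f:
    f.write("emp name,company,department,country\nJames,Educative,Engg,USA\n")
with open("data.csv/part-00001.csv", "w", newline="") as f:
    f.write("emp name,company,department,country\nMaria,Netflix,Finance,Ukraine\n")

merge_part_files("data.csv", "merged.csv")
```

Each part file repeats the header when the header option is set, so the helper keeps only the first header it sees.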
Copyright ©2026 Educative, Inc. All rights reserved