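How to import a CSV file in PySpark
The spark.read.csv() method reads a single CSV file, or a directory of CSV files, into a Spark DataFrame. Reader options, such as whether the first line is a header or which character separates fields, are set with the option() method on spark.read. Each option() call sets one option and returns the reader, so several calls can be chained.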
Syntax
spark.read.option("option_name", "option_value").csv(file_path)
Parameter
file_path: The path to the CSV file, or directory of CSV files, to be read.
Return value
This method returns a Spark DataFrame.
Code example
Let’s look at the code below:
main.py
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("answers").getOrCreate()

path = "data.csv"

df = spark.read.option("header", "true").option("delimiter", ",").csv(path)
df.printSchema()
Code explanation
- Lines 1–2: We import pyspark and SparkSession.
- Line 4: We create a SparkSession with the application name answers.
- Line 6: We define the path to the CSV file.
- Line 8: We read the CSV file into a DataFrame using the csv() method. The header option tells Spark to use the first line as column names, and the delimiter option sets the field separator. Multiple options are chained together using the option() method.
- Line 9: We print the DataFrame schema.
Copyright ©2026 Educative, Inc. All rights reserved