Spark SQL Data Source

Explore how to use Spark SQL data sources to read and write data in formats such as CSV, JSON, and Parquet. Understand the DataFrameReader and DataFrameWriter APIs, which let you load data into DataFrames, save DataFrames to tables or files, and query structured data using Spark SQL commands.
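The same reader and writer pattern applies across formats; only the format name and the options change. As a minimal sketch (the JSON input path and Parquet output path below are hypothetical), a JSON file can be read into a DataFrame and written back out as Parquet:

scala> // Read a JSON file into a DataFrame; the path is illustrative.
scala> val df = spark.read.format("json").load("/data/movies.json")

scala> // Write the same data out as Parquet, replacing any existing output.
scala> df.write.format("parquet").mode("overwrite").save("/data/movies_parquet")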

Reading data into DataFrames

Once data has been ingested and processed, it can be read into a DataFrame, saved as a Spark SQL table, and then queried with SQL; the query results come back as DataFrames. An example is shown below:

scala> val movies = spark.read.format("csv")
         .option("header", "true")          // treat the first row as column names
         .option("samplingRatio", 0.001)    // sample 0.1% of rows for schema inference
         .option("inferSchema", "true")     // infer column types from the sampled rows
         .load("/data/BollywoodMovieDetail.csv")

scala> movies.write.saveAsTable("movieData")   // save as a managed Spark SQL table

scala> val movieTitles = spark.sql("SELECT title FROM movieData")

scala> movieTitles.show(3, false)
+---------------------------------+
|title                            |
+---------------------------------+
|Albela                           |
|Lagaan: Once Upon a Time in India|
|Meri Biwi Ka Jawab Nahin         |
+---------------------------------+
only showing top 3 rows

In the above example, we read the CSV file into the movies DataFrame, save it as the Spark SQL table movieData, and then query that table with spark.sql, which returns the matching titles as a DataFrame that we display with show.
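Because movieData is persisted through the metastore, it can be read back into a DataFrame later, and query results can themselves be written out as files. A minimal sketch, reusing the table and the movieTitles DataFrame from the example above (the output path is hypothetical):

scala> // Load the saved table back into a DataFrame.
scala> val saved = spark.table("movieData")

scala> // Write the query results out as JSON files; the path is illustrative.
scala> movieTitles.write.format("json").mode("overwrite").save("/data/movie_titles")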