Spark SQL Data Source
Explore how to use Spark SQL data sources to read and write data in formats such as CSV, JSON, and Parquet. Understand DataFrameReader and DataFrameWriter APIs, enabling you to load data into DataFrames, save DataFrames to tables or files, and query structured data using Spark SQL commands.
Reading data into DataFrames
Once data has been ingested, processed, and loaded into Spark SQL databases and tables, it can be read back as DataFrames. In the example below, we read a CSV file into a DataFrame, save it as a table, and then query that table with Spark SQL:
scala> val movies = spark.read.format("csv")
.option("header", "true")
.option("samplingRatio", 0.001)
.option("inferSchema", "true")
.load("/data/BollywoodMovieDetail.csv")
scala> movies.write.saveAsTable("movieData")
scala> val movieTitles = spark.sql("SELECT title FROM movieData")
scala> movieTitles.show(3, false)
+---------------------------------+
|title |
+---------------------------------+
|Albela |
|Lagaan: Once Upon a Time in India|
|Meri Biwi Ka Jawab Nahin |
+---------------------------------+
only showing top 3 rows
In the above example, we read the CSV file into the movies DataFrame (letting Spark infer the schema from a small sample of rows), save it as the managed Spark SQL table movieData, and then query that table with spark.sql. The query result is itself a DataFrame, which we display with show.
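The same DataFrameReader and DataFrameWriter APIs handle the other formats mentioned above, such as JSON and Parquet. Below is a minimal sketch that assumes the movies DataFrame from the previous example; the /tmp output paths are hypothetical. Note that because Parquet stores the schema in its file metadata, reading the data back requires no header or schema-inference options:

scala> movies.write.mode("overwrite").parquet("/tmp/movies.parquet")  // hypothetical output path
scala> movies.write.mode("overwrite").json("/tmp/movies.json")        // hypothetical output path

scala> val moviesParquet = spark.read.parquet("/tmp/movies.parquet")  // schema comes from Parquet metadata
scala> moviesParquet.select("title").show(3, false)

Here mode("overwrite") replaces any existing output at the target path; the default save mode instead raises an error if the path already exists.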