Spark SQL Engine

Get an introduction to the Spark SQL engine and its two sub-components, Project Tungsten and the Catalyst optimizer.

Overview

Spark SQL lets developers programmatically issue ANSI SQL:2003–compatible queries on structured data with a schema. It first shipped as an alpha component in Spark 1.0 and graduated from alpha in version 1.3, the release that also introduced the DataFrame API. Since then, several higher-level capabilities have been built on it. Beyond running SQL queries, the Spark SQL engine:

  • Generates optimized query plans and compact JVM code for final execution.

  • Serves as a bridge to external tools using database ODBC/JDBC connectors.

  • Reads and writes structured files in formats such as JSON, CSV, and Avro, and exposes their contents as temporary tables.

  • Connects to the Apache Hive metastore and tables.

  • Provides an interactive Spark SQL shell for quick, ad hoc data exploration.

  • Unifies the various components of Spark and allows for creating DataFrame/Dataset abstractions in the languages Spark supports (Java, Scala, Python, and R).
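Several of the capabilities listed above can be seen together in a short PySpark session. The following is only a minimal sketch, assuming PySpark is installed and a local Spark session can be started. The sample records, the view name employees, and the helper names write_json_lines and run_example are hypothetical and chosen purely for illustration.

```python
import json
import os
import tempfile

# Hypothetical sample records, invented purely for illustration.
SAMPLE_ROWS = [
    {"name": "Ana", "dept": "eng", "salary": 120},
    {"name": "Bo", "dept": "eng", "salary": 100},
    {"name": "Cy", "dept": "ops", "salary": 90},
]

# A plain SQL aggregate over a temporary view named "employees" (hypothetical name).
QUERY = """
    SELECT dept, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
    ORDER BY dept
"""


def write_json_lines(rows, path):
    """Write one JSON object per line, the layout spark.read.json reads by default."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")


def run_example():
    """Read a JSON file, register it as a temporary view, and query it with SQL."""
    # Imported here so the helpers above stay usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")          # assumption: a single local worker thread is enough
        .appName("spark-sql-sketch")
        .getOrCreate()
    )
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "employees.json")
            write_json_lines(SAMPLE_ROWS, path)

            # Read a structured file; the schema is inferred from the JSON records.
            df = spark.read.json(path)

            # Expose the DataFrame to SQL as a temporary view.
            df.createOrReplaceTempView("employees")

            result = spark.sql(QUERY)

            # Print the logical plans and the physical plan chosen for the query.
            result.explain(True)

            # Collect while the temporary file still exists.
            return [(row["dept"], row["avg_salary"]) for row in result.collect()]
    finally:
        spark.stop()
```

Calling run_example() with PySpark available prints the query plans and returns the average salary per department. The same query could also be written with the DataFrame API (for instance df.groupBy("dept").avg("salary")), and both forms go through the same engine.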
