Trusted answers to developer questions

Related Tags

python
pyspark
sql

How is a query written using the pyspark.sql module?

Muhammad Muzammil

In PySpark, a SparkSession is the entry point for creating DataFrames. A DataFrame can be registered as a temporary view, which behaves like a table, and its data can then be queried with SQL through the pyspark.sql module.

Syntax

filtered = session.sql(query)

The syntax above can be used to perform SQL queries on tables using PySpark. The variables mentioned in the code indicate the following:

  1. session : This is the SparkSession instance. It provides the sql() method.
  2. query : This is the SQL query, given as a string. It refers by name to the table(s) or temporary view(s) being queried.
  3. filtered : This is the DataFrame that sql() returns. It holds the result of the query.

Code example

Let's look at the code below:

import pyspark
from pyspark.sql import SparkSession
print("PySpark version: ", pyspark.__version__)
session = \
SparkSession.builder.master("local[1]") \
  .appName('Pyspark') \
  .getOrCreate()
print(session)

# Creating a DataFrame
df = session.createDataFrame([("Apples", 10), ("Mangos", 20), ("Lemons", 3)])
df.show()
# Spark SQL Query
df.createOrReplaceTempView("table1")
filtered = session.sql("SELECT _1 FROM table1")
filtered.show() # SQL command applied to df

Code explanation

  • Lines 4-8: We create a SparkSession.
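  • Line 11: We create a DataFrame containing two columns and three rows. Because no column names are given, Spark assigns the default names _1 and _2.
  • Line 14: We create a temporary view of the DataFrame. This view acts like a table on which we can run SQL queries.
  • Line 15: We run an SQL query on the temporary view that selects the first column, _1. The result is returned as a new DataFrame.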

CONTRIBUTOR

Muhammad Muzammil
Copyright ©2022 Educative, Inc. All rights reserved
