Spark User Defined Functions

Learn how to create user-defined functions in Spark and use Spark's built-in functions.

We have previously seen and worked with Spark's built-in functions, but Spark also allows users to define their own functionality wrapped inside user-defined functions (UDFs) that can be invoked in Spark SQL. The major benefit of UDFs is reusability. UDFs exist per session and don't persist in the underlying metastore. Let's consider a simple function that returns the last two digits of the releaseYear value, e.g., if the function is passed 2021, it returns 21. The function definition and its use are presented below:

val movies = spark.read.format("csv")
                       .option("header", "true")
                       .option("samplingRatio", 0.001)
                       .option("inferSchema", "true")
                       .load("/data/BollywoodMovieDetail.csv")

scala> movies.write.saveAsTable("movies")

scala> val twoDigitYear = (year: Int) => { ((year / 10) % 10).toString + (year % 10).toString }

scala> spark.udf.register("shortYear", twoDigitYear)

scala> spark.sql("SELECT title, shortYear(releaseYear) FROM movies").show(3)
+--------------------+----------------------+
|               title|shortYear(releaseYear)|
+--------------------+----------------------+
|              Albela|                    01|
|Lagaan: Once Upon...|                    01|
|Meri Biwi Ka Jawa...|                    04|
+--------------------+----------------------+
only showing top 3 rows

Spark Built-in Functions

Spark offers a rich collection of built-in functions. Below are a few examples:

// Convert the movie title to upper case
scala> spark.sql("SELECT upper(title) FROM movies").show(3)
+--------------------+
|        upper(title)|
+--------------------+
|              ALBELA|
|LAGAAN: ONCE UPON...|
|MERI BIWI KA JAWA...|
+--------------------+
only showing top 3 rows

// Generate a random number
scala> spark.sql("SELECT random()").show(1)
+------------------+
|            rand()|
+------------------+
|0.5073736569856021|
+------------------+

// Retrieve the current timestamp
scala> spark.sql("SELECT now()").show(1, false)
+-----------------------+
|now()                  |
+-----------------------+
|2021-05-09 20:56:06.462|
+-----------------------+


scala> spark.sql("SELECT current_timestamp()").show(1, false)
+-----------------------+
|current_timestamp()    |
+-----------------------+
|2021-05-09 20:56:15.702|
+-----------------------+

Recall that the columns actors, directors, and writers contain pipe-separated names. We can use the split() function to tokenize the names as follows:

scala> spark.sql("""SELECT split(actors,"[|]") AS Names FROM movies""").show(3, false)
+-------------------------------------------------------------------------+
|Names                                                                    |
+-------------------------------------------------------------------------+
|[Govinda ,  Aishwarya Rai Bachchan ,  Jackie Shroff ,  Namrata Shirodkar]|
|[Aamir Khan ,  Gracy Singh ,  Rachel Shelley ,  Paul Blackthorne]        |
|[Akshay Kumar ,  Sridevi ,  Gulshan Grover ,  Laxmikant Berde]           |
+-------------------------------------------------------------------------+
only showing top 3 rows

UDFs in PySpark

UDFs can also be defined in PySpark. Initially, PySpark UDFs required data to be serialized and moved between the JVM and the Python worker process, which made them slower than Scala UDFs. This issue was addressed by Pandas UDFs, also known as vectorized UDFs, introduced in Spark 2.3. A Pandas UDF uses Apache Arrow to transfer data in batches and avoids row-by-row serialization and pickling, because data in the Arrow format can be consumed directly by the Python process.
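
As an illustration, here is a minimal PySpark sketch of a vectorized (Pandas) UDF that mirrors the shortYear example above. It is not part of the original lesson code; it assumes the same BollywoodMovieDetail.csv file with the title and releaseYear columns, and a Spark 3.x installation with PyArrow available.

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, pandas_udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("pandas-udf-demo").getOrCreate()

# Vectorized UDF: receives a whole batch of values as a pandas Series
# (transferred from the JVM via Apache Arrow) and returns a Series.
@pandas_udf(StringType())
def short_year(year: pd.Series) -> pd.Series:
    return (year % 100).astype(str).str.zfill(2)

# Assumed to be the same dataset used in the Scala examples above.
movies = (spark.read.format("csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("/data/BollywoodMovieDetail.csv"))

movies.select(col("title"), short_year(col("releaseYear"))).show(3)

# The vectorized UDF can also be registered and used from Spark SQL.
spark.udf.register("shortYear", short_year)
movies.createOrReplaceTempView("movies")
spark.sql("SELECT title, shortYear(releaseYear) FROM movies").show(3)

Because the function operates on whole batches rather than one row at a time, the per-row Python invocation overhead of classic PySpark UDFs is avoided.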

