Introduction to User-defined Functions

Learn about user-defined functions in detail.


Overview

Most of the use cases we encounter in day-to-day analysis or data engineering work can be solved with the built-in functions of PySpark's SQL and DataFrame APIs. When the built-in functions are not enough, we can write our own function, a user-defined function (UDF), and use it for a custom transformation. Writing UDFs requires a solid understanding of the data's structure and of how a pure Python data structure is represented in PySpark. The return type of a UDF must be static, so we have to declare it ourselves as a PySpark data type. Moreover, UDFs are among the most expensive operations: Spark's optimizer cannot see inside them, and for Python UDFs every row has to be passed between the JVM and a Python worker. We therefore use them only when no built-in alternative exists.
