Transformations (II): FlatMap and Distinct
Explore how to apply the FlatMap transformation to flatten arrays within Spark Java DataFrames and map elements according to custom logic. Understand the use of Distinct to eliminate duplicate rows, enabling clean and structured data for batch applications.
FlatMap
The FlatMap operation is a long-standing staple of functional programming, though it can be tricky to grasp conceptually. It serves two purposes:
- Being a map transformation in nature, it applies a function to each element of a collection. This is no different than the plain map() function described before.
- If the input is a collection of collections of elements (say a List of Lists, an array of arrays), it flattens the results into a single collection.
So fundamentally, objects are transformed in map and flatMap operations ...
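To make the two behaviors listed above concrete, here is a minimal sketch that uses plain Java Streams rather than Spark, so it runs without a cluster or any dependency. The class name FlatMapSketch and the helper methods splitWithMap and splitWithFlatMap are hypothetical names chosen for illustration; they do not come from the lesson.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class: contrasts map() and flatMap() on Java Streams.
public class FlatMapSketch {

    // map(): one output element per input element, so each line becomes
    // its own list of words and the nesting is preserved.
    static List<List<String>> splitWithMap(List<String> lines) {
        return lines.stream()
                .map(line -> Arrays.asList(line.split(" ")))
                .collect(Collectors.toList());
    }

    // flatMap(): the same splitting logic, but the per-line results are
    // flattened into a single collection of words.
    static List<String> splitWithFlatMap(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark java", "flat map");
        System.out.println(splitWithMap(lines));     // [[spark, java], [flat, map]]
        System.out.println(splitWithFlatMap(lines)); // [spark, java, flat, map]
    }
}
```

Both methods apply the same function to every element; the only difference is whether the results stay nested or are merged into one collection. Spark's Dataset API offers a flatMap built on the same idea, although its exact call shape differs from the Streams version sketched here.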