Union, UnionByName, and DropDuplicates

Get introduced to Union, UnionByName, and DropDuplicates transformations in this lesson.

The union transformation allows us to combine two DataFrames, thus producing a new one containing the rows from both.

This operation has the following characteristics:

  • The schemas of both DataFrames have to be identical. This requirement does not deviate much from that of the classical SQL UNION operation available in RDBMSs. Note that union matches columns by position, not by name.

  • Duplicate records are preserved and included in the final result. In this respect, union behaves like SQL UNION ALL rather than SQL UNION.

Next, we present a graphical representation of this transformation. It illustrates an interesting property that makes union an attractive transformation in specific scenarios.
