Working with DataFrame Schemas

Let’s imagine that we are required to ingest a CSV file whose format doesn’t match our DataSource model. Changing an existing data model can be too costly, and it can affect other applications that read from the same database.

Fortunately, we already know how to modify data once it is in a Spark DataFrame and how to change its structure (for example, we added a column in a previous lesson). The API also lets us remove columns and perform other structural operations. Let’s see how we can use it to meet this requirement.

Working on ingested data

The hypothetical requirement above can be stated as follows:


A client sends data to be stored in our DataSource (DB) as a CSV file whose format doesn’t fit our normalized data model.

Our application should apply the necessary transformations so that the information can be persisted to the DB with a matching structure.
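
To make the requirement concrete, here is a minimal sketch of the kind of reshaping involved. It uses plain Python with the standard library only, not the Spark API. The CSV layout, the column names, and the target structure (`first_name`, `last_name`, `birth_year`) are all hypothetical and chosen only for illustration; the real client file and data model may look quite different.

```python
import csv
import io

# Hypothetical client CSV: its columns do not match our data model.
CLIENT_CSV = """Full Name,Birth Year,Notes
Ada Lovelace,1815,first programmer
Alan Turing,1912,codebreaker
"""

# Hypothetical columns of the normalized target model.
TARGET_COLUMNS = ["first_name", "last_name", "birth_year"]


def to_target_row(row):
    """Reshape one client row into the (assumed) target structure."""
    # Split the combined name on the first space into two target columns.
    first_name, _, last_name = row["Full Name"].partition(" ")
    return {
        "first_name": first_name,
        "last_name": last_name,
        # Rename the column and convert the text value to an integer.
        "birth_year": int(row["Birth Year"]),
        # "Notes" is dropped: the assumed target model has no such column.
    }


def transform(csv_text):
    """Parse the CSV text and return rows matching the target structure."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [to_target_row(row) for row in reader]


rows = transform(CLIENT_CSV)
for row in rows:
    print(row)
```

The same three kinds of steps (adding, renaming, and dropping columns) are the ones a Spark DataFrame lets us express on the ingested data before it is persisted.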


