How to Ingest Files: Part II

Learn how to load different file formats with the Spark API.

Ingestion of XML files

Extensible Markup Language (XML) files are still widely used as a data format.

These files are structured, extensible, and self-describing: the element names tell us what each value means, which also makes them easy for humans to read. They can also be validated against an XML Schema Definition (XSD) file.
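
To make "self-describing" concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module rather than Spark. The sample record and the helper name describe are hypothetical, chosen only for illustration. The point is that the field names come from the document itself (attributes and child element tags), so no external schema is needed to label the values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; element and attribute names are illustrative only.
SAMPLE = """
<book id="bk101">
  <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
  <price>44.95</price>
</book>
"""

def describe(xml_text: str) -> dict:
    """Return the record's values keyed by the names the document itself declares."""
    root = ET.fromstring(xml_text.strip())
    # Attributes (such as id) carry their own names...
    fields = dict(root.attrib)
    # ...and so does every child element, via its tag.
    for child in root:
        fields[child.tag] = (child.text or "").strip()
    return {"record_type": root.tag, "fields": fields}

if __name__ == "__main__":
    print(describe(SAMPLE))
```

Note that every value comes back as a string: the tags name the data, but typing it (for example, treating price as a number) is left to the reader, or to an XSD.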

Note: For more information on the XML format, please refer to: https://www.w3.org/XML/

On the downside, XML files tend to be verbose and, depending on how complex their structure is, can be very hard to read. Nonetheless, the format is widely used, and Spark can parse it for us.
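
When Spark reads XML, the key decision is which element represents one row. With Spark's XML data source this is the rowTag option, e.g. spark.read.format("xml").option("rowTag", "book").load(path). The source is built in from Spark 4.0; older versions rely on the external spark-xml package. The sketch below is not Spark code. It is a plain-Python approximation of the idea, assuming a hypothetical catalog file and a hypothetical helper xml_to_rows: every <book> element becomes one row, attributes get the "_" prefix that Spark's XML reader uses by default (its attributePrefix option), and fields missing from a record become None, much like nulls in a DataFrame.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; "book" plays the role of Spark's rowTag.
CATALOG = """
<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author><price>44.95</price></book>
  <book id="bk102"><author>Ralls, Kim</author><genre>Fantasy</genre></book>
</catalog>
"""

def xml_to_rows(xml_text: str, row_tag: str):
    """Turn every <row_tag> element into one row (a rough analogue of rowTag)."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for elem in root.iter(row_tag):
        # Attributes get a "_" prefix so they cannot clash with child element names.
        rec = {"_" + name: value for name, value in elem.attrib.items()}
        for child in elem:
            rec[child.tag] = (child.text or "").strip()
        records.append(rec)
    # The union of field names, in first-seen order, acts as a simple inferred schema.
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    # Fields a record lacks are filled with None.
    rows = [tuple(rec.get(col) for col in columns) for rec in records]
    return columns, rows

if __name__ == "__main__":
    cols, rows = xml_to_rows(CATALOG, "book")
    print(cols)
    for row in rows:
        print(row)
```

Unlike this sketch, Spark also infers column types (price would likely become a numeric column) and may order the inferred columns differently, so treat it only as an illustration of how repeated elements map onto rows.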

The project for this lesson is quite similar to the one in the previous lesson:
