Introduction to XML and XPath

Delve into the realm of XML and XPath.

In this lesson, we'll look at another kind of markup document, XML, and introduce XPath, a different and often more efficient way to traverse both XML and HTML documents.

Introduction to XML

XML (Extensible Markup Language) is another markup language, used to organize data in a hierarchical (tree-like) structure. It is a versatile format for storing and transmitting information between different systems, platforms, and applications. Unlike HTML, XML has no built-in notion of how data should be displayed or interacted with; it only describes the data itself. Layout also matters less in XML: indentation and line breaks between elements don't change how the elements are nested, because the parser reads that structure from the tags alone.

<?xml version="1.0" encoding="UTF-8"?>
<movies>
  <movie category="Drama">
    <title lang="en"> Birdman </title>
    <year> 2014 </year>
    <awards> 4 </awards>
    <nominations> 9 </nominations>
  </movie>
  <movie category="Family">
    <title lang="en"> Inside Out </title>
    <year> 2015 </year>
    <awards> 2 </awards>
    <nominations> 1 </nominations>
  </movie>
</movies>
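To check the claim that layout doesn't affect structure, the sketch below parses the example twice, once indented and once squeezed onto a single line, and compares every element's tag, attributes, and text. It uses Python's standard-library xml.etree.ElementTree as an illustrative stand-in (the lesson itself works with Beautiful Soup), trims each movie to its title and year, and leaves out the optional declaration. The helper name outline() is hypothetical.

```python
import xml.etree.ElementTree as ET

# The first example, trimmed to title and year, with indentation...
indented = """<movies>
  <movie category="Drama">
    <title lang="en"> Birdman </title>
    <year> 2014 </year>
  </movie>
  <movie category="Family">
    <title lang="en"> Inside Out </title>
    <year> 2015 </year>
  </movie>
</movies>"""

# ...and the same markup with every line break and indent removed.
compact = (
    '<movies><movie category="Drama"><title lang="en"> Birdman </title>'
    '<year> 2014 </year></movie><movie category="Family">'
    '<title lang="en"> Inside Out </title><year> 2015 </year></movie></movies>'
)


def outline(xml_text):
    """List (tag, attributes, text) for every element in document order.

    The parser keeps whitespace inside text (e.g. " Birdman "), and the
    indentation between elements lands in .text/.tail, so text is stripped.
    """
    root = ET.fromstring(xml_text)
    return [(el.tag, el.attrib, (el.text or "").strip()) for el in root.iter()]


print(outline(indented) == outline(compact))  # True
for tag, attrs, text in outline(compact):
    print(tag, attrs, text)
```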

XML vs. HTML

Here are some differences between XML and HTML:

| XML | HTML |
| --- | --- |
| Primarily used for storing and transporting data. | Primarily used for displaying data and describing how it should look in a browser. |
| Has no predefined tags; you create custom tags to fit your specific needs. | Tags are predefined and must follow HTML's syntax rules. |
| Closing tags are mandatory (an empty element can be self-closed, e.g., <year/>). | Closing tags are recommended, but some (such as </p> and </li>) may be omitted, and void elements such as <br> have none. |
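The closing-tag difference can be seen with Python's two built-in parsers, used here as stand-ins: xml.etree.ElementTree rejects a document with an unclosed element, while html.parser.HTMLParser tokenizes HTML with omitted closing tags without complaint. HTMLParser only reports tags as it meets them; it doesn't build or repair a tree the way a browser does. The helper names is_well_formed_xml() and html_start_tags() are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(text):
    """Return True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record start tags in order; HTMLParser never demands closing tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(text):
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    return parser.tags


# Custom tags are fine in XML, as long as every element is closed.
print(is_well_formed_xml("<movie><title>Birdman</title></movie>"))  # True
print(is_well_formed_xml("<movie><title>Birdman</movie>"))  # False: <title> never closed
print(is_well_formed_xml("<movie><year/></movie>"))  # True: self-closed empty element

# HTML with omitted </p> tags and a void <br> is read without errors.
print(html_start_tags("<p>First<p>Second<br>Third"))  # ['p', 'p', 'br']
```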

Navigating XML DOM

Navigating an XML document with Beautiful Soup works much like navigating an HTML one: the same search methods and CSS selectors apply.

Figure: XML DOM

The key difference is that you must pass an XML parser to the BeautifulSoup constructor before you start navigating, for example BeautifulSoup(markup, "xml"), which relies on the lxml package being installed.
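One concrete reason the parser choice matters: HTML is case-insensitive, so HTML parsers typically lowercase tag names, while XML is case-sensitive and XML parsers keep names exactly as written. The sketch below shows this with Python's standard-library parsers, swapped in so it runs without Beautiful Soup or lxml; it is an illustration, not the lesson's code. The mixed-case document DOC and the class StartTags are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A hypothetical document that uses mixed-case tag names.
DOC = '<Movies><Movie category="Drama"><Title>Birdman</Title></Movie></Movies>'


class StartTags(HTMLParser):
    """Collect start-tag names as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


html_parser = StartTags()
html_parser.feed(DOC)
html_parser.close()
html_tags = html_parser.tags

# The XML parser builds a tree; iter() walks it in document order.
xml_tags = [el.tag for el in ET.fromstring(DOC).iter()]

print(html_tags)  # ['movies', 'movie', 'title']
print(xml_tags)   # ['Movies', 'Movie', 'Title']
```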

page.xml
<?xml version="1.0" encoding="UTF-8"?>
<movies>
  <movie category="Drama">
    <title lang="en"> Birdman </title>
    <year> 2014 </year>
    <awards> 4 </awards>
    <nominations> 9 </nominations>
  </movie>
  <movie category="Drama">
    <title lang="en"> The Imitation Game </title>
    <year> 2014 </year>
    <awards> 8 </awards>
    <nominations> 1 </nominations>
  </movie>
  <movie category="Family">
    <title lang="en"> Inside Out </title>
    <year> 2015 </year>
    <awards> 2 </awards>
    <nominations> 1 </nominations>
  </movie>
</movies>
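For comparison with the Beautiful Soup approach, here is a minimal sketch that navigates a trimmed copy of page.xml (only title and awards kept, declaration omitted) with the standard library's xml.etree.ElementTree, swapped in so it runs without extra packages. Its findall() accepts a limited subset of XPath, including attribute predicates such as [@category='Drama']. Note that text values come back as strings, surrounding spaces included.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of page.xml, embedded so the sketch runs on its own.
PAGE_XML = """<movies>
  <movie category="Drama">
    <title lang="en"> Birdman </title>
    <awards> 4 </awards>
  </movie>
  <movie category="Drama">
    <title lang="en"> The Imitation Game </title>
    <awards> 8 </awards>
  </movie>
  <movie category="Family">
    <title lang="en"> Inside Out </title>
    <awards> 2 </awards>
  </movie>
</movies>"""

root = ET.fromstring(PAGE_XML)

# Step through every <movie> child of the root and read its <title> text.
all_titles = [m.findtext("title").strip() for m in root.findall("movie")]

# An XPath-style path with an attribute predicate, then a child step.
drama_titles = [t.text.strip() for t in root.findall("movie[@category='Drama']/title")]

# Text is a string such as " 4 "; int() tolerates the surrounding spaces.
total_awards = sum(int(a.text) for a in root.iter("awards"))

print(all_titles)    # ['Birdman', 'The Imitation Game', 'Inside Out']
print(drama_titles)  # ['Birdman', 'The Imitation Game']
print(total_awards)  # 14
```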

Note: The XML declaration <?xml version="1.0" encoding="UTF-8"?> at the top of the page.xml file above is part of the document's prologue. It is optional, but if present it must come first in the document, and it doesn't have a ...
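The "must come first" rule is strict. A quick check with xml.etree.ElementTree (a stand-in parser; the helper name parses() is hypothetical) shows that a document parses with or without the declaration, but fails if even a single newline precedes it.

```python
import xml.etree.ElementTree as ET


def parses(xml_bytes):
    """Return True if the standard-library XML parser accepts the document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


declared = b'<?xml version="1.0" encoding="UTF-8"?><movies/>'
undeclared = b"<movies/>"
misplaced = b'\n<?xml version="1.0" encoding="UTF-8"?><movies/>'

print(parses(declared))    # True: the declaration comes first
print(parses(undeclared))  # True: the declaration is optional
print(parses(misplaced))   # False: a newline before the declaration is an error
```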