Basics of HTML & Document Tree
In this lesson, we will go through the basics of HTML and the DOM, which are needed to understand XPath expressions.
What is HTML?
HTML stands for HyperText Markup Language. It’s a markup language used to create and structure the content of a web page, such as text, links, headings, paragraphs, etc.
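HTML files have the .html or .htm extension. You can view them in any web browser, which reads the file and renders its contents.
HTML resembles XML, but with a predefined set of tags. Strictly speaking, only the XHTML variant is guaranteed to be a well-formed XML document; browsers parse ordinary HTML more leniently.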
HTML components
Let’s understand elements, tags, and attributes using the HTML code example below.
Elements
These are the building blocks of an HTML page, and are also referred to as nodes. A few of the elements/nodes shown in the above code are:
html, body, head, title, div, p, etc.
We can have nested elements, like:
<div id="items">
  <ul style="list-style-type:circle">
    <li>Car</li>
    <li>Bus</li>
    <li>Truck</li>
  </ul>
</div>
Here, li is inside ul, which is nested inside the div element.
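To make the nesting concrete, here is a minimal Python sketch that parses the snippet above and prints one indented line per element. It assumes the snippet is well-formed XML, which lets the standard-library xml.etree.ElementTree parser read it; real-world HTML often is not, and would need an HTML-aware parser. The helper name outline is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The nested snippet from this lesson. It happens to be well-formed XML,
# so the standard-library XML parser can read it.
snippet = """
<div id="items">
  <ul style="list-style-type:circle">
    <li>Car</li>
    <li>Bus</li>
    <li>Truck</li>
  </ul>
</div>
""".strip()


def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(snippet)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The deeper an element is nested, the further its line is indented: div at the left margin, ul one level in, and the three li elements two levels in.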
Tags
A tag marks where an HTML element begins or ends, and it is written within angle brackets < >. As you can see in the example code above, there are three main structural tags, <html>, <head>, and <body>, and then there are other element tags, like <title>, <div>, <p>, <ul>, etc.
In general, an opening tag is followed by a matching closing tag, like <head> ... </head>. A few elements, such as <br> and <img>, have no closing tag.
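As a rough illustration of how opening and closing tags pair up, the sketch below feeds a tiny HTML string to Python's standard-library html.parser and records each tag in the order it is encountered. The class name TagLogger and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every opening and closing tag in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))


logger = TagLogger()
logger.feed("<head><title>Demo</title></head>")
logger.close()

for kind, tag in logger.events:
    print(kind, tag)
```

The inner title element is opened and closed before its enclosing head element is closed, which is what makes the nesting well defined.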
Attributes
An attribute defines the properties of an HTML element. Here are some of the attribute examples from the above code:
- <div id="xpath-content">: In this case, id is an attribute of the div element.
- <ul style="list-style-type:circle">: In this case, ul has a style attribute, which renders the list items with ‘circle’ bullet points.
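Attributes can also be read programmatically. The following minimal sketch reuses the items snippet from earlier in the lesson and assumes it is well-formed XML, so that the standard-library xml.etree.ElementTree parser can handle it.

```python
import xml.etree.ElementTree as ET

# A shortened version of the lesson's nested snippet (assumed well-formed XML).
div = ET.fromstring(
    '<div id="items">'
    '<ul style="list-style-type:circle"><li>Car</li></ul>'
    "</div>"
)
ul = div.find("ul")  # first direct child named ul

print(div.attrib)        # all attributes of div, as a dict
print(ul.get("style"))   # value of one attribute
print(ul.get("class"))   # None, because ul has no class attribute
```

Each element carries its attributes as name/value pairs, and asking for an attribute that is not present returns None rather than raising an error.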
About HTML Document Tree
Each HTML document can be represented as a tree, in which the elements form a family-like hierarchy of ancestors, descendants, parents, children, and siblings.
As you can see, we have marked the elements/nodes in the above tree based on their positioning within the hierarchy:
- Ancestors: An ancestor is any node above the context node on the path to the root of the document tree, at any number of levels up. Example: div is an ancestor of ul and li.
- Descendants: A descendant is any node below the context node in the document tree, at any number of levels down. Example: div is a descendant of the body node, and li is a descendant of the div node.
- Siblings: Nodes at the same level that share the same parent node. Example: the li nodes are siblings of one another, and the two div nodes are siblings.
- Parent and child: A parent is the node directly above the context node, and a child is a node directly below it. Example: ul is the parent of each li, and each li is a child of ul.
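These relationships can be computed directly from a parsed tree. The sketch below is only an illustration under stated assumptions: the page markup is hypothetical (shaped like the document this lesson describes, with two div elements under body), it is assumed to be well-formed XML, and the helper names ancestors, descendants, and siblings are invented here. Because xml.etree.ElementTree elements do not store a link to their parent, the sketch first builds a child-to-parent map.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, assumed to be well-formed XML.
page = ET.fromstring("""
<html>
  <body>
    <div id="xpath-content"><p>Intro</p></div>
    <div id="items">
      <ul>
        <li>Car</li>
        <li>Bus</li>
        <li>Truck</li>
      </ul>
    </div>
  </body>
</html>
""".strip())

# Map every element to its parent; the root has no entry.
parent_of = {child: parent for parent in page.iter() for child in parent}


def ancestors(node):
    """Tags of all nodes above `node`, nearest first."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node.tag)
    return chain


def descendants(node):
    """Tags of all nodes below `node`, in document order."""
    return [d.tag for d in node.iter() if d is not node]


def siblings(node):
    """Other elements that share `node`'s parent."""
    parent = parent_of.get(node)
    return [] if parent is None else [s for s in parent if s is not node]


first_li = page.find(".//li")
items_div = page.find(".//div[@id='items']")

print(ancestors(first_li))                          # ['ul', 'div', 'body', 'html']
print(descendants(items_div))                       # ['ul', 'li', 'li', 'li']
print([s.text for s in siblings(first_li)])         # ['Bus', 'Truck']
print([s.get("id") for s in siblings(items_div)])   # ['xpath-content']
```

The first li has ul as its parent and div, body, and html as further ancestors, while the two div elements are siblings because they share body as their parent.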
The XPath Axes lesson covers more about the HTML document tree and its use in writing XPath expressions.