Basics of HTML & Document Tree

In this lesson, we will go through the basics of HTML and the DOM (Document Object Model), which are needed to understand XPath expressions.

What is HTML?

HTML stands for HyperText Markup Language. It’s a markup language used to create and structure the content of a web page, such as text, links, headings, paragraphs, etc.

HTML files have the .html or .htm extension. You can view them in any web browser, which reads the file and renders its contents.

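HTML resembles XML in that both mark up content with nested tags, but HTML uses a predefined set of tags and does not have to be well-formed XML. For example, the <meta> tag in the code below has no closing tag.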

HTML components

Let’s look at elements, tags, and attributes using the HTML code example below.

<html>
<head>
<meta charset="UTF-8">
<title>educative-xpath-demo</title>
</head>
<body>
<div id="xpath-content">
<h1> HTML Example </h1>
<p>This HTML file is created to show the document tree example</p>
</div>
<div id="items">
<ul style="list-style-type:circle">
<li>Car</li>
<li>Bus</li>
<li>Truck</li>
</ul>
</div>
</body>
</html>

Elements

Elements are the building blocks of an HTML page, and they appear as nodes in the document tree. A few of the elements/nodes shown in the above code are:

  • html, body, head
  • title, div, p, etc.

We can have nested elements, like:

 <div id="items">
    <ul style="list-style-type:circle">
       <li>Car</li>
       <li>Bus</li>
       <li>Truck</li>
    </ul>
 </div>

Here, li is inside ul, which is nested inside the div element.
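
The nesting can also be inspected programmatically. The following is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree module, neither of which is part of this lesson. It works only because this particular fragment happens to be well-formed XML, and the helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# The nested fragment from the lesson; it is well-formed, so an XML parser can read it.
FRAGMENT = """
<div id="items">
    <ul style="list-style-type:circle">
       <li>Car</li>
       <li>Bus</li>
       <li>Truck</li>
    </ul>
</div>
"""


def outline(element, depth=0):
    """Return one line per element, indented by its nesting depth (hypothetical helper)."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(FRAGMENT.strip())
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of nesting: div contains ul, which contains the three li elements.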

Tags

A tag marks the start or end of an HTML element and is written within angle brackets < >. As you can see in the example code above, there are three main structural tags, <html>, <head>, and <body>, and then there are other element tags, like <title>, <div>, <p>, <ul>, etc.

In general, an opening tag is paired with a matching closing tag, like <head> ..... </head>.
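
To see the opening and closing tags as a parser encounters them, here is a small sketch that assumes Python's standard-library html.parser module (an assumption, not something this lesson uses). The class name TagLogger is hypothetical. It feeds the head section of the example to the parser and records each tag event.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every opening and closing tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))


logger = TagLogger()
logger.feed('<head><meta charset="UTF-8"><title>educative-xpath-demo</title></head>')
logger.close()

for kind, tag in logger.events:
    print(kind, tag)
```

In the recorded events, head and title each have an opening and a closing tag, while meta appears only as an opening tag because it is written without a closing tag in the example.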

Attributes

An attribute defines a property of an HTML element and is written inside the opening tag as a name="value" pair. Here are some attribute examples from the above code:

  • <div id="xpath-content"> : In this case, id is an attribute for the div.

  • <ul style="list-style-type:circle"> : In this case, ul has a style attribute, which renders the list items with ‘circle’ bullet points.
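
Attributes can be read back from parsed markup. The sketch below assumes Python's standard-library xml.etree.ElementTree module (not part of this lesson) and uses only the body portion of the example, which is well-formed enough for an XML parser. ElementTree supports only a limited subset of XPath, so the last lookup is an illustration rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# The <body> portion of the lesson's example document.
BODY = """
<body>
<div id="xpath-content">
<h1> HTML Example </h1>
<p>This HTML file is created to show the document tree example</p>
</div>
<div id="items">
<ul style="list-style-type:circle">
<li>Car</li>
<li>Bus</li>
<li>Truck</li>
</ul>
</div>
</body>
"""

body = ET.fromstring(BODY.strip())

# Collect (tag, attributes) for every element that carries at least one attribute.
with_attributes = [(el.tag, dict(el.attrib)) for el in body.iter() if el.attrib]
for tag, attrs in with_attributes:
    print(tag, attrs)

# Read a single attribute value; get() returns None when the attribute is absent.
ul = body.find(".//ul")
print(ul.get("style"))
print(ul.get("id"))

# Select an element by attribute value using an XPath-style predicate.
items_div = body.find("div[@id='items']")
print(items_div.get("id"))
```

Only the two div elements and the ul element carry attributes here, so they are the only entries collected.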

About HTML Document Tree

Each HTML document can be represented as a tree, where the elements are described in a family-like hierarchy of ancestors, descendants, parents, children, and siblings.

For the example document above, the elements/nodes are related to each other based on their position within the hierarchy:

  • Ancestors: An ancestor is any node above the context node (the node currently being considered) on the path to the root of the document tree: its parent, its parent's parent, and so on. Example - the div with id="items" is an ancestor of ul and li.

  • Descendants: A descendant is any node below the context node in the document tree: its children, its children's children, and so on. Example - div is a descendant of the body node, and li is a descendant of the div node that contains it.

  • Siblings: Nodes at the same level that share the same parent node. Example - the li nodes are siblings of each other, and so are the two div nodes.

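  • Parent and child: A parent is the node directly above the context node, and a child is a node directly below it. Example - ul is the parent of each li, and each li is a child of ul.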
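
These relationships can be computed from the example markup. The following is a sketch under stated assumptions: it uses Python's standard-library xml.etree.ElementTree module (not part of this lesson) on the body portion of the example, and the helper names ancestors, descendants, and siblings are hypothetical. ElementTree elements do not store a reference to their parent, so the sketch first builds a child-to-parent map.

```python
import xml.etree.ElementTree as ET

# The <body> portion of the lesson's example document.
BODY = """
<body>
<div id="xpath-content">
<h1> HTML Example </h1>
<p>This HTML file is created to show the document tree example</p>
</div>
<div id="items">
<ul style="list-style-type:circle">
<li>Car</li>
<li>Bus</li>
<li>Truck</li>
</ul>
</div>
</body>
"""

body = ET.fromstring(BODY.strip())

# Map every child element to its parent element.
parent_of = {child: parent for parent in body.iter() for child in parent}


def ancestors(node):
    """Tags of all nodes above `node`, nearest first (hypothetical helper)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node.tag)
    return result


def descendants(node):
    """Tags of all nodes below `node`, in document order (hypothetical helper)."""
    return [el.tag for el in node.iter() if el is not node]


def siblings(node):
    """Other elements that share the parent of `node` (hypothetical helper)."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    return [el for el in parent if el is not node]


first_li = body.find(".//li")
items_div = body.find("div[@id='items']")

print(ancestors(first_li))                         # ['ul', 'div', 'body']
print(descendants(items_div))                      # ['ul', 'li', 'li', 'li']
print([s.text for s in siblings(first_li)])        # ['Bus', 'Truck']
print([s.get("id") for s in siblings(items_div)])  # ['xpath-content']
```

The first li has ul, div, and body as ancestors, the other two li nodes as siblings, and the div with id="items" has the ul and its three li nodes as descendants.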

The XPath Axes lesson covers more about the HTML document tree and its usage in writing XPath expressions.