Building Front-End Web Applications with Plain JavaScript

HTML, XML, and Unicode

XML documents

XML provides a syntax for expressing structured information in the form of an XML document, which consists of nested elements and their attributes. The specific elements and attributes used in an XML document can come from any vocabulary, such as public standards, private standards, or user-defined XML formats. XML is used for specifying document formats, such as XHTML5, the SVG format, or the DocBook format. It is also used for data interchange formats such as the Mathematical Markup Language (MathML) or the Universal Business Language (UBL), as well as for message formats such as the web service message format SOAP.

The default encoding of an XML document is UTF-8, which uses a single byte for ASCII characters and two to four bytes for all other characters. Almost all Unicode characters are legal in a well-formed XML document. The illegal characters are the control characters with codes 0 through 31, with the exception of carriage return, line feed, and tab. It’s therefore risky to paste text from a non-XML source into an XML document: the form feed character (code 12), for example, is a frequent cause of problems.
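The two points above, variable-length encoding and forbidden control characters, can be sketched in plain JavaScript. This is a minimal illustration, not a complete XML character check: the function names are hypothetical, and only the code range 0 through 31 discussed here is covered.

```javascript
// Number of bytes needed to encode a string in UTF-8.
// TextEncoder always encodes to UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Control characters with codes 0-31, leaving out the three legal ones:
// tab (9), line feed (10), and carriage return (13).
const ILLEGAL_XML_CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g;

// Lists the position and code of every illegal control character in the text.
function findIllegalXmlChars(text) {
  return Array.from(text.matchAll(ILLEGAL_XML_CONTROL_CHARS), (match) => ({
    index: match.index,
    code: match[0].charCodeAt(0)
  }));
}

// Removes the illegal control characters, for example before pasting
// text from a non-XML source into an XML document.
function stripIllegalXmlChars(text) {
  return text.replace(ILLEGAL_XML_CONTROL_CHARS, "");
}

console.log(utf8ByteLength("A"));   // 1 byte (ASCII)
console.log(utf8ByteLength("€"));   // 3 bytes
console.log(utf8ByteLength("😀"));  // 4 bytes
console.log(findIllegalXmlChars("page 1\fpage 2")); // one hit: the form feed, code 12
```

Whether to strip such characters silently or to report them to the user is a design choice that depends on the application.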

XML namespaces

Generally, namespaces help avoid name conflicts. They allow the same local name to be reused in different namespace contexts. Many computational languages have some form of namespace concept, for instance, Java and PHP. An XML namespace is identified by a namespace URI, such as the SVG namespace URI http://www.w3.org/2000/svg, which is associated with a namespace prefix, such as svg. Such a namespace represents a collection of names for both elements and attributes. It allows namespace-qualified names of the form prefix:name, such as svg:circle for SVG circle elements.

A default namespace is declared at the start tag of an element in the following way:

<html xmlns="http://www.w3.org/1999/xhtml">

This example shows the start tag of the HTML root element where the XHTML namespace is declared as the default namespace.

The following example shows an SVG namespace declaration for an svg element embedded in an HTML document:
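One way to picture such a declaration, and how qualified names relate to it, is the following plain JavaScript sketch. The two start tags are hypothetical fragments rather than the original listing, the prefix svg is a free choice, and the helper names `readNamespaceDeclarations` and `resolveElementName` are assumptions made for illustration. Unlike a real XML processor, the sketch looks at a single start tag only and does not inherit declarations from ancestor elements.

```javascript
// Hypothetical start tags: a default namespace declaration and a prefixed one.
const htmlStartTag = '<html xmlns="http://www.w3.org/1999/xhtml">';
const svgStartTag = '<svg:svg xmlns:svg="http://www.w3.org/2000/svg">';

// Collects the namespace declarations (xmlns and xmlns:prefix attributes)
// found in one start tag.
function readNamespaceDeclarations(startTag) {
  const declarations = { defaultNamespace: null, prefixes: new Map() };
  const pattern = /\sxmlns(?::([\w.-]+))?\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  for (const [, prefix, doubleQuoted, singleQuoted] of startTag.matchAll(pattern)) {
    const uri = doubleQuoted !== undefined ? doubleQuoted : singleQuoted;
    if (prefix === undefined) {
      declarations.defaultNamespace = uri;   // xmlns="..."
    } else {
      declarations.prefixes.set(prefix, uri); // xmlns:prefix="..."
    }
  }
  return declarations;
}

// Resolves an element name such as "svg:circle" or "body" to a
// namespace URI and a local name.
function resolveElementName(qualifiedName, declarations) {
  const separatorIndex = qualifiedName.indexOf(":");
  if (separatorIndex === -1) {
    // Unprefixed element names belong to the default namespace.
    return { namespaceURI: declarations.defaultNamespace, localName: qualifiedName };
  }
  const prefix = qualifiedName.slice(0, separatorIndex);
  if (!declarations.prefixes.has(prefix)) {
    throw new Error(`Undeclared namespace prefix: ${prefix}`);
  }
  return {
    namespaceURI: declarations.prefixes.get(prefix),
    localName: qualifiedName.slice(separatorIndex + 1)
  };
}

console.log(resolveElementName("body", readNamespaceDeclarations(htmlStartTag)));
console.log(resolveElementName("svg:circle", readNamespaceDeclarations(svgStartTag)));
```

The same local name, for instance title, can thus occur in both namespaces without conflict, because the resolved namespace URI differs.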

Correct XML documents

XML defines two syntactic correctness criteria. An XML document must be well-formed, and if it’s based on a grammar or schema, then it must be valid with respect to that grammar. In other words, it must satisfy all rules of the grammar. An XML document is called well-formed if it satisfies the following syntactic conditions:

There must be exactly one root element.
Each element needs a start tag and an end tag. However, empty elements can be closed as <phone/> instead of <phone></phone>.
Tags must not overlap. For instance, we cannot have:
```
<author><name>Lee Hong</author></name>
```
Attribute names must be unique within the scope of an element. For instance, the following code is not correct:
```
<attachment file="lecture2.html" file="lecture3.html"/>
```
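The conditions listed above can be checked mechanically. The following plain JavaScript function is a deliberately simplified sketch, not a real XML parser: it ignores comments, CDATA sections, the DOCTYPE, and entity rules, and the name `checkWellFormed` is hypothetical. In a browser, a production check would more likely rely on the built-in DOMParser.

```javascript
// Simplified well-formedness check covering only the rules discussed here:
// one root element, matching start/end tags, no overlap, unique attributes.
function checkWellFormed(xml) {
  const tagPattern =
    /<(\/?)([A-Za-z_][\w:.-]*)((?:\s+[^\s=<>\/]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(\/?)>/g;
  const attrPattern = /([^\s=<>\/]+)\s*=\s*(?:"[^"]*"|'[^']*')/g;
  const openElements = []; // stack of the names of currently open elements
  let rootCount = 0;
  let match;

  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closingSlash, name, attrText, emptySlash] = match;

    if (closingSlash) {
      // An end tag must close the most recently opened element.
      const expected = openElements.pop();
      if (expected === undefined) {
        return { ok: false, error: `Unexpected end tag </${name}>` };
      }
      if (expected !== name) {
        return { ok: false, error: `Expected </${expected}> but found </${name}>` };
      }
      continue;
    }

    // A start tag at nesting depth 0 begins a root element.
    if (openElements.length === 0) rootCount += 1;
    if (rootCount > 1) return { ok: false, error: "More than one root element" };

    // Attribute names must be unique within one element.
    const seen = new Set();
    for (const [, attrName] of attrText.matchAll(attrPattern)) {
      if (seen.has(attrName)) {
        return { ok: false, error: `Duplicate attribute "${attrName}" in <${name}>` };
      }
      seen.add(attrName);
    }

    // Empty elements like <phone/> are closed immediately.
    if (!emptySlash) openElements.push(name);
  }

  if (openElements.length > 0) {
    return { ok: false, error: `Missing end tag for <${openElements.pop()}>` };
  }
  if (rootCount !== 1) return { ok: false, error: "No root element found" };
  return { ok: true, error: null };
}

console.log(checkWellFormed("<author><name>Lee Hong</name></author>"));
console.log(checkWellFormed("<author><name>Lee Hong</author></name>"));
console.log(checkWellFormed('<attachment file="lecture2.html" file="lecture3.html"/>'));
```

The first call reports a well-formed document, while the second and third fail on the overlapping tags and the duplicate attribute, respectively.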

An XML document is called valid against a particular grammar (such as a DTD or an XML Schema) if the following conditions are met:

It is well-formed.
It respects the grammar.

The history of HTML

Tim Berners-Lee developed the first version of HTML in 1990. A few years later, in 1995, Berners-Lee and computer scientist Dan Connolly wrote the HTML2 standard, which captured the common use of HTML elements at that time. In the following years, HTML was used and gradually extended by a growing community of early WWW adopters. This evolution of HTML, which led to a messy set of elements and attributes (called “tag soup”), was mainly driven by browser vendors and their competition with each other.

The development of XHTML in 2000 was an attempt by the World Wide Web Consortium (W3C) to clean up this mess. However, it neglected to advance HTML’s functionality towards a richer user interface. That was the focus of the Web Hypertext Application Technology Working Group (WHATWG) led by Ian Hickson, who is often considered the mastermind and main author of HTML5 and of many of its accompanying JS APIs, which have adapted HTML for mobile applications.

The evolution of HTML

W3C has developed the following important versions of HTML:

HTML4 was developed in 1997 as an application of the Standard Generalized Markup Language (SGML).
XHTML was developed in the year 2000 as an XML-based cleanup of HTML4.
XHTML5 was developed in 2014. It was created in cooperation (and competition) with the WHAT working group and was supported by browser vendors.

HTML was originally designed as a structure description language, not as a presentation description language. However, HTML4 includes many purely presentational elements, such as font. XHTML took HTML back to its roots, dropping presentational elements and defining a simple and clear syntax. XHTML was designed to support the following goals:

Device independence
Accessibility
Usability

For our purposes, we’ll adopt the following symbolic equation:

HTML = HTML5 = XHTML5

When we say “HTML” or “HTML5”, we actually mean XHTML5 because the syntax of XML documents is much clearer and much less confusing than the HTML4-style syntax also allowed by HTML5.

Note: The HTML syntax of HTML5 isn’t case-sensitive for element and attribute names, but XML is. In XHTML5, element and attribute names must therefore be written in lowercase.

The following simple example shows the basic code template that can be used for any HTML document:

In line 1, the HTML5 document type is declared so that browsers are instructed to use the HTML5 document object model (DOM).
The html start tag in line 2 uses the default namespace declaration attribute xmlns. The XHTML namespace URI is declared as the default namespace so that browsers and other tools understand that all non-qualified element names like html, head, body, and others are from the XHTML namespace. Additionally, in the html start tag, we set the default language for the text content of all elements (in this case, to "en" for English) using both the xml:lang attribute and the HTML lang attribute. This attribute duplication is a small price to pay for having a hybrid document that can be processed both by HTML and by XML tools.
Finally, in line 4, using an empty meta element with a charset attribute, we set the HTML document’s character encoding to UTF-8. This is also the default for XML documents.
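A template consistent with the three remarks above can be sketched as follows. This is a reconstruction under assumptions, not necessarily the original listing: the title and heading text are placeholders, and `xhtml5Template` and `templateLine` are hypothetical names. The markup is held in a JavaScript string so that the line references can be checked programmatically.

```javascript
// Hypothetical XHTML5 template; title and heading text are placeholders.
const xhtml5Template = [
  '<!DOCTYPE html>',
  '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">',
  ' <head>',
  '  <meta charset="UTF-8" />',
  '  <title>XHTML5 Template Example</title>',
  ' </head>',
  ' <body>',
  '  <h1>XHTML5 Template Example</h1>',
  ' </body>',
  '</html>'
].join("\n");

// Returns line n of the template (1-based), without leading or trailing spaces.
function templateLine(n) {
  return xhtml5Template.split("\n")[n - 1].trim();
}

console.log(templateLine(1)); // the document type declaration
console.log(templateLine(2)); // the html start tag with xmlns, xml:lang, and lang
console.log(templateLine(4)); // the meta element setting the character encoding
```

In a browser, such a string could be handed to `new DOMParser().parseFromString(xhtml5Template, "application/xhtml+xml")` to confirm that it is processed as XML.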
