Parsing Expressions

Understand how the parsing process works.

The syntax of OCaml expressions consists of a set of rules that dictate which input strings make up valid OCaml expressions. Some of these rules might look like the following in BNF (Backus-Naur form) notation.

<expr> ::= 
  | <number>
  | <unop> <expr>
  | <expr> <binop> <expr>
  | if <expr> then <expr> else <expr>
  | ... 

<number> ::= 0 | 1 | 2 | ...
<unop> ::= not | ...
<binop> ::= + | - | * | ...
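A grammar like this maps naturally onto an OCaml variant type, with one constructor per alternative of <expr>. The sketch below is hypothetical: the type and function names are illustrative, and only the alternatives listed above are modelled (the "..." cases are left out). It simply shows what "an expression that satisfies the rules" could look like as data.

```ocaml
(* Hypothetical syntax tree mirroring the BNF rules above.
   Only the listed alternatives are modelled; the "..." cases are omitted. *)
type unop = Not

type binop = Add | Sub | Mul

type expr =
  | Number of int                    (* <number> *)
  | Unop of unop * expr              (* <unop> <expr> *)
  | Binop of expr * binop * expr     (* <expr> <binop> <expr> *)
  | If of expr * expr * expr         (* if <expr> then <expr> else <expr> *)

(* Turn a tree back into concrete syntax, fully parenthesized so the
   structure of the tree is visible in the output. *)
let rec to_string (e : expr) : string =
  match e with
  | Number n -> string_of_int n
  | Unop (Not, e1) -> "(not " ^ to_string e1 ^ ")"
  | Binop (l, op, r) ->
      let sym = match op with Add -> "+" | Sub -> "-" | Mul -> "*" in
      "(" ^ to_string l ^ " " ^ sym ^ " " ^ to_string r ^ ")"
  | If (c, t, f) ->
      "(if " ^ to_string c ^ " then " ^ to_string t
      ^ " else " ^ to_string f ^ ")"

let () =
  (* 1 + 2 is an instance of the rule <expr> <binop> <expr> *)
  print_endline (to_string (Binop (Number 1, Add, Number 2)))
```

Running it prints (1 + 2). Note that the tree only records *which* rule was used; deciding which rule applies to a given input string is the parser's job.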

Since the OCaml compiler implements all these rules, it accepts an input string as syntactically valid if the string satisfies them. For instance, 1 + 2 follows the rule <expr> <binop> <expr>, so the OCaml compiler accepts it.

Conversely, the OCaml compiler rejects input strings that do not follow its syntactic rules. For instance, it rejects 1xyz + 2 with the following error message:

Line 1, characters 0-4:
Error: Invalid literal 1xyz

The error message indicates that the OCaml compiler cannot recognize 1xyz as a valid literal. Similarly, the OCaml compiler rejects 1 +, but with a different error message:

Error: Syntax error

This error message differs from the one above: the former says Invalid literal, whereas the latter says Syntax error.
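One way to see how two different kinds of error can arise is a toy checker that works in two stages: first it groups characters into tokens, then it checks the token sequence against the grammar. The sketch below is hypothetical and only an illustration, not the OCaml compiler's own code. It covers just <number> and <expr> <binop> <expr> with + - *, and the names tokenize, expect_number and check are made up for this example.

```ocaml
(* Hypothetical toy checker for a tiny subset of the grammar:
   <expr> ::= <number> | <expr> <binop> <expr>,  <binop> ::= + | - | *
   Illustration only; not how the OCaml compiler is written. *)

type token = Int of string | Plus | Minus | Times  (* digits kept as text *)

exception Invalid_literal of string
exception Syntax_error

let is_digit c = c >= '0' && c <= '9'
let is_alnum c = is_digit c || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

let all_digits s =
  let ok = ref true in
  String.iter (fun c -> if not (is_digit c) then ok := false) s;
  !ok

(* Stage 1: group characters into tokens (the "words" of the language). *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '+' -> go (i + 1) (Plus :: acc)
      | '-' -> go (i + 1) (Minus :: acc)
      | '*' -> go (i + 1) (Times :: acc)
      | c when is_digit c ->
          (* Read the whole run of letters and digits starting here. *)
          let j = ref i in
          while !j < n && is_alnum s.[!j] do incr j done;
          let lexeme = String.sub s i (!j - i) in
          if all_digits lexeme then go !j (Int lexeme :: acc)
          else raise (Invalid_literal lexeme)  (* e.g. "1xyz" *)
      | _ -> raise Syntax_error  (* any other character is outside this toy language *)
  in
  go 0 []

(* Stage 2: check the tokens against <number> (<binop> <number>)*,
   which accepts the same token sequences as the two rules above. *)
let rec expect_number = function
  | Int _ :: rest -> after_number rest
  | _ -> raise Syntax_error  (* e.g. "1 +" runs out of tokens here *)
and after_number = function
  | [] -> ()
  | (Plus | Minus | Times) :: rest -> expect_number rest
  | Int _ :: _ -> raise Syntax_error

let check (s : string) : string =
  match expect_number (tokenize s) with
  | () -> "OK"
  | exception Invalid_literal lit -> "Error: Invalid literal " ^ lit
  | exception Syntax_error -> "Error: Syntax error"

let () =
  List.iter (fun s -> Printf.printf "%s => %s\n" s (check s))
    [ "1 + 2"; "1xyz + 2"; "1 +" ]
```

In this sketch, 1xyz + 2 fails in the first stage, because 1xyz cannot be turned into a token at all, while 1 + produces perfectly good tokens that fail only in the second stage, because no rule allows an expression to end with an operator.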

In this section, we’ll take a closer look at how the OCaml compiler does all these things through a process called parsing. We’ll understand exactly why we got two different error messages in the examples above. Again, we’ll draw analogies from natural languages like English.

How we recognize English sentences and phrases

Our ability to process natural languages is impressive. Looking at the letter combination “cute baby cat,” we immediately know it’s a correct English noun phrase. Equally intriguing, we can quickly tell that neither “baby qbr cat” nor “baby cute cat” is a valid English phrase. How do we do that?

At its simplest, we recognize the syntax of an English phrase or sentence in two stages:

  • Recognizing words
  • Recognizing phrases/sentences
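These two stages can be mimicked with a small OCaml sketch. Everything in it is an assumption made for illustration: the tiny lexicon, the hypothetical names categorize and recognize, and the simplified rule <np> ::= <adj>* <noun>+, under which “baby” counts as a noun that may modify another noun. Real English grammar is far richer than this.

```ocaml
(* Hypothetical toy model of recognizing English noun phrases.
   Assumed grammar: <np> ::= <adj>* <noun>+ *)
type category = Adj | Noun

exception Unknown_word of string
exception Not_a_phrase

(* Stage 1: recognize each word by looking it up in a tiny, made-up lexicon. *)
let categorize (w : string) : category =
  match w with
  | "cute" | "small" -> Adj
  | "baby" | "cat" | "dog" -> Noun
  | _ -> raise (Unknown_word w)

(* Stage 2: check the sequence of word categories against <adj>* <noun>+. *)
let rec adjectives = function
  | Adj :: rest -> adjectives rest
  | rest -> nouns rest
and nouns = function
  | Noun :: rest -> more_nouns rest
  | _ -> raise Not_a_phrase           (* at least one noun is required *)
and more_nouns = function
  | [] -> ()
  | Noun :: rest -> more_nouns rest
  | Adj :: _ -> raise Not_a_phrase    (* an adjective after a noun *)

let recognize (phrase : string) : string =
  let words =
    String.split_on_char ' ' phrase |> List.filter (fun w -> w <> "")
  in
  match adjectives (List.map categorize words) with
  | () -> "valid noun phrase"
  | exception Unknown_word w -> "unknown word: " ^ w
  | exception Not_a_phrase -> "words are fine, but not a valid phrase"

let () =
  List.iter (fun p -> Printf.printf "%s: %s\n" p (recognize p))
    [ "cute baby cat"; "baby qbr cat"; "baby cute cat" ]
```

Under these assumptions, “baby qbr cat” is rejected in the first stage because “qbr” is not a word, while “baby cute cat” passes the first stage and is rejected only in the second, because its words appear in an order the rule does not allow.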

The following diagram visualizes this process.
