Chomsky Normal Form

Learn about Chomsky normal form (CNF) and the process of removing lambda and unit productions.

We'll cover the following

Now we’ll turn our attention toward taking a deeper look at the closure and other properties of context-free languages, developing algorithms for making decisions about any context-free language, and finally, introducing a formal language that is not context-free, eventually leading to the third and final class of formal languages. We’ll begin with a procedure for converting any CFG into a special form, called Chomsky normal form (CNF), which will facilitate deriving certain results about context-free languages.

Preparation for Chomsky normal form

In preparation for Chomsky normal form, we need to make two modifications to any CFG under consideration:

1. Remove lambda from all rules without changing the language generated (except we may lose the empty string, which is okay).
2. Remove unit productions, such as $X\to Y$, while maintaining their effects.

There are practical reasons for removing lambda from a CFG. To process a string efficiently to see whether it is generated by a CFG, it is better if the intermediate sentential forms of the working derivation do not decrease in size. If lambda is a valid replacement for a variable, then the working derivation string can shrink, so it is possible that a working derivation can grow and shrink and grow and shrink repeatedly, which is inefficient. A grammar can maintain the effect of the empty string in its rules without explicitly using lambda so that the sentential forms don’t shrink. Such a grammar is called noncontracting. The only downside is that the modified grammar will no longer generate the empty string, but that is a special case not worth worrying about most of the time.

Removing lambda

To remove lambda while maintaining its contribution in forming nonempty strings, we first identify which variables can generate the empty string, whether directly or indirectly. Such variables are called nullable variables, and, once identified, we incorporate their nullable effects into the rules of the grammar, removing any explicit mention of lambda. The language recognized by the modified grammar will be $L-\{\lambda \}$, where $L$ represents the language recognized by the original grammar.

To illustrate this, we’ll remove lambda from the following grammar:

Get hands-on with 1200+ tech skills courses.