What is the algorithm for Chomsky normal form?

Chomsky normal form (CNF) is a structured format for context-free grammar (CFGs). It enhances simplicity, making parsing and theoretical analysis more efficient. CNF production rules adhere to two specific patterns:

  • One, where a non-terminal generates two other non-terminals

  • Other, where a non-terminal generates a terminal symbol

This uniformity greatly simplifies parsing algorithms and eases the development of compilers by providing a consistent grammar structure conducive to formal language manipulation. Furthermore, CNF aids in resolving ambiguity within grammar and enables straightforward comparisons between different grammar systems, serving as an invaluable tool in formal language theory and compiler construction.

Algorithm

Suppose we have the given CFG as below.

The steps to convert a given CFG to Chomsky normal form are as follows:

Step 1: No start variable on the right side

In this step, we’ll ensure that the start symbol (usually denoted as SS) is only used as the left-most symbol in production rules. This ensures a clear starting point for generating valid sentences, preventing structural ambiguities. The given CFG has a start symbol on the right in the second production rule. So we created a new start symbol, SS' as follows:

Step 2: Remove all null productions

Null productions are grammar rules where a non-terminal can generate an empty string (ε)(ε). Eliminating these rules simplifies the grammar and eliminates potential confusion. We must find and replace rules that produce εε with alternative productions to remove null productions. In our example, we have to remove null productions in rule 4, i.e., QQ λ λ, so we can add λλ to the productionPP QQ|SS, making it PP QQ|SS|λλ as follows:

Now, we’ll remove the null production PP λλ as told above.

Step 3: Remove all unit productions

Unit productions are grammar rules in which one non-terminal PP generates another non-terminal QQ directly. The recommended approach is to replace unit productions by directly incorporating non-terminal QQ productions where non-terminal PP is used. In our example, we’ll remove unit production SSS→ S.

Next, we’ll remove SSS' →S.

Next, we’ll remove PQP→ Q.

We’ll also remove PSP→ S .

Step 4: Replace productions PQ1...QnP→ Q1...Qn where n>2n>2

This step removes productions where a single non-terminal PP generates more than two symbols (Q1...Qn)(Q1...Qn) on the right side. This is accomplished by introducing new non-terminals and dividing the production into simpler components. For instance, we haveSPSPS ' → PSP ; we can replace it with SPPS' → PP' , where PSPP'→ SP.

Step 5: Replace productions PpQP → pQ where pp is terminal, and PP and QQ are non-terminals

This step addresses productions where a terminal symbol pp is the initial symbol on the right side of a production. To handle this, we substitute such productions with a combination of new non-terminals and terminals as required. For instance, we have SpQS' → pQ; we can replace it with SQQS' → Q'Q, where QxQ'→ x.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved