What is the CYK algorithm?
The CYK algorithm is a parsing algorithm for context-free grammar. It is used to check if a particular string can be derived from the language generated by a given grammar. It is also called the membership algorithm as it tells whether the given string is a member of the given grammar or not. It was independently developed by three Russian scientists named Cocke, Younger, and Kasami, hence the name CYK!
Basics
In a CYK algorithm, the structure of grammar should be in Chomsky normal form.
In addition, the CYK algorithm uses the dynamic programming or table filling algorithm.
The grammar will be in
- (at most two variables on the right-hand side)
- a (a single terminal on the right-hand side)
- (null string)
If the given Grammar is not in the CNF form, convert it to the CNF form before applying the CYK Algorithm.
The algorithm
In the CYK algorithm,
- Construct a triangular table of the length of your given string.
-
Each row corresponds to the length of the substrings of the given word ‘w’:
- The bottom row will have strings of length 1.
- The second row from the bottom will have strings of length 2 and so on.
-
is a set of variables A such that is a production of grammar G, where is part of the given string ‘w’ or the whole string.
-
Compare at most n pairs of previously computed sets. For strings with a length of 2, compare two pairs, and for strings with a length of 3 compare three sets. In this way, the next sets (A) can be computed, which are derived from the given grammar as per the following formula:
In the equation, refers to the set of variables derived from the production rules.
Rule: If the top row consists of the starting variable of the grammar, then the given string can be derived from it.
Example
Let’s look at an example to get a better understanding of the algorithm.
Consider the following CNF grammar :
The given word .
Step 1: Constructing the triangular table
Step 2: Populating the table
-
The first row is computed simply by looking at the grammar to see which production is producing the string of length 1 for the given word ‘w’.
-
The second row is computed by comparing the two strings that were computed previously. So we can have : , , , and .
-
For the third row, there are three possible substrings of length three in the given word, namely : , , and For the substring , there are two possibilities: and
Finding the sets to compute these:
We take the union of these as follows:
Then, we look in the grammar rules for populating the table.
- We’ll repeat this for all substrings with lengths greater than .
The illustration below shows how the table is populated.
This is the final Triangular table.
Step 3: Check the table
Finally, we need to see if the given word is in the language of grammar
Yes. As we can see in the table, the cell X(1, 5) = (S, A, C). S is present in the final box. So, we can say that 'baaba' Є
Time complexity
The running time of the algorithm is O(n3).
Free Resources