Modeling Gherkin
Learn how to write a Gherkin parser using Gherkin keywords.
Applied techniques: Writing a Gherkin parser
Gherkin is an indentation-based language that allows developers to write software tests in a way that reads like natural language, such as English or French. We won't explain how to use Gherkin to run tests. Instead, we'll explore the structure of the language and write a PHP parser that handles it. Even though we won't go deep into how tests written in Gherkin are eventually executed, we need to look at quite a few examples of the language to get a sense of what we are dealing with before we start writing our parser. Let's take a look at the following example from the Gherkin reference:
```gherkin
Feature: Guess the word

  # The first example has two steps
  Scenario: Maker starts a game
    When the Maker starts a game
    Then the Maker waits for a Breaker to join

  # The second example has three steps
  Scenario: Breaker joins a game
    Given the Maker has started a game with the word "silky"
    When the Breaker joins the Maker's game
    Then the Breaker must guess a word with 5 characters
```
Even before we get into the parser implementation, the snippet above reveals a few notable things. To begin with, the first keyword in a Gherkin file is always Feature, and a Feature block can contain multiple children. In this example, those children are two blocks of another type, called scenarios.
Gherkin uses a scenario to express a testable behavior within a feature. Every scenario belongs to a feature, and a scenario can have many steps.
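The Feature → Scenario → Step relationship maps naturally onto a few small classes. The following is a minimal sketch of those node types. The class names and properties are hypothetical choices for illustration and do not come from any Gherkin library. The code assumes PHP 8.0 or later for constructor property promotion.

```php
<?php

// Hypothetical node types for the three block kinds seen so far.

final class Step
{
    public function __construct(
        public string $keyword, // e.g. "Given", "When", "Then"
        public string $text     // the rest of the line after the keyword
    ) {}
}

final class Scenario
{
    /** @var Step[] */
    public array $steps = [];

    public function __construct(public string $name) {}

    public function addStep(Step $step): void
    {
        $this->steps[] = $step;
    }
}

final class Feature
{
    /** @var Scenario[] */
    public array $scenarios = [];

    public function __construct(public string $name) {}

    public function addScenario(Scenario $scenario): void
    {
        $this->scenarios[] = $scenario;
    }
}

// Build the first scenario of the example by hand.
$feature  = new Feature('Guess the word');
$scenario = new Scenario('Maker starts a game');
$scenario->addStep(new Step('When', 'the Maker starts a game'));
$scenario->addStep(new Step('Then', 'the Maker waits for a Breaker to join'));
$feature->addScenario($scenario);
```

A real parser would create these objects while reading the file instead of building them by hand.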
Gherkin’s parent-child relationships are indicated by the indentation of the file. In the Gherkin example above, we have the following program structure:
- Feature
  - Scenario
    - Step
    - Step
  - Scenario
    - Step
    - Step
    - Step
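One way to recover this structure from indentation alone is a recursive pass. Every line becomes a node, and the lines beneath it that are indented further become its children. The following is a minimal sketch that assumes PHP 8.0+. The `buildTree()` and `collectChildren()` names are hypothetical. The sketch counts each leading space or tab as one column and does not check that siblings share the same indentation.

```php
<?php

/**
 * Build a nested tree from indentation alone.
 * Each node is ['text' => string, 'children' => array].
 * Blank lines and comment lines are skipped in this sketch.
 */
function buildTree(string $source): array
{
    $lines = [];
    foreach (preg_split('/\R/', $source) as $raw) {
        $trimmed = ltrim($raw, " \t");
        if ($trimmed === '' || str_starts_with($trimmed, '#')) {
            continue;
        }
        // Indentation = number of leading spaces/tabs (each counts as one column).
        $lines[] = [strlen($raw) - strlen($trimmed), rtrim($trimmed)];
    }

    $pos = 0;
    return collectChildren($lines, $pos, -1);
}

/**
 * Consume consecutive lines indented deeper than $parentIndent;
 * each becomes a node whose own children are gathered recursively.
 */
function collectChildren(array $lines, int &$pos, int $parentIndent): array
{
    $nodes = [];
    while ($pos < count($lines) && $lines[$pos][0] > $parentIndent) {
        [$indent, $text] = $lines[$pos];
        $pos++;
        $nodes[] = ['text' => $text, 'children' => collectChildren($lines, $pos, $indent)];
    }
    return $nodes;
}

$source = <<<'GHERKIN'
Feature: Guess the word

  # The first example has two steps
  Scenario: Maker starts a game
    When the Maker starts a game
    Then the Maker waits for a Breaker to join

  # The second example has three steps
  Scenario: Breaker joins a game
    Given the Maker has started a game with the word "silky"
    When the Breaker joins the Maker's game
    Then the Breaker must guess a word with 5 characters
GHERKIN;

$tree = buildTree($source);
```

For this input, `$tree` holds a single Feature node with two Scenario children. The first scenario has two steps and the second has three. Comments are skipped here for simplicity.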
The empty lines in the Gherkin example (lines 2 and 7) can largely be ignored for our purposes. Lines 3 and 8 contain an interesting construct we need to consider: the Gherkin comment.
The Gherkin comment
Gherkin comments can appear on any line, with any amount of leading whitespace, but they always start with the # character. These facts make comments relatively painless to parse later.
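A comment is identified purely by its first non-whitespace character, so detecting one takes only a couple of string operations. The following is a minimal sketch that assumes PHP 8.0+ for `str_starts_with()`. The name `parseComment()` is a hypothetical helper.

```php
<?php

/**
 * Return the text of a Gherkin comment line, or null if the line is not a comment.
 * A comment is any line whose first non-whitespace character is '#'.
 */
function parseComment(string $line): ?string
{
    $trimmed = ltrim($line, " \t");
    if (!str_starts_with($trimmed, '#')) {
        return null;
    }
    // Drop the '#' and any surrounding whitespace, keeping only the text.
    return trim(substr($trimmed, 1));
}
```

Returning the comment text instead of a plain boolean lets the parser store comments as nodes of their own.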
We can update our mental program structure to:
- Feature
  - Comment
  - Scenario
    - Step
    - Step
  - Comment
  - Scenario
    - Step
    - Step
    - Step
The next question is whether these structures are identified exclusively by their indentation, or whether Gherkin provides another way to distinguish a scenario from a step. Reading further in the documentation, we find that every non-blank line must begin with a Gherkin keyword. The documentation notes one exception to this rule, free-form descriptions, which we will get to later.
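This rule suggests a first-pass line classifier. Any non-blank, non-comment line should start with a keyword. Anything else is a candidate free-form description, or a construct the parser does not handle yet. The sketch below assumes block keywords are followed by a colon and step keywords by a space, as in the example above. The constant and function names are hypothetical. The keyword lists are written out by hand instead of being taken from any library. PHP 8.0+ is assumed.

```php
<?php

// Hand-written keyword lists for illustration (assumed, not from a library).
// Block keywords are followed by ':'; step keywords by a space.
const BLOCK_KEYWORDS = [
    'Feature', 'Rule', 'Background', 'Example', 'Scenario',
    'Scenario Outline', 'Scenario Template', 'Examples', 'Scenarios',
];
const STEP_KEYWORDS = ['Given', 'When', 'Then', 'And', 'But', '*'];

/**
 * Classify a raw line as 'blank', 'comment', 'block', 'step' or 'other'.
 * 'other' covers free-form descriptions and constructs not handled yet.
 */
function classifyLine(string $line): string
{
    $trimmed = trim($line);
    if ($trimmed === '') {
        return 'blank';
    }
    if (str_starts_with($trimmed, '#')) {
        return 'comment';
    }
    foreach (BLOCK_KEYWORDS as $keyword) {
        // Requiring the colon keeps "Scenario:" from matching "Scenario Outline:".
        if (str_starts_with($trimmed, $keyword . ':')) {
            return 'block';
        }
    }
    foreach (STEP_KEYWORDS as $keyword) {
        if (str_starts_with($trimmed, $keyword . ' ')) {
            return 'step';
        }
    }
    return 'other';
}
```

Matching on the trailing colon also keeps `Example:` from matching `Examples:`. The order of the block keyword list therefore does not matter.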
Gherkin keywords
The list of keywords that we need to be aware of is as follows:
Gherkin Keywords
| Keyword | Description |
| --- | --- |
| Feature | Provides a high-level description of a testable software feature |
| Rule | Provides an organizational abstraction around a single business rule to be tested |
| Background | Provides a way to define steps that apply to all scenarios within a |
|  | Alias of |
|  | Used to define a |
|  | Alias of |
|  | An example application of a business rule; defines a series of steps that can be used to test a behaviour |
|  | A container for a data table that appears below a |
|  | Alias of |
| Given | Identifies the initial state of a software system before a test is executed |
| When | Describes an action or event that takes place within the software system |
| Then | Describes an expected outcome after actions are taken within the software system |
|  | Alias of |
|  | Alias of |
|  | Repeats the last step keyword |
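Several rows in this table describe aliases, and one describes a keyword that repeats the last step keyword. According to the Gherkin reference, `Scenario` is a synonym for `Example`, `Scenario Template` for `Scenario Outline`, and `Scenarios` for `Examples`. `And` and `But` stand in for the step keyword before them. The following minimal sketch normalizes both, using hypothetical names and assuming PHP 8.0+. Treating `Example` rather than `Scenario` as the canonical name is an arbitrary choice.

```php
<?php

// Assumed canonical names: each alias maps to the keyword it stands for.
const BLOCK_ALIASES = [
    'Scenario'          => 'Example',
    'Scenario Template' => 'Scenario Outline',
    'Scenarios'         => 'Examples',
];

function canonicalBlockKeyword(string $keyword): string
{
    return BLOCK_ALIASES[$keyword] ?? $keyword;
}

/**
 * Replace each And/But with the most recent Given, When or Then.
 * Assumes the input only contains Given, When, Then, And and But.
 *
 * @param string[] $keywords step keywords in file order
 * @return string[] the effective keyword of each step
 */
function resolveStepKeywords(array $keywords): array
{
    $resolved = [];
    $last = null; // most recent Given/When/Then seen so far
    foreach ($keywords as $keyword) {
        if ($keyword === 'And' || $keyword === 'But') {
            // A leading And/But has nothing to inherit, so it is kept as-is.
            $resolved[] = $last ?? $keyword;
        } else {
            $last = $keyword;
            $resolved[] = $keyword;
        }
    }
    return $resolved;
}
```

Leaving a leading `And` or `But` unchanged is a design decision. A stricter parser could report it as an error instead.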
The Gherkin language constructs
In addition to the keywords, we also need to be mindful of the following additional language constructs:
Gherkin Language Constructs
| Construct | Description |
| --- | --- |
| Free-form description | Optional descriptions that can appear below a |
| Comment | A line whose first non-whitespace character is #; it is not parsed as any other Gherkin block type |
| Doc strings | Used to pass large pieces of text to a step definition; can span multiple lines and can be delimited in two different ways (three double quotes or three backticks) |
| Data tables | Provides a syntax to define tables of data, similar to Markdown |
| Parameters | Placeholders embedded in steps that are dynamically replaced with data from the rows of a data table |
| Tags | Provides a way to group related Gherkin scenarios |
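Data tables and parameters are closely related. A table row is a pipe-delimited line, and a parameter is a placeholder filled in from one of those rows. Following the Gherkin reference, the sketch below assumes parameters are written as `<name>`. It also assumes the first row of the table is a header that names each column. `parseTableRow()` and `substituteParameters()` are hypothetical helpers, and PHP 8.0+ is assumed. Escaped pipes inside cells are not handled.

```php
<?php

/**
 * Split a data table row such as "| word  | length |" into trimmed cells.
 * Escaped pipes inside cells are not supported in this sketch.
 */
function parseTableRow(string $line): array
{
    $trimmed = trim($line);
    if (!str_starts_with($trimmed, '|') || !str_ends_with($trimmed, '|')) {
        throw new InvalidArgumentException("Not a table row: $line");
    }
    // Drop the outer pipes, then split on the inner ones.
    $inner = substr($trimmed, 1, -1);
    return array_map('trim', explode('|', $inner));
}

/**
 * Replace <name> placeholders in a step with values from one table row,
 * using the header row to map column names to values.
 */
function substituteParameters(string $step, array $header, array $row): string
{
    $values = array_combine($header, $row); // column name => cell value
    return preg_replace_callback(
        '/<([^<>]+)>/',
        // Unknown placeholders are left untouched.
        fn (array $match): string => $values[$match[1]] ?? $match[0],
        $step
    );
}

$header = parseTableRow('| word  | length |');
$row    = parseTableRow('| silky | 5      |');
$step   = substituteParameters(
    'Then the Breaker must guess a word with <length> characters',
    $header,
    $row
);
// $step is now "Then the Breaker must guess a word with 5 characters"
```

Keeping row parsing separate from substitution means the same `parseTableRow()` can serve plain data tables, which have no placeholders.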
The number of keywords and constructs can ...