Modeling Gherkin

Learn how to write a Gherkin parser using Gherkin keywords.

Gherkin is an indentation-based language that lets developers write software tests that read like a natural language, such as English or French. We won't explain how to use Gherkin to run tests; instead, we'll explore the structure of the language and write a parser for it in PHP. Although we'll avoid deep discussion of how tests written in Gherkin are eventually used, we need to look at a few language examples to get a sense of what we're dealing with before we start writing our parser. Let’s take a look at the following code example from the Gherkin reference:
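
For reference while reading, here is a minimal sketch that holds the example in a PHP string and prints it with line numbers. It assumes the example in question is the "Guess the word" feature from the Gherkin reference, whose blank lines, comments, and step counts match the line numbers discussed in this lesson. The `$feature` variable and the `numberedLines()` helper are illustrative names, not part of any library.

```php
<?php

// Assumption: this is the "Guess the word" example from the Gherkin reference.
$feature = <<<'GHERKIN'
Feature: Guess the word

  # The first example has two steps
  Scenario: Maker starts a game
    When the Maker starts a game
    Then the Maker waits for a Breaker to join

  # The second example has three steps
  Scenario: Breaker joins a game
    Given the Maker has started a game with the word "silky"
    When the Breaker joins the Maker's game
    Then the Breaker must guess a word with 5 characters
GHERKIN;

/**
 * Hypothetical helper: split source text into lines keyed by 1-based line number.
 *
 * @return array<int, string>
 */
function numberedLines(string $source): array
{
    // \R matches any line-ending sequence (\n, \r\n, or \r).
    $lines = preg_split('/\R/', $source);

    // Re-key the list so that it starts at 1 instead of 0.
    return array_combine(range(1, count($lines)), $lines);
}

foreach (numberedLines($feature) as $number => $line) {
    printf("%2d| %s\n", $number, $line);
}
```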

Even before we get into our parser implementation, the code snippet above shows a few notable things. The first is that a Gherkin file begins with a Feature block, which can contain multiple children. In this example, the Feature has two children, and both are scenarios.

Gherkin uses a scenario to express a testable behavior within the feature test. Every Gherkin scenario belongs to a feature, and a scenario can have many steps.

Gherkin’s parent-child relationships are indicated by the indentation of the file. In the code snippet above, we have the following program structure:

Feature
1. Scenario
  1. Step
  2. Step
2. Scenario
  1. Step
  2. Step
  3. Step
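
One way to observe this nesting in PHP is to measure each line's leading whitespace. The sketch below is only an illustration: `indentWidth()` and `indentDepth()` are hypothetical helpers, the sample lines are assumed, and the default of two spaces per level follows the indentation that the Gherkin reference recommends rather than anything the language enforces.

```php
<?php

// Hypothetical helper: number of leading space or tab characters on a line.
function indentWidth(string $line): int
{
    return strlen($line) - strlen(ltrim($line, " \t"));
}

// Hypothetical helper: nesting depth, assuming a fixed indent width per level.
function indentDepth(string $line, int $spacesPerLevel = 2): int
{
    return intdiv(indentWidth($line), $spacesPerLevel);
}

// Assumed sample lines, one per nesting level.
$lines = [
    'Feature: Guess the word',
    '  Scenario: Maker starts a game',
    '    When the Maker starts a game',
];

foreach ($lines as $line) {
    echo indentDepth($line), ' ', trim($line), "\n";
}
```

Note that a tab counts as a single character here, so a file that mixes tabs and spaces would need a more careful depth calculation.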

The empty lines on lines 2 and 7 can largely be ignored for our purposes. Lines 3 and 8 contain an interesting construct we need to consider: the Gherkin comment.

The Gherkin comment

Gherkin comments can appear on any line, with any amount of leading whitespace, but they always start with the # character. These few facts will make comments relatively painless to parse later.
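
These rules translate almost directly into code. The following is a minimal sketch, assuming we inspect one line at a time; `isComment()` and `commentText()` are hypothetical helper names used only for illustration.

```php
<?php

// Hypothetical helper: true when the first non-whitespace character is "#".
function isComment(string $line): bool
{
    // Drop leading whitespace, then compare only the first remaining character.
    return strncmp(ltrim($line), '#', 1) === 0;
}

// Hypothetical helper: the comment's text without the "#" and surrounding spaces.
// Only meaningful for lines where isComment() returns true.
function commentText(string $line): string
{
    return trim(substr(ltrim($line), 1));
}

var_dump(isComment('  # The first example has two steps'));
var_dump(isComment('  Scenario: Maker starts a game'));
echo commentText('  # The first example has two steps'), "\n";
```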

We can update our mental program structure to:

Feature
1. Comment
2. Scenario
  1. Step
  2. Step
3. Comment
4. Scenario
  1. Step
  2. Step
  3. Step
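
This mental structure maps naturally onto a small tree type. The following is a minimal sketch rather than the parser's eventual design: `Node` and `outline()` are hypothetical names chosen for illustration, and the typed properties assume PHP 7.4 or later.

```php
<?php

// Hypothetical tree node: a block type plus its child blocks.
final class Node
{
    public string $type;

    /** @var Node[] */
    public array $children = [];

    public function __construct(string $type)
    {
        $this->type = $type;
    }

    // Appends a child and returns it, so callers can keep nesting under it.
    public function add(Node $child): Node
    {
        $this->children[] = $child;

        return $child;
    }
}

// Renders the tree as an indented outline, two spaces per level.
function outline(Node $node, int $depth = 0): string
{
    $out = str_repeat('  ', $depth) . $node->type . "\n";

    foreach ($node->children as $child) {
        $out .= outline($child, $depth + 1);
    }

    return $out;
}

$feature = new Node('Feature');
$feature->add(new Node('Comment'));
$first = $feature->add(new Node('Scenario'));
$first->add(new Node('Step'));
$first->add(new Node('Step'));
$feature->add(new Node('Comment'));
$second = $feature->add(new Node('Scenario'));
$second->add(new Node('Step'));
$second->add(new Node('Step'));
$second->add(new Node('Step'));

echo outline($feature);
```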

The next question is whether these structures are identified exclusively by their indentation or whether Gherkin provides a different way to distinguish a scenario from a step. Continuing to peruse the documentation, we find that every non-blank line must begin with a Gherkin keyword. The documentation notes that free-form descriptions are the exception to this rule, but we will get into that later.

Gherkin keywords

The list of keywords that we need to be aware of is as follows:

Gherkin Keywords

Keyword	Description
`Feature:`	Provides a high-level description of a testable software feature
`Rule:`	Provides an organizational abstraction around a single business rule to be tested
`Background:`	Provides a way to define steps that apply to all scenarios within a `Feature` test suite
`Scenario:`	Alias of `Example`
`Scenario Outline:`	Used to define a `Scenario`/`Example` that can be repeated with dynamic values defined in an attached data table
`Scenario Template:`	Alias of `Scenario Outline`
`Example:`	An example application of a business rule; defines some series of steps that can be used to test a behaviour
`Examples:`	A container for a data table that appears below a `Scenario Outline`
`Scenarios:`	Alias of `Examples`
`Given`	Identifies the initial state of a software system before a test is executed
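`When`	Represents a description of an action or event that takes place within the software system
`Then`	Represents a description of an expected outcome after actions are taken within a software system
`But`	Continues the preceding `Given`, `When`, or `Then` step; typically used for a contrasting or negative step
`And`	Continues the preceding `Given`, `When`, or `Then` step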
`*`	Repeats the last step keyword
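
To make the table concrete, here is a minimal sketch of how the keyword at the start of a line might be detected. The constant and function names are hypothetical, the keyword lists simply mirror the table above, and requiring a space after step keywords is our own simplification.

```php
<?php

// Block-level keywords from the table; each ends with a colon.
const BLOCK_KEYWORDS = [
    'Feature:', 'Rule:', 'Background:', 'Scenario:', 'Scenario Outline:',
    'Scenario Template:', 'Example:', 'Examples:', 'Scenarios:',
];

// Step keywords from the table.
const STEP_KEYWORDS = ['Given', 'When', 'Then', 'But', 'And', '*'];

// Aliases from the table, mapped to the keyword they stand for.
const KEYWORD_ALIASES = [
    'Scenario:' => 'Example:',
    'Scenario Template:' => 'Scenario Outline:',
    'Scenarios:' => 'Examples:',
];

// Hypothetical helper: the keyword a line starts with, or null if there is none.
function leadingKeyword(string $line): ?string
{
    $trimmed = ltrim($line);

    foreach (BLOCK_KEYWORDS as $keyword) {
        if (strncmp($trimmed, $keyword, strlen($keyword)) === 0) {
            return $keyword;
        }
    }

    foreach (STEP_KEYWORDS as $keyword) {
        // Require a trailing space so that "Android" is not read as "And".
        $needle = $keyword . ' ';
        if (strncmp($trimmed, $needle, strlen($needle)) === 0) {
            return $keyword;
        }
    }

    return null;
}

// Hypothetical helper: resolve an alias to the keyword it stands for.
function canonicalKeyword(string $keyword): string
{
    return KEYWORD_ALIASES[$keyword] ?? $keyword;
}

var_dump(leadingKeyword('  Scenario: Maker starts a game'));
var_dump(canonicalKeyword('Scenario:'));
```

A line for which `leadingKeyword()` returns `null` is not necessarily invalid; it may be a blank line, a comment, or a free-form description.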

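The Gherkin language constructs

Besides keywords, there are several other language constructs that we need to be aware of:

Gherkin Language Constructs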

Construct	Description
Free-form description	Optional descriptions that can appear below a `Feature`, `Example`, `Scenario`, `Background`, `Scenario Outline`, or `Rule` block
Comment	A line whose first non-whitespace character is `#`; it is not parsed as any other Gherkin block type
Doc strings	Used to pass large pieces of text to a step definition; can span multiple lines and can be delimited in two different ways (three double quotes or three backticks)
Data tables	Provides a syntax to define tables of data, similar to Markdown
Parameters	Placeholders (written as `<name>`) embedded in steps that are dynamically replaced with data from rows defined in a data table
Tags	Provides a way to group related Gherkin scenarios
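
Several of these constructs can be recognized from the first characters of a line. The sketch below is a rough illustration under stated assumptions: the function names and the returned labels are hypothetical, table rows are assumed to start and end with a pipe, and escaped pipes inside cells are not handled.

```php
<?php

// Hypothetical helper: label a line by the construct its first characters suggest.
function classifyConstruct(string $line): string
{
    $trimmed = trim($line);
    $backticks = str_repeat('`', 3);

    if ($trimmed === '') {
        return 'blank';
    }
    if ($trimmed[0] === '#') {
        return 'comment';
    }
    if ($trimmed[0] === '@') {
        return 'tags';
    }
    if ($trimmed[0] === '|') {
        return 'table-row';
    }
    if (strncmp($trimmed, '"""', 3) === 0 || strncmp($trimmed, $backticks, 3) === 0) {
        return 'doc-string-delimiter';
    }

    // Keyword lines and free-form descriptions need further inspection.
    return 'other';
}

// Hypothetical helper: split a data table row into trimmed cell values.
function tableCells(string $row): array
{
    // Drop the outer pipes, then split on the inner ones.
    $inner = substr(trim($row), 1, -1);

    return array_map('trim', explode('|', $inner));
}

// Hypothetical helper: names of the <placeholders> used in a step.
function parameterNames(string $step): array
{
    preg_match_all('/<([^<>]+)>/', $step, $matches);

    return $matches[1];
}

var_dump(classifyConstruct('  @smoke @slow'));
var_dump(tableCells('  | start | eat | left |'));
var_dump(parameterNames('Given there are <start> cucumbers'));
```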
