Parse Strings with Regular Expressions

Explore how to parse strings using the C++ regular expression library. Understand regex syntax and iteration with sregex_token_iterator to extract specific text patterns, such as hyperlinks from HTML, enabling lexical analysis and pattern matching in your C++ programs.

Regular expressions (commonly abbreviated as regex) are widely used for lexical analysis and pattern matching on streams of text. Lexical analysis, also known as tokenization, is the process of breaking a sequence of characters into meaningful units called tokens, which serve as input for parsing or further processing. Regular expressions are common in Unix text-processing utilities, such as grep, awk, and sed, and are an integral part of the Perl language. There are a few common variations in the syntax. A POSIX standard was approved in 1992, while other common variations include the Perl and ECMAScript (JavaScript) dialects. The C++ regex library defaults to the ECMAScript dialect, that is, a regular expression grammar based on the one defined in the ECMAScript standard, the specification behind JavaScript.

The regex library was first introduced to the STL with C++11. It can be very useful for finding patterns in text files.

How to do it

For this recipe, we will extract hyperlinks from an HTML file. A hyperlink is coded in HTML like this:

<a href="http://example.com/file.html">Text goes here</a>

We will use a regex object to extract both the link and the text, as two separate strings.

  • Our example file ...