Course Summary
Let's now summarize all the concepts we learned so far.
Introduction to regular expressions
- A regular expression, also known as regex or regexp, is a pattern of characters we want to match or search for in a text.
- Regular expressions are widely used in Unix operating systems, text editors, programming languages, and various domains like bioinformatics and lexical analysis.
- A regular expression engine scans the input string from left to right, looking for matches with a given pattern.
- Matching an input text against a regular expression pattern lets us perform operations based on the results, such as parsing useful information, finding and replacing text, splitting a string, and extracting data.
The String class and its methods
- A string literal is a sequence of Unicode characters enclosed in double quotes.
- A string literal prefixed with `@` in C# denotes a verbatim string literal, in which escape sequences and interpolation are not processed. For example, the string `"C:\\temp"` can also be represented as `@"C:\temp"`.
- The `string` keyword in C# is an alias for the `System.String` class in .NET. It provides various methods and properties to work with strings.
- Strings are immutable sequences of `System.Char` objects.
- The `Concat()` method of the `String` class concatenates two or more strings.
- The `String` class itself has no `Match()` method. Matching a regular expression against a string is done with the `Match()` method of the `Regex` class.
- The `Replace()` method of the `String` class replaces every occurrence of a given substring with another substring.
- The `Split()` method of the `String` class splits a string into multiple substrings based on the separator characters in an array.
- The `Substring()` method of the `String` class extracts a part of a string.
- The `Contains()` method of the `String` class checks whether a string contains a given substring.
- The `StartsWith()` and `EndsWith()` methods of the `String` class check whether a string starts with or ends with a given substring.
- The `IndexOf()` method of the `String` class finds the zero-based index of the first occurrence of a given character or substring in a string.
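The `String` methods above can be tried together in a short sketch. The sample text and variable names are made up for illustration, and the snippet assumes C# 9 or later so that top-level statements compile without a `Main` method.

```csharp
using System;

// A regular literal and a verbatim literal that spell the same path.
string regular = "C:\\temp";
string verbatim = @"C:\temp";
Console.WriteLine(regular == verbatim);         // True

string text = "Hello, World";

// Concat joins its arguments into one new string.
string joined = string.Concat("Hello", ", ", "World");
Console.WriteLine(joined == text);              // True

// Strings are immutable, so Replace and Substring return new strings.
string replaced = text.Replace("World", "C#");
string tail = text.Substring(7, 5);
Console.WriteLine(replaced);                    // Hello, C#
Console.WriteLine(tail);                        // World

// Searching inside a string.
Console.WriteLine(text.Contains("World"));      // True
Console.WriteLine(text.StartsWith("Hello"));    // True
Console.WriteLine(text.EndsWith("World"));      // True
int position = text.IndexOf("World");
Console.WriteLine(position);                    // 7

// Split on any of the separator characters in the array.
string[] parts = "a,b;c".Split(new[] { ',', ';' });
Console.WriteLine(parts.Length);                // 3
```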
Regular expressions APIs in C#
- The `System.Text.RegularExpressions` namespace of .NET provides a set of classes and methods to create, match, and manipulate regular expressions.
- The `Regex` class is the primary type of the regular expressions API. It provides both static and instance methods, along with properties, to work with regular expressions.
- The `Match` class represents the result of a single regular expression match. It contains information about the match, such as the value of the matched string and its start index and length within the input string.
- The `MatchCollection` class contains a collection of `Match` objects.
- The `Group` class represents a matching subexpression within a regular expression match.
- The `GroupCollection` class contains a collection of `Group` objects that represent all the captured groups within a single regular expression match.
- The `Match()` method of the `Regex` class searches a string for the first match of a regular expression. This method returns a `Match` object that contains information about the match.
- The `Matches()` method of the `Regex` class finds all the matches of a regular expression in a string. This method returns a `MatchCollection` of `Match` objects that contain information about all the matches.
- The `Replace()` method of the `Regex` class replaces the text that matches a regular expression with a replacement string.
- The `Split()` method of the `Regex` class splits a string into an array of substrings. This method splits the input string at the positions that match the regular expression.
- The `IsMatch()` method of the `Regex` class checks whether a regular expression finds a match in a given string. This method returns `true` if it does. Otherwise, it returns `false`.
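As a quick reminder of how these `Regex` methods fit together, here is a minimal sketch using the static overloads. The input strings and the pattern `[bcr]at` are illustrative choices, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "cat bat rat";
string pattern = "[bcr]at";

// Match returns the first match, with its value, index, and length.
Match first = Regex.Match(input, pattern);
Console.WriteLine($"{first.Value} at {first.Index}, length {first.Length}"); // cat at 0, length 3

// Matches returns every match as a MatchCollection.
MatchCollection all = Regex.Matches(input, pattern);
Console.WriteLine(all.Count);                       // 3

// Replace substitutes each match with the replacement string.
string replaced = Regex.Replace(input, pattern, "dog");
Console.WriteLine(replaced);                        // dog dog dog

// Split cuts the input wherever the pattern matches.
string[] pieces = Regex.Split("a1b22c", @"\d+");
Console.WriteLine(string.Join("|", pieces));        // a|b|c

// IsMatch only reports whether a match exists.
bool found = Regex.IsMatch(input, pattern);
Console.WriteLine(found);                           // True
```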
Special characters in regular expressions
- The dot (`.`) character matches any single character except the newline character.
- The caret (`^`) character matches the start of the input string.
- The dollar sign (`$`) character matches the end of the input string.
- A pair of square brackets (`[]`) represents a character class.
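The behavior of the dot and the two anchors can be checked with a few calls to `Regex.IsMatch()`. The sample words below are made up for illustration, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// The dot matches any single character except a newline.
bool dotMatchesLetter = Regex.IsMatch("cat", "c.t");
bool dotMatchesNewline = Regex.IsMatch("c\nt", "c.t");
Console.WriteLine(dotMatchesLetter);    // True
Console.WriteLine(dotMatchesNewline);   // False

// The caret anchors the pattern to the start of the input.
bool startsWithCat = Regex.IsMatch("catalog", "^cat");
Console.WriteLine(startsWithCat);       // True

// The dollar sign anchors the pattern to the end of the input.
bool endsWithLog = Regex.IsMatch("catalog", "log$");
Console.WriteLine(endsWithLog);         // True

// With both anchors, the whole input has to match the pattern.
bool exactlyCat = Regex.IsMatch("catalog", "^cat$");
Console.WriteLine(exactlyCat);          // False
```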
Character classes
- A character class matches any single character from the set enclosed within the square brackets.
- A character class can also include a range of characters, represented by two characters separated by a hyphen (`-`). For example, `[A-Z]` matches any uppercase letter from `A` to `Z`.
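A small sketch of sets and ranges inside square brackets follows. The inputs are invented for illustration, the `+` quantifier in the last pattern is the one summarized under Quantifiers, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// A set matches exactly one character out of those listed.
bool isVowel = Regex.IsMatch("e", "^[aeiou]$");
Console.WriteLine(isVowel);             // True

// A range matches one character between its two endpoints.
bool upperMatches = Regex.IsMatch("Q", "^[A-Z]$");
bool lowerMatches = Regex.IsMatch("q", "^[A-Z]$");
Console.WriteLine(upperMatches);        // True
Console.WriteLine(lowerMatches);        // False

// Several ranges can be combined in one class, here for hexadecimal digits.
string hex = Regex.Match("#7F3a!", "[0-9A-Fa-f]+").Value;
Console.WriteLine(hex);                 // 7F3a
```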
Meta characters
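- The backslash (`\`) symbol is a meta character that introduces escape sequences, including various predefined character classes.
- `\s` matches any whitespace character (space, tab, carriage return, newline, and form feed).
- `\d` matches a decimal digit (`0` to `9`; in .NET it also matches other Unicode decimal digits unless `RegexOptions.ECMAScript` is set).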
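The two predefined classes can be exercised as follows. The sample strings are assumptions made for the example, verbatim literals keep the backslashes readable, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// \s matches a space, a tab, and a newline alike, so all three act as separators.
string[] words = Regex.Split("one two\tthree\nfour", @"\s");
Console.WriteLine(words.Length);        // 4

// \d matches a single digit; two in a row match a two-digit number.
string digits = Regex.Match("Order 66 shipped", @"\d\d").Value;
Console.WriteLine(digits);              // 66

// A letter is not a digit, so \d does not match here.
bool letterIsDigit = Regex.IsMatch("x", @"\d");
Console.WriteLine(letterIsDigit);       // False
```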
Quantifiers
- The symbols `?`, `*`, and `+` are used as quantifiers.
- `X?` matches zero or one occurrence of `X`.
- `X*` matches zero or more occurrences of `X`.
- `X+` matches one or more occurrences of `X`.
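The difference between the three quantifiers shows up most clearly on inputs with zero, one, and several repetitions. The patterns below are illustrative and anchored with `^` and `$` so the whole input must match. The snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// u? makes the letter u optional: zero or one occurrence.
bool american = Regex.IsMatch("color", "^colou?r$");
bool british = Regex.IsMatch("colour", "^colou?r$");
Console.WriteLine(american);            // True
Console.WriteLine(british);             // True

// b* accepts zero or more occurrences of b.
bool starZero = Regex.IsMatch("ac", "^ab*c$");
bool starMany = Regex.IsMatch("abbbc", "^ab*c$");
Console.WriteLine(starZero);            // True
Console.WriteLine(starMany);            // True

// b+ requires at least one b.
bool plusZero = Regex.IsMatch("ac", "^ab+c$");
bool plusOne = Regex.IsMatch("abc", "^ab+c$");
Console.WriteLine(plusZero);            // False
Console.WriteLine(plusOne);             // True
```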
RegexOptions
- The `RegexOptions` enumeration controls how regular expression operations are performed.
- We can include one or more values from the `RegexOptions` enumeration in a bitwise combination by using the OR (`|`) operator. For example, if we want to perform case-insensitive and culture-insensitive matches, we use the value `RegexOptions.IgnoreCase | RegexOptions.CultureInvariant`.
- We can pass the value of the `RegexOptions` enumeration as an argument to methods that expect options. For example, we can specify the options when constructing a `Regex` object.
- `RegexOptions.CultureInvariant` specifies that cultural differences in language are ignored.
- `RegexOptions.IgnoreCase` specifies that the regular expression is case-insensitive.
- `RegexOptions.Multiline` specifies that `^` and `$` match at the start and end of each line of the input, not only at the start and end of the whole string.
- `RegexOptions.Singleline` specifies that the `.` character matches all characters, including newline characters.
- `RegexOptions.IgnorePatternWhitespace` specifies that unescaped white space in the regular expression pattern is ignored.
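The effect of each option is easiest to see by running the same pattern with and without it. The inputs below are invented for illustration, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Options are combined with the bitwise OR operator.
bool ignoreCase = Regex.IsMatch("HELLO", "hello",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
Console.WriteLine(ignoreCase);          // True

// Multiline lets ^ match at the start of every line.
string lines = "a1\nb2\nc3";
int withMultiline = Regex.Matches(lines, "^[a-z]", RegexOptions.Multiline).Count;
int withoutMultiline = Regex.Matches(lines, "^[a-z]").Count;
Console.WriteLine(withMultiline);       // 3
Console.WriteLine(withoutMultiline);    // 1

// Singleline lets the dot match a newline as well.
bool dotWithSingleline = Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline);
bool dotWithoutSingleline = Regex.IsMatch("a\nb", "a.b");
Console.WriteLine(dotWithSingleline);    // True
Console.WriteLine(dotWithoutSingleline); // False

// IgnorePatternWhitespace drops the unescaped spaces inside the pattern.
// Options can also be passed to the Regex constructor.
var spaced = new Regex(@"\d+ - \d+", RegexOptions.IgnorePatternWhitespace);
bool spacedMatches = spaced.IsMatch("555-1234");
bool literalSpaces = Regex.IsMatch("555-1234", @"\d+ - \d+");
Console.WriteLine(spacedMatches);       // True
Console.WriteLine(literalSpaces);       // False
```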
Working with capture groups
- Groups, specified by parentheses (`()`), subdivide the match found by a regular expression.
- We can access groups using the `Groups` property of the `Match` object.
- The `Value` property of the `Group` object contains the substring captured by the group.
- The `Success` property indicates whether the group captured anything from the input string.
- The `Index` property of the `Group` object contains the zero-based position in the input string where the captured substring starts.
- The `Length` property of the `Group` object contains the length of the captured substring.
- The captured groups are numbered starting from 1. Group 0 always represents the entire match.
- In a replacement pattern, `$n` denotes the nth captured group, where n is the number of the captured group.
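The following sketch reads the group properties from a single match and then reuses the groups in a replacement pattern. The date string and the patterns are illustrative assumptions, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string datePattern = @"(\d+)-(\d+)-(\d+)";
Match m = Regex.Match("Date: 2024-03-15", datePattern);

// Group 0 is the whole match; groups 1 to 3 are the parenthesized parts.
Console.WriteLine(m.Groups.Count);      // 4
Console.WriteLine(m.Groups[0].Value);   // 2024-03-15

Group year = m.Groups[1];
Console.WriteLine($"{year.Value} starts at {year.Index}, length {year.Length}"); // 2024 starts at 6, length 4

// An optional group that captured nothing reports Success == false.
bool optionalCaptured = Regex.Match("b", "(a)?b").Groups[1].Success;
Console.WriteLine(optionalCaptured);    // False

// $1, $2, and $3 refer to the captured groups inside a replacement pattern.
string reordered = Regex.Replace("2024-03-15", datePattern, "$3/$2/$1");
Console.WriteLine(reordered);           // 15/03/2024
```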
Working with backreferences
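- Backreferences let us reuse previously matched substrings within a regular expression pattern.
- Inside a pattern, `\1`, `\2`, and so on denote the nth captured group, where n is the number of the captured group.
- In a replacement pattern, `$&` denotes the entire match.
- In a replacement pattern, `` $` `` (a dollar sign followed by a backtick) denotes the part of the string before the match.
- In a replacement pattern, `${name}` denotes the value of the named captured group `name`.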
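Here is a brief sketch of a backreference inside a pattern, followed by three .NET substitutions in replacement strings. Note that the substitution for the text before the match is written as a dollar sign followed by a backtick. All sample strings and the group names `first` and `last` are made up for the example, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// \1 inside the pattern must match the same text that group 1 captured.
string doubledDigit = Regex.Match("12334", @"(\d)\1").Value;
Console.WriteLine(doubledDigit);        // 33

// $& inserts the entire match into the replacement.
string bracketed = Regex.Replace("cat", "a", "[$&]");
Console.WriteLine(bracketed);           // c[a]t

// $` inserts the part of the input that precedes the match.
string prefixCopied = Regex.Replace("abc", "b", "$`");
Console.WriteLine(prefixCopied);        // aac

// ${name} inserts the value of a named group.
string swapped = Regex.Replace(
    "John Smith",
    @"(?<first>[A-Za-z]+)\s(?<last>[A-Za-z]+)",
    "${last}, ${first}");
Console.WriteLine(swapped);             // Smith, John
```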
Advanced topics
- Regex patterns are often used to search for sensitive data, such as credit card numbers or social security numbers. We must make sure not to accidentally store or log this sensitive data.
- A malicious user might submit a string that causes our regular expression to take a very long time to process. This is called a regular expression denial of service (ReDoS) attack.
- We can limit the impact of such attacks by passing a match timeout to the `Regex` constructor or to the static method overloads that accept one. The timeout is exposed through the `MatchTimeout` property, and an operation that exceeds it throws a `RegexMatchTimeoutException`.
- We should always use the simplest regular expressions that match the patterns we look for to ensure good performance.
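A minimal sketch of a match timeout is shown below. The nested-quantifier pattern and the hostile input are classic illustrative choices, not taken from the course, and whether the timeout actually fires depends on the .NET version and its regex optimizations, so the code handles both outcomes. The snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Nested quantifiers such as (a+)+ can backtrack heavily on near-miss inputs.
// The third constructor argument sets the match timeout.
var risky = new Regex(@"^(a+)+$", RegexOptions.None, TimeSpan.FromMilliseconds(100));
Console.WriteLine(risky.MatchTimeout.TotalMilliseconds);   // 100

// Hypothetical hostile input: many 'a' characters followed by one that cannot match.
string hostile = new string('a', 40) + "!";

bool timedOut = false;
bool matched = false;
try
{
    matched = risky.IsMatch(hostile);
}
catch (RegexMatchTimeoutException)
{
    // The engine gave up after the timeout instead of running indefinitely.
    timedOut = true;
}

Console.WriteLine(timedOut
    ? "The match was abandoned after the timeout."
    : $"The match finished in time; result: {matched}");
```

Catching `RegexMatchTimeoutException` lets the application reject the input or fall back to a simpler check rather than blocking a thread.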
Congratulations
Congratulations on finishing this course! The lessons you’ve learned here will be invaluable as you continue to make more complex and practical C# applications.
Thanks for enrolling in this course, and good luck with your next steps as a programmer!