Introduction to Regular Expressions

This section introduces regular expressions, their purpose, and practical use cases.

A regular expression (also known as regex or regexp) is a pattern that describes the characters you want to match or search for in a text.

Below is an example of a regular expression:

^(regexp?|regular expression)$

With the above regular expression pattern, we can search a text and verify if it matches any of the following strings:

  • "regex",
  • "regexp",
  • "regular expression".

The figure below represents the above regular expression example graphically.

Explanation: The above regular expression matches a string according to the steps below:

  • `^`: start of line
  • `(`: start capturing group #1
  • `regexp?`: either match the character sequence `regex`, followed by an optional `p` character (occurring zero or one time)
  • `|`: or
  • `regular expression`: match the character sequence `regular expression`
  • `)`: end capturing group #1
  • `$`: end of line
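
As a minimal sketch of how this pattern behaves in Java, the snippet below compiles it with java.util.regex.Pattern and tests a few strings. The pattern is written with no whitespace around the alternation bar, because a space inside a pattern is matched literally. The class name `TermMatcher`, the method `isTerm`, and the sample strings are illustrative assumptions, not part of the course material.

```java
import java.util.regex.Pattern;

class TermMatcher {
    // The example pattern: "regex", optionally followed by "p", or "regular expression".
    private static final Pattern TERM = Pattern.compile("^(regexp?|regular expression)$");

    // Returns true only if the whole input is one of the three accepted strings.
    static boolean isTerm(String input) {
        return TERM.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs: three that should match and two that should not.
        String[] samples = {"regex", "regexp", "regular expression", "regexpp", "a regex"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isTerm(s));
        }
    }
}
```

The first three samples are accepted; "regexpp" and "a regex" are rejected because the anchors require the whole string to fit the pattern.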

If this isn’t making sense yet, don’t worry. We’ll explore many examples throughout the course so that you have mastered this concept by the end.

What are Regular Expressions?

Regular expressions are a language for representing sets of strings. They allow you to match input text against patterns and perform operations based on the results.

The term regular expression comes from formal language theory: the set of strings described by such an expression is a “regular” language, meaning that it can be recognized by a finite state machine.

Regular expressions are widely used in:

  • Unix tools such as sed and AWK,
  • text editors such as vim and emacs,
  • programming languages such as Perl, Python, and Ruby,
  • domains such as bioinformatics and lexical analysis.

The Java Development Kit (JDK) provides a package called java.util.regex. It is an API for matching text against patterns, which are known as “regular expressions” or “regex”.

Using this API, you can create both simple and complex regex patterns. The JDK’s support for regular expressions also includes java.lang.String methods that accept string patterns. Together, they support operations such as:

  • creating/managing compiled patterns,
  • matching strings against patterns,
  • splitting strings according to patterns,
  • replacing parts of strings according to patterns.
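
The operations listed above can be sketched roughly as follows. This is a minimal illustration that uses java.util.regex.Pattern and String.split from the JDK; the class name `RegexOperations`, its method names, and the sample inputs are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RegexOperations {
    // Creating a compiled pattern once so it can be reused: one or more digits.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Matching: true if the entire input consists of digits.
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Splitting: break the input at each comma, ignoring whitespace around it.
    static String[] splitOnCommas(String input) {
        return input.split("\\s*,\\s*");
    }

    // Replacing: substitute every run of digits with a single '#'.
    static String maskDigits(String input) {
        return DIGITS.matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12345"));                     // true
        System.out.println(isAllDigits("12a45"));                     // false
        System.out.println(Arrays.toString(splitOnCommas("a , b,c"))); // [a, b, c]
        System.out.println(maskDigits("room 42, floor 7"));           // room #, floor #
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which is a common choice when the same pattern is applied to many inputs.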

How Regular Expressions Work

A regular expression engine works by comparing the provided pattern with the target string, character by character from left to right. If the pattern can be satisfied, the engine reports a “match”; otherwise, it reports “no match”.
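
To make the left-to-right scan visible, the sketch below uses Matcher.find() to report where each match starts and ends. The class name `ScanDemo`, the helper `findAll`, and the sample text are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScanDemo {
    // Collects "start-end:text" for each match, in the order the scan finds them.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        // Each call to find() resumes scanning where the previous match ended.
        while (m.find()) {
            hits.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("cat", "cat concat dog")); // [0-3:cat, 7-10:cat]
        System.out.println(findAll("cat", "dog"));            // [] (no match)
    }
}
```

An empty list corresponds to the “no match” outcome: the scan reached the end of the string without finding the pattern.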

Regular expressions are an essential part of many developers’ toolboxes. They can be used on platforms and in languages such as .NET, JavaScript, Java, PHP, and Python to perform tasks ranging from validating data to searching for patterns or extracting information. For example, you could use them to validate text input (e.g., a user name or password), extract data from HTML or XML files, or find all instances of a pattern in large log files.

Need for regular expressions

Patterns are very common in programming. Whether you are writing a program to find and replace text, extract data from a website, build a UI, or create XML files, there will be some sort of pattern you need to match and use as part of your code.

Without regular expressions, the main way to find text is to search for a fixed word or substring, starting at the leftmost character of a string. This makes it hard to perform complex operations like ‘find all email addresses’ or ‘get all numbers from this site’.

Regular expressions change all of that: they allow you to write patterns that will find what you are looking for and ignore everything else.
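
The difference can be sketched in Java by comparing a fixed-substring search (String.indexOf) with a regular expression that finds every number in a text. The class name `LiteralVsRegex`, its methods, and the sample sentence are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralVsRegex {
    // Literal search: only finds the exact substring it is given.
    static int indexOfLiteral(String text, String literal) {
        return text.indexOf(literal);
    }

    // Regex search: collects every run of digits, whatever its value.
    static List<String> allNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(text);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String text = "Order 17 shipped in 2024 with 3 items"; // hypothetical sample text
        System.out.println(indexOfLiteral(text, "2024")); // position of one known value
        System.out.println(allNumbers(text));             // [17, 2024, 3]
    }
}
```

The literal search needs to know the value in advance, while the pattern describes the shape of the values and so finds all of them.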

Regular expressions help us parse useful information, such as dates, phone numbers, or zip codes, from important text files such as code, log files, spreadsheets, or documents.

As we have seen in the example above, regular expressions can scan the string from left to right to look for matches to a given pattern.

Applications of regular expressions

A regular expression can be used in many different situations. Most commonly, regular expressions are used to find matches for a pattern in text, either with String methods such as matches, replaceAll, and split, or with the Pattern and Matcher classes of the java.util.regex package.

The term pattern matching is often used synonymously with ‘regular expression’. Searching text for matches is just one use of regular expressions. They can also support tasks such as building dynamic user interfaces and creating XML files, and they can perform complex text operations through the Java API for regular expressions.

Regular expressions are primarily used to perform the following operations on textual data (each use case is followed by an example):

  • String-based pattern matching: extracting IP addresses from a server log file.
  • Searching and replacing: searching for sensitive information like credit-card numbers and masking it as XXXX-XXXX-XXXX-XXXX in a text document.
  • Performing string manipulation in a document: converting all the strings representing dates in a text document from DD-MM-YYYY format to MM-DD-YYYY format.
  • Data validation: validating input data values like phone numbers, zip codes, email addresses, etc.
  • Extracting information: extracting hashtags from a document containing tweets or blog posts.
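
Three of these use cases can be sketched in Java as shown below. The patterns are deliberately simplified assumptions: a card number is taken to be four hyphen-separated groups of four digits, a date is taken to be two digits, two digits, and four digits separated by hyphens, and a hashtag is taken to be `#` followed by word characters. The class name `UseCases` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UseCases {
    // Simplified card format (assumption): dddd-dddd-dddd-dddd.
    static final Pattern CARD = Pattern.compile("\\b\\d{4}-\\d{4}-\\d{4}-\\d{4}\\b");
    // Simplified date format (assumption): DD-MM-YYYY, captured as groups 1, 2, and 3.
    static final Pattern DATE = Pattern.compile("\\b(\\d{2})-(\\d{2})-(\\d{4})\\b");
    // Simplified hashtag format (assumption): '#' followed by letters, digits, or underscores.
    static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Searching and replacing: mask every card number.
    static String maskCards(String text) {
        return CARD.matcher(text).replaceAll("XXXX-XXXX-XXXX-XXXX");
    }

    // String manipulation: swap day and month using group references $1 and $2.
    static String toMonthFirst(String text) {
        return DATE.matcher(text).replaceAll("$2-$1-$3");
    }

    // Extracting information: collect all hashtags in order of appearance.
    static List<String> hashtags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(maskCards("Card: 1234-5678-9012-3456."));   // Card: XXXX-XXXX-XXXX-XXXX.
        System.out.println(toMonthFirst("Due 25-12-2023"));            // Due 12-25-2023
        System.out.println(hashtags("Learning #regex in #Java today")); // [#regex, #Java]
    }
}
```

Real-world card numbers, dates, and hashtags come in more variations than these patterns cover, so production code would likely need stricter or broader patterns.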

In Java, we represent regular expressions with a unique pattern language. The format of these patterns is similar to the way regular expressions are represented in the Perl programming language. We will learn complete details about this pattern language later in this course.