Regular expressions API

The java.util.regex package is the standard Java API for working with regular expressions. It is popularly known as Java Regex, and it is a powerful tool for pattern matching.

You can use the following import statement to use the regular expressions API in your program.

import java.util.regex.*;

This package contains the following useful classes and interfaces that facilitate the use of regular expressions in Java.

  • Pattern class: This defines a pattern for searching or manipulating strings. It can also express constraints on strings, such as phone number, zip code, password, and email validations. A regular expression string is first compiled into a Pattern object, which is then used to create Matcher objects that match the pattern against input strings.

  • Matcher class: This class performs match operations on the character sequence.

  • MatchResult interface: This provides query methods to fetch the result of a pattern matching operation against a regular expression.

  • PatternSyntaxException class: This is an unchecked exception that is thrown when an invalid or syntactically incorrect regular expression is provided for pattern matching.

Components of a regular expression

A regular expression consists of two parts.

  • a pattern string
  • a flag specifying how the pattern should be matched (optional)

Flags

Flags provide instructions to regular expression processors about how to look for the pattern.

For example, if the CASE_INSENSITIVE flag is passed when the pattern is compiled, the pattern is matched without regard to letter case.

Below is the list of flags:

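  • CASE_INSENSITIVE: Enables case-insensitive matching. The case of letters is ignored when performing a search. By default, this applies only to US-ASCII characters.
  • COMMENTS: Permits whitespace and comments in the pattern. Whitespace is ignored, and a comment starting with # is ignored up to the end of the line.
  • DOTALL: Enables dotall mode, in which the expression . matches any character, including a line terminator.
  • MULTILINE: Enables multiline mode, in which ^ and $ match just after and just before a line terminator, not only at the beginning and end of the entire input.
  • UNICODE_CASE: Enables Unicode-aware case folding. Use together with the CASE_INSENSITIVE flag to also ignore the case of letters outside of the English alphabet.
  • UNIX_LINES: Enables Unix lines mode, in which only \n is recognized as a line terminator by ., ^, and $.
  • CANON_EQ: Enables canonical equivalence. Two characters are considered to match only if their full canonical decompositions match.
  • LITERAL: Enables literal parsing of the pattern. Special characters in the pattern will not have any special meaning and will be treated as ordinary characters when performing a search.
  • UNICODE_CHARACTER_CLASS: Enables the Unicode version of predefined character classes and POSIX character classes. Setting this flag also implies UNICODE_CASE.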
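
To make the effect of a few of these flags concrete, here is a minimal sketch. The class name FlagsDemo and the helper found() are hypothetical names introduced only for this illustration; the Pattern constants and methods are the standard ones. It also shows that flags are bit masks, so several can be combined with the bitwise OR operator.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input
    // when compiled with the given flags (0 means "no flags").
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "Hello\nWORLD";

        // Without flags, matching is case sensitive.
        System.out.println(found("world", 0, text));                        // false
        // CASE_INSENSITIVE ignores the case of letters.
        System.out.println(found("world", Pattern.CASE_INSENSITIVE, text)); // true

        // By default '.' does not match a line terminator.
        System.out.println(found("Hello.WORLD", 0, text));                  // false
        // DOTALL lets '.' match line terminators as well.
        System.out.println(found("Hello.WORLD", Pattern.DOTALL, text));     // true

        // By default '^' matches only at the start of the whole input.
        System.out.println(found("^WORLD", 0, text));                       // false
        // MULTILINE lets '^' and '$' match at line boundaries.
        System.out.println(found("^WORLD", Pattern.MULTILINE, text));       // true

        // Flags can be combined with bitwise OR.
        int combined = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(found("^world$", combined, text));               // true

        // LITERAL treats '.' as an ordinary character, so "a.c" is not in "abc".
        System.out.println(found("a.c", Pattern.LITERAL, "abc"));           // false
    }
}
```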

Now let’s learn about the Java classes for regular expressions in detail.

The Pattern class

A Pattern object is a compiled representation of a regular expression.

Each object represents a template, a specific sequence of characters that we look up within one or more strings. Because we do not always know the exact text in advance, the pattern can contain metacharacters and capturing groups that stand in for the variable parts. Groups are referred to by number or, for named groups, by a unique name. If the pattern matches, each capturing group holds the portion of the input string that matched that part of the pattern.

Creating a Pattern object

To create a pattern, call the static compile() method of the Pattern class.

static Pattern compile(String regex)

Here, regex is the regular expression to compile. The method compiles the given regex and returns a Pattern instance. An overloaded version, compile(String regex, int flags), also accepts the flags described above.

The following program compiles a pattern and uses it to split a string.

import java.util.Arrays;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String ipAddress = "10.52.255.1";
        // the dot is a metacharacter, so it is escaped to match a literal '.'
        String separator = "\\.";
        Pattern pattern = Pattern.compile(separator);
        // split the address around each match of the pattern
        System.out.println(Arrays.toString(pattern.split(ipAddress)));
    }
}

The Matcher class

The Matcher class represents an engine that interprets the pattern and performs the pattern matching operations against the input string. It implements the MatchResult interface.

Creating a Matcher object

To create a Matcher object, call the matcher() method of the Pattern object. This method returns a Matcher object that matches the given input with the pattern.

String text = "text to match";

// creates a matcher that matches the text with the pattern
Matcher matcher = pattern.matcher(text);

We will learn about the replacement methods of the Matcher class later in the course.

The MatchResult interface

The MatchResult interface provides the index methods, which query the result of a match against a regular expression. The Matcher class implements this interface and adds the study methods, which analyze the input string and return whether or not the match is a success.

Index methods

Index methods return index values that give the location of the match in the input string.

  • public int start(): Returns the start index of the previous match.

  • public int start(int groupNumber): Returns the start index of the subsequence captured by the given group during the previous match operation.

  • public int end(): Returns the offset after the last character matched.

  • public int end(int groupNumber): Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
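
The four index methods can be seen side by side in the following minimal sketch. The class name IndexMethodsDemo and the helper firstMatchOffsets() are hypothetical, introduced only for this illustration, and the helper assumes that the regular expression contains at least one capturing group.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexMethodsDemo {
    // Hypothetical helper: returns {start(), end(), start(1), end(1)} for the
    // first match, or null if there is no match.
    // Assumes the regex has at least one capturing group.
    static int[] firstMatchOffsets(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(), m.end(), m.start(1), m.end(1) };
    }

    public static void main(String[] args) {
        String input = "user id=42;";
        int[] offsets = firstMatchOffsets("id=(\\d+)", input);

        // The whole match "id=42" spans [5, 10); group 1 "42" spans [8, 10).
        System.out.println(Arrays.toString(offsets));                  // [5, 10, 8, 10]

        // end() is exclusive, so start/end plug directly into substring().
        System.out.println(input.substring(offsets[0], offsets[1]));   // id=42
        System.out.println(input.substring(offsets[2], offsets[3]));   // 42
    }
}
```

Because end() returns the offset after the last matched character, the pair start()/end() describes a half-open range, which is the same convention String.substring() uses.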

Study methods

Study methods are defined by the Matcher class. They review the input string and return a boolean indicating whether or not the pattern is found.

  • boolean lookingAt(): Attempts to match the input sequence, starting at the beginning of the region, against the pattern.

  • boolean find(): Attempts to find the next subsequence of the input sequence that matches the pattern.

  • boolean find(int start): Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.

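  • boolean matches(): Attempts to match the entire region against the pattern and returns the result as a boolean.

  • String group(): Returns the input subsequence matched by the previous match. Unlike the methods above, it does not return a boolean. It is declared by MatchResult and is typically called after one of the study methods has succeeded.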
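
The difference between lookingAt(), find(), and matches() is easiest to see by running all three against the same inputs. The following is a minimal sketch; the class name StudyMethodsDemo and the helpers study() and firstMatchFrom() are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StudyMethodsDemo {
    // Hypothetical helper: returns {lookingAt(), find(), matches()}.
    // Each call uses a fresh Matcher so that the state left behind by one
    // method cannot influence the next.
    static boolean[] study(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).lookingAt(),
            p.matcher(input).find(),
            p.matcher(input).matches()
        };
    }

    // Hypothetical helper: text of the first match found at or after index
    // 'from', or null if there is none.
    static String firstMatchFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"123abc", "abc123", "123"}) {
            System.out.println(input + " -> " + Arrays.toString(study("\\d+", input)));
        }
        // 123abc -> [true, true, false]   digits at the start, but not the whole input
        // abc123 -> [false, true, false]  digits present, but not at the start
        // 123 -> [true, true, true]       the whole input is digits

        System.out.println(firstMatchFrom("\\d+", "12 ab 34", 0)); // 12
        System.out.println(firstMatchFrom("\\d+", "12 ab 34", 2)); // 34
    }
}
```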

Example of pattern matching

The program below demonstrates the simplest form of pattern matching.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final int flag = Pattern.CASE_INSENSITIVE;
        // final int flag = Pattern.UNIX_LINES; // Unix lines mode
        // final int flag = Pattern.LITERAL; // literal parsing of the pattern
        // final int flag = Pattern.COMMENTS; // permits whitespace and comments
        // final int flag = Pattern.DOTALL; // dotall - '.' matches all characters
        // final int flag = Pattern.MULTILINE; // multiline mode
        // final int flag = Pattern.CANON_EQ; // canonical equivalence
        // final int flag = Pattern.UNICODE_CHARACTER_CLASS; // Unicode predefined and POSIX character classes
        // final int flag = Pattern.UNICODE_CASE; // Unicode-aware case folding

        // create a pattern with the case-insensitive flag
        Pattern pattern = Pattern.compile("regex", flag);
        // create a matcher that matches the pattern against the given string
        Matcher matcher = pattern.matcher("Welcome to this course!!." +
                "\n In this course we will learn about Java regex");
        // find() returns true if the pattern occurs anywhere in the input
        boolean matchFound = matcher.find();
        if (matchFound) {
            System.out.println("Match found");
        } else {
            System.out.println("Match not found");
        }
    }
}

Try playing around with different flags to see how they affect the pattern matching result.

Explanation

In the above example, the word “regex” is searched for in the text.

First, we create the pattern object using the Pattern.compile() method. This method takes two parameters:

  1. The pattern representing the string that is being searched for
  2. A flag to indicate that the search should be case insensitive (optional)

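The matcher() method does not perform the search itself. It returns a Matcher object that is bound to the given text and is ready to match the pattern against it. After a match operation has run, the Matcher holds the information about the search that was performed.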

The find() method returns true if the pattern was found in the string. If the pattern is not found in the string, it returns false.
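
Because each call to find() continues from where the previous match ended, calling it in a loop visits every occurrence of the pattern. The following is a minimal sketch of that idea; the class name FindAllDemo, the helper findAll(), and the sample text are hypothetical and are not part of the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Hypothetical helper: collects the text of every match of the regex
    // in the input, in the order the matches occur.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        // find() returns false once no further match exists, ending the loop
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Regex basics: Java regex, REGEX flags";

        // With CASE_INSENSITIVE, all three spellings are found.
        System.out.println(findAll("regex", Pattern.CASE_INSENSITIVE, text)); // [Regex, regex, REGEX]

        // Without flags, only the exact lowercase spelling is found.
        System.out.println(findAll("regex", 0, text));                        // [regex]
    }
}
```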

The PatternSyntaxException class

PatternSyntaxException is an unchecked exception that indicates a syntax error in the regular expression pattern language.

Below are its methods:

  • getDescription(): Fetches error description

  • getIndex(): Fetches error index

  • getPattern(): Fetches the erroneous regular expression pattern

  • getMessage(): Returns a multi-line string containing the description of the syntax error, its index, the erroneous regular-expression pattern, and a visual indication of the error index within the pattern.

import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        // string to be searched
        String text = "Regular Expression";
        // invalid regular expression: '*' has nothing to repeat
        String searchText = "*";
        try {
            // compile() throws PatternSyntaxException for an invalid regex
            Pattern pattern = Pattern.compile(searchText);
            // not reached here, because compile() has already thrown
            System.out.println(pattern.matcher(text).find());
        } catch (PatternSyntaxException e) {
            System.out.println(">> Inside catch block");
            System.out.println();
            System.out.println("Description -> " + e.getDescription());
            System.out.println("at Index -> " + e.getIndex());
            System.out.println("with Pattern -> " + e.getPattern());
            System.out.println();
            System.out.println("Message -> " + e.getMessage());
        }
    }
}