About RegEx

Learn about the default behavior of RegEx and its flags.

Overview of RegEx nature

RegEx will try to match as much as possible. This behavior is called greedy matching because the engine will “greedily” attempt to match anything it can. You will learn how to avoid this type of behavior later in the course.
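As a small illustration of greedy matching, here is a minimal sketch. The sample strings are hypothetical and chosen only to make the behavior visible.

```javascript
// Hypothetical sample strings, chosen only for illustration.
const html = "<b>one</b> and <b>two</b>";
const date = "2024-01-15";

// ".*" keeps consuming characters and only gives back as many as it must,
// so the match runs to the LAST "</b>" rather than the first one.
const greedy = html.match(/<b>.*<\/b>/)[0];

// "\d+" takes every digit it can before stopping at the "-".
const digits = date.match(/\d+/)[0];

console.log(greedy); // <b>one</b> and <b>two</b>
console.log(digits); // 2024
```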

RegEx is almost universal as the same syntax is used everywhere with only slight variations. Regardless of where you use RegEx, however, its basic principles remain the same.

Optional flags

  1. Global flag (g): By default, RegEx is eager, so it returns only the first match. To get all the matches, use the g flag.
var str = "123-45-67";
console.log(str.match(/\d/));
// add global flag to pattern (/\d/g) and see change in output

Let’s walk through another set of texts.

Hi
Hello
how
Happy

Considering the texts mentioned above:

  • RegEx without global flag will be /H/, resulting in a match “H” (from Hi).

  • RegEx with the global flag will be /H/g, resulting in a match of “H, H, H” (from Hi, Hello, Happy).

Explanation of the above bullet points: When you apply RegEx without the global flag, it returns only the first match. When you apply the global flag, it resumes searching from the position where the last successful match ended, until no more matches are found.
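The two bullet points can be checked with a short sketch. It assumes the four sample words are stored in one string, separated by newlines; the variable names are illustrative only.

```javascript
// The sample text from above, assumed to be one newline-separated string.
const text = "Hi\nHello\nhow\nHappy";

const first = text.match(/H/);  // no g flag: only the first match, with details
const all = text.match(/H/g);   // g flag: an array of every match

console.log(first[0], first.index); // H 0
console.log(all);                   // three matches: "H", "H", "H"

// With the g flag, exec() resumes from where the previous match ended.
const re = /H/g;
const positions = [];
let m;
while ((m = re.exec(text)) !== null) {
  positions.push(m.index);
}
console.log(positions); // start index of each match: 0, 3, 13
```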

  2. Case insensitive (i): By default, RegEx is case sensitive, meaning “A” and “a” are different. To make them match the same way, use the i flag.

When using the i flag, the character “B” will match both “B” and “b”. Likewise, the character “b” will match both “B” and “b”. With the case-insensitive flag, there is no difference between uppercase and lowercase letters.

let str = "A a";
console.log(str.match(/A/g));
// add case-insensitive (i) flag to pattern (/A/gi) or (/a/gi) and see change in output
// the global flag is used here because RegEx returns only the first match by default

Let’s walk through another set of texts.

Hi
Hello
how
Happy

Considering the texts mentioned above:

  • RegEx with the global flag and without the case-insensitive flag will be /H/g. The combination will result in matches “H, H, H” (from Hi, Hello, Happy).

  • RegEx with the global flag and the case-insensitive flag will be /H/ig. The combination will result in matches “H, H, h, H” (from Hi, Hello, how, Happy).

Explanation of the above bullet points: When you use RegEx without the case-insensitive flag, RegEx matches exactly what is provided. Hence, the character “h” from “how” will not match /H/. However, when you add i, RegEx does not differentiate between cases and treats lowercase and uppercase letters as the same.
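A minimal sketch of the same comparison in code, again assuming the sample words sit in one newline-separated string:

```javascript
// The sample text from above, assumed to be one newline-separated string.
const text = "Hi\nHello\nhow\nHappy";

// Case sensitive (default): only uppercase "H" matches.
const sensitive = text.match(/H/g);

// Case insensitive: "H" in the pattern also matches the "h" in "how".
const insensitive = text.match(/H/gi);

console.log(sensitive);   // three matches: "H", "H", "H"
console.log(insensitive); // four matches: "H", "H", "h", "H"
```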

  3. Multi-Line (m): By default, the anchors ^ and $ match only at the start and end of the whole text, not at each line. To match the start/end of each line, you need to use the m flag. (The multi-line flag affects the behavior of anchor characters only.)
let str = `1. Hi
2. How are you
3. What is your name`;
console.log(str.match(/^\d/g));
// add multi-line (m) flag to pattern (/^\d/gm) and see change in output

Let’s walk through another set of texts.

Hi
Hello
how
Happy

Considering the texts mentioned above:

  • RegEx with the global flag and without the multi-line flag will be /^H/g. The combination will result in a match “H” (from Hi).

  • RegEx with the global flag and the multi-line flag will be /^H/mg. The combination will result in matches “H, H, H” (from Hi, Hello, Happy).

Explanation of the above bullet points: The multi-line flag only affects the anchors, such as the caret symbol (^), which marks the beginning of the text. If the multi-line flag is not set, ^ matches only at the very start of the text, so only the “H” from “Hi” is found. If the multi-line flag is set, ^ also matches right after every line break, so RegEx checks the pattern at the start of each line.
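The effect of the m flag on both anchors can be sketched as follows, assuming the sample words are stored in one newline-separated string. The $ examples are an addition for illustration and are not part of the bullet points above.

```javascript
// The sample text from above, assumed to be one newline-separated string.
const text = "Hi\nHello\nhow\nHappy";

// Without m, ^ matches only at the very start of the text.
const startNoM = text.match(/^H/g);

// With m, ^ also matches right after each line break.
const startWithM = text.match(/^H/gm);

// The same applies to $: end of the text vs. end of each line.
const endNoM = text.match(/\w+$/g);
const endWithM = text.match(/\w+$/gm);

console.log(startNoM);   // one match: "H"
console.log(startWithM); // three matches: "H", "H", "H"
console.log(endNoM);     // one match: "Happy"
console.log(endWithM);   // four matches: "Hi", "Hello", "how", "Happy"
```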

  4. Unicode (u): By default, RegEx in JavaScript treats text as a sequence of 2-byte (16-bit) units, so a character that needs 4 bytes is seen as two separate units. To match such a character as a single character, use the u flag.

Most characters are encoded with 2 bytes, but for some characters, 2 bytes are not enough. Examples include emoji and many rarely used symbols. To encode them, 4 bytes are required. To match them in RegEx, you need to use the Unicode flag.
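A minimal sketch of the difference, using the grinning-face emoji (U+1F600) as an assumed example of a 4-byte character:

```javascript
// U+1F600 (grinning face) needs two 16-bit units in a JavaScript string.
const face = "\u{1F600}";

console.log(face.length); // 2

// Without u, "." matches a single 16-bit unit, so one "." cannot cover the emoji.
const withoutU = /^.$/.test(face);

// With u, "." matches the whole character.
const withU = /^.$/u.test(face);

// The u flag also enables the \u{...} escape inside the pattern.
const byCodePoint = /\u{1F600}/u.test(face);

console.log(withoutU);    // false
console.log(withU);       // true
console.log(byCodePoint); // true
```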

  5. Single line dotall (s): By default, “.” matches any character except line terminators such as newline (\n). When you need “.” to match the newline as well, use the s flag.
let str = `Hey\nHi`;
console.log(str.match(/\w+.\w+/g));
// add single-line (s) flag to pattern (/\w+.\w+/gs) and see change in output

Note: There is usually more than one way to write a pattern that matches a given text. Choosing between them is a matter of efficiency and complexity.
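As one illustration of this note, the dotall example above can be written in two ways. This is only a sketch; the character class [\s\S] is an alternative chosen here for comparison and is not introduced by the lesson itself.

```javascript
const str = "Hey\nHi";

// Way 1: the s flag lets "." match the newline.
const withFlag = str.match(/\w+.\w+/gs);

// Way 2: [\s\S] (whitespace or non-whitespace) matches any character,
// including the newline, without needing a flag.
const withClass = str.match(/\w+[\s\S]\w+/g);

// Both produce a single match that spans the line break.
console.log(withFlag[0] === "Hey\nHi");  // true
console.log(withClass[0] === "Hey\nHi"); // true
```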