Regular Expressions
Learn how to use regular expressions to describe regular languages.
We'll cover the following
Regular expressions are a convenient notation for representing strings that match simple text patterns. The expression $a^*ac^*b^*$ illustrates two of the four regular expression operations, namely concatenation (via juxtaposition) and Kleene star. The others appear in the following recursive definition.
Formal definition

These are regular expressions:
a) $\phi$, representing the empty language/set, $\{ \}$
b) $\lambda$, representing the onestring language/set, $\{\lambda\}$
c) $c$, for each character, $c$ in the alphabet $\Sigma$, representing the language/set $\{c\}$

For regular expressions $r, r_1, r_2$, the following are also regular expressions:
a) $r_1 + r_2$ (union)
b) $r_1r_2$ (concatenation)
c) $r^*$ (Kleene star)
d) $(r)$ (grouping)
Note: Some authors (and most programming environments) use the notation $r_1 \mid r_2$ for union.
Each regular expression represents a language, which is the set of all strings matching its pattern. Formally, we denote the language associated with a regular expression, $r$, as $L(r)$, so $L(\phi)=\{ \}$, $L(\lambda)=\{\lambda\}$, and $L(c)=\{c\}$ for each symbol $c$ from the alphabet in use. The following expressions also hold:
 $L(r_1+r_2)=L(r_1)\cup L(r_2)$
 $L(r_1r_2)=L(r_1)L(r_2)$
 $L(r^*)=(L(r))^*$
 $L((r))=L(r)$
The above definition is a constructive definition. For example, using $\Sigma =\{a,b,c\}$, we can construct the regular expression $aa^*c^*b^*$ as:
 $a$, $b$, and $c$ are regular expressions by rule 1–c.
 $a^*$, $b^*$, and $c^*$ are regular expressions by rule 2–c.
 $aa^*c^*b^*$ is a regular expression by rule 2–b (applied three times).
We now take $\Sigma =\{a,b\}$ and construct a regular expression representing the language of $a$’s and $b$’s that contain the substring $aba$. We don’t care what precedes or follows the substring $aba$, so a regular expression for this language is $(a+b)^*aba(a+b)^*$.
The formal construction satisfying the definition is:
 $a$ and $b$ are regular expressions by rule 1–c.
 $aba$ is a regular expression by rule 2–b (applied twice).
 $a+b$ is a regular expression by rule 2–a.
 $(a+b)$ is a regular expression by rule 2–d.
 $(a+b)^*$ is a regular expression by rule 2–c.
 $(a+b)^*aba(a+b)^*$ is a regular expression by rule 2–b (applied twice).
Here are some more examples of regular expressions over the alphabet $\Sigma=\{a,b\}$.
Regular Language  Regular Expression 

Strings ending in $ab$  $(a+b)^*ab$ 
Strings that begin and end with different symbols  $a(a+b)^*b+b(a+b)^*a$ 
String of even length  $((a+b)(a+b))^*$ 
Strings with an even number of $a$’s  $(ab^*a+b)^*$ 
The precedence of regular expression operators parallels that of algebra:
 Grouping is the highestprecedence operator.
 Followed by Kleene star (like exponentiation).
 Followed by concatenation (like multiplication).
 Followed by union (like addition).
Note the subexpression $ab^*a$ in the last example in the above table. We know that two $a$’s must be introduced at a time to have an even number, but they don’t have to be consecutive, so we insert a $b^*$ between the two $a$’s.
Remember: Regular expressions constitute a pattern language for text processing.
While we are accustomed to using exponents as a shorthand for repetition, using variables as exponents is not allowed in regular expressions. For example, $a^3$ as a shorthand for $aaa$ is a regular expression because $aaa$ is a regular expression, but $a^n$ is not a regular expression.
Remember: The expression $a^k$, where $k$ is a constant, is a regular expression; but $a^n$, where $n$ is a variable, is not a regular expression!
Regular expressions also observe certain algebraic rules, such as a distributive law, wherein $aa+ab=a(a+b)$ and $b+ba=b(\lambda+a)$. Order matters here, because what looks like multiplication is actually concatenation, so $a(a+b) \neq (a+b)a$. The following table summarizes common rules for rewriting regular expressions.
Note: The quality of regular expressions is used to represent the equality of the corresponding languages. When we write $r_1 = r_2$, instead of $r_1\equiv r_2$, we mean that $L(r_1) = L(r_2)$.
Rule  Description 

$r+s=s+r$  Union is commutative 
$(r+s)+t=r+(s+t)$  Union is associative 
$r+r=r$  Union is idempotent 
$r+\phi=r$  Union’s identity element is $phi$ 
$(rs)t=r(st)$  Concatenation is associative 
$r\lambda=\lambda r=r$  Concatenation’s identity element is $\lambda$ 
$r\phi=\phi r=\phi$  Concatenation’s nullity element is $\phi$ 
$r(s+t)=rs+rt$  Concatenation over union is distributive 
$(r+s)t=rt+rs$  Concatenation over union is distributive 
$(r^*)^*=r^*$  Kleene star is idempotent 
Apart from the abovementioned rules, there are some additional rules for the regular expressions mentioned below:
Get handson with 1200+ tech skills courses.