Python uses the re.sub() function to replace text that matches a regular expression pattern with a new value in a string.
What is the re.sub() function in Python?
Key takeaways:
- The re.sub() function in Python is commonly used for substituting patterns in strings based on RegEx.
- Group capturing in RegEx allows for the selective replacement of specific parts of a matched pattern while keeping the other parts of the string intact.
- The re.sub() function helps remove unnecessary text, convert text cases, clean input, correct spelling errors, and much more.
The re.sub() function
Regular expressions (RegEx), introduced by the mathematician Stephen Cole Kleene in the 1950s, are powerful tools for searching, matching, and manipulating text patterns. They let us define complex search patterns using a combination of characters and special symbols.
Python provides the re module, which supports regular expressions and is useful for text processing tasks. The re module offers various functions for searching for a pattern in a string.
The re.sub() function is one of the re module functions. It is a substitution function that, by default, replaces all non-overlapping occurrences of the specified pattern with a new string. This function has diverse applications:
- Remove unnecessary characters
- Convert the case of characters in a string
- Standardize formats to prepare data for analysis
- Correct spelling errors
- Replace specific words with synonyms
- Check for valid patterns
- Clean input to ensure data integrity and prevent errors
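A few of these applications can be sketched in a handful of calls. The sample string and the specific patterns below are assumptions chosen for illustration, not a general-purpose cleaning routine:

```python
import re

# Hypothetical messy input, made up for this sketch.
raw = "  Teh   order #1029 ships on 2024/05/17!!  "

# Remove unnecessary characters (here: '#' and '!').
text = re.sub(r"[#!]", "", raw)

# Clean input: collapse runs of whitespace into one space, then trim the ends.
text = re.sub(r"\s+", " ", text).strip()

# Correct a spelling error; \b keeps the match to the whole word.
text = re.sub(r"\bTeh\b", "The", text)

# Standardize a format: use hyphens instead of slashes in the date.
text = re.sub(r"/", "-", text)

print(text)  # The order 1029 ships on 2024-05-17
```

Each call returns a new string, so the calls can be chained one after another on the same variable.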
Syntax
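The name sub stands for substitution. The re.sub() function returns a new string in which the matches of the pattern are replaced. Several different characters can be replaced in a single call by listing them in a character set (for example, [aI]) in the pattern.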
re.sub(pattern, repl, string, count=0, flags=0)
Parameters
- pattern: This denotes the regular expression to search for. It can contain literal strings, regex metacharacters (^, *, +), or special sequences (\w, \s, \d).
- repl: This denotes the replacement for each match of pattern. It can be a string (optionally containing backreferences such as \1) or a function that takes a match object and returns the replacement string.
- string: This denotes the string on which the re.sub() operation will be executed.
- count (Optional): This denotes the maximum number of replacements to make. If we want to replace all matches, we can skip this parameter or set it to 0.
- flags (Optional): This modifies the behavior of the regular expression operation. For example, the value could be re.IGNORECASE or re.NOFLAG (the default, available from Python 3.11). To use multiple flags, we combine them with the bitwise OR operator (|). For example, flags=re.IGNORECASE | re.MULTILINE applies both flags when making substitutions.
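As a minimal sketch of how count and combined flags interact, consider the following. The log-like text is a made-up assumption for illustration:

```python
import re

# Hypothetical multi-line input, made up for this sketch.
text = "error: disk full\nError: retry\nno error here"

# Without re.MULTILINE, ^ anchors only at the start of the whole string.
# With it, ^ also matches at the start of every line.
# re.IGNORECASE lets "error" match "Error" as well.
both_flags = re.IGNORECASE | re.MULTILINE
result = re.sub(r"^error", "WARNING", text, flags=both_flags)

# count=1 stops after the first replacement.
first_only = re.sub(r"^error", "WARNING", text, count=1, flags=both_flags)

print(result)
print(first_only)
```

Passing count and flags by keyword, as done here, keeps the call readable and avoids mixing up the two integer-like arguments.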
The re.sub() in action
Let’s look at the code snippet below to understand it better.
# Importing the re module
import re

# Given string
s = "I am a human being."

# Performing the sub() operation (a character set lists characters without commas)
res_1 = re.sub('a', 'x', s)
res_2 = re.sub('[aI]', 'x', s)

# Print results
print(res_1)
print(res_2)

# The original string remains unchanged
print(s)
Code explanation
- Line 2: We import the re module.
- Line 5: We enter a sample string.
- Line 8: We replace all the instances of a with x in the string s.
- Line 9: We replace all the instances of a and I with x in the string s.
- Lines 12–13: We print the results.
- Line 16: We print the original string.
Using count and flags parameters
Let's see another example to understand the usage of count and flags parameters in the re.sub() function.
# Importing the re module
import re

# Given string
s = "I am a human being."

# Performing the sub() operation with count parameter
res_1 = re.sub('a', 'x', s, count=2)

# Performing the sub() operation with flags parameter
res_2 = re.sub('i', 'x', s, flags=re.IGNORECASE)

# Print results
print(res_1)
print(res_2)
We set count=2 on line 8, which replaces only the first two instances of a with x. On line 11, we set flags=re.IGNORECASE, which makes the match case-insensitive, so both I and i are replaced with x.
Using capturing groups in the pattern parameter
Capturing groups are parts of a RegEx pattern that are each treated as a single unit. With capturing groups, we can manipulate each unit independently of the others. For example, a URL consists of the application protocol part (https://) and the domain part (educative.io), and each can be considered a separate group. Suppose we want to replace the protocol with, say, ftp://. Capturing groups give us the flexibility to replace only that group with the string of our choice.
Capturing groups are defined by parentheses in a RegEx pattern and can be referenced later. In re.sub(), these groups are referenced in the replacement string with backreferences such as \1 for the first group and \2 for the second, making substitutions more dynamic.
Let's now see how to use capturing groups in the pattern parameter with the help of a code example:
# Importing the re module
import re

s = 'Welcome to https://educative.com'

print(re.sub(r'(https://)(educative\.com)', r'\1educative.io', s))
In line 6, (https://) and (educative\.com) are two capturing groups. The first group captures the scheme of the URL (https://), and the second group captures the domain part (educative.com). The dot is escaped as \. so that it matches a literal period rather than any character. Next, r'\1educative.io' is the replacement, where \1 refers to the first group (https://) and educative.io is the new domain we want to use.
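Backreferences can also reorder groups, not just keep one of them. The sketch below uses made-up sample strings; it also shows the \g<1> form, which Python's re module accepts as an unambiguous way to reference a group when a digit follows it:

```python
import re

# Hypothetical sample string, made up for this sketch.
date = "Released on 17/05/2024"

# Three groups: day, month, year. The replacement reorders them.
iso = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\2-\1", date)
print(iso)  # Released on 2024-05-17

# r"\100" would be read as a reference to group 10, which does not exist here.
# r"\g<1>00" means "group 1, followed by the literal text 00".
price = re.sub(r"(\d+) USD", r"\g<1>00 cents", "Cost: 25 USD")
print(price)  # Cost: 2500 cents
```

Raw strings (the r prefix) are used for both the pattern and the replacement so that the backslashes reach the re module unchanged.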
Quiz!
What is the purpose of the count parameter in the re.sub() operation?
It limits the number of matches in the string.
It determines the length of the replacement string.
It limits the number of substitutions made.
None of the above
In summary, Python's re.sub() function is a great tool for replacing text using regular expression patterns. By leveraging its pattern matching and replacement capabilities, we can effectively manage and transform text to fit our needs.