Remove all the punctuation marks from a sentence using RegEx
Problem Statement
Given a string, remove all the punctuation marks from it using a RegEx. The string can contain letters, spaces, punctuation marks, and digits.
Example:
- Input: hi-^%*(#ans34wer
- Output: hians34wer
Solution
There are two ways to construct the RegEx.
- Retain characters
- Replace characters
Retain characters
We use the following RegEx to retain only whitespace and word characters (letters, digits, and the underscore).
[\s\w\d]
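- \s: Whitespace characters (spaces, tabs, newlines)
- \w: Word characters (letters, digits, and the underscore)
- \d: Digit characters (already covered by \w, so listing it is optional)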
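As a quick check of what each class matches, here is a minimal sketch that runs the three classes against a short sample string. The sample and the variable names are illustrative and not part of the original example. It also suggests that \d adds nothing once \w is in the class, and that an underscore would survive this filter.

```python
import re

sample = "a_1 -"  # illustrative sample: letter, underscore, digit, space, hyphen

# \w matches letters, digits, and the underscore
word_chars = re.findall(r"\w", sample)

# \d matches digits only, a subset of what \w matches
digit_chars = re.findall(r"\d", sample)

# \s matches whitespace such as spaces, tabs, and newlines
space_chars = re.findall(r"\s", sample)

print(word_chars)   # ['a', '_', '1']
print(digit_chars)  # ['1']
print(space_chars)  # [' ']

# Dropping \d from the class does not change what is matched
same_result = re.findall(r"[\s\w\d]", sample) == re.findall(r"[\s\w]", sample)
print(same_result)  # True
```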
Example
import re

regex = r"[\s\w\d]"  # whitespace, word, and digit characters

test_str = "hi-^%*(#ans34wer"

matches = re.findall(regex, test_str, re.MULTILINE)  # list of matching characters

print("String with punctuations - ", test_str)
print("String without punctuations - ", "".join(matches))
Explanation
- Line 1: We import the re module.
- Line 3: We define the RegEx.
- Line 5: We define the string/text.
- Line 7: We use the findall method to collect every character that matches the given RegEx.
- Line 9: We print the text with punctuations.
- Line 10: We join the matched characters and print the text without punctuations.
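The steps above can be packaged as a small reusable helper. This is only a sketch: the names KEEP_PATTERN and retain_allowed are hypothetical, and the pattern drops \d on the assumption that \w already covers digits.

```python
import re

# Compile once so the pattern can be reused across calls
KEEP_PATTERN = re.compile(r"[\s\w]")  # \d omitted: \w already matches digits


def retain_allowed(text: str) -> str:
    """Return text with only whitespace and word characters kept."""
    return "".join(KEEP_PATTERN.findall(text))


print(retain_allowed("hi-^%*(#ans34wer"))  # hians34wer
```

Compiling the pattern up front is a design choice that mainly pays off when the helper is called many times.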
Replace characters
Here, there are two ways to solve the problem. They are as follows:
Replace all the punctuation marks
In this method, we list every punctuation mark in the RegEx and replace each match with an empty string using the re.sub() method in Python.
[!\"#\$%&\'\(\)\*\+,-\./:;<=>\?@\[\\\]\^_`{\|}~]
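The RegEx above contains all the ASCII punctuation marks. Inside the character class, ,-\. is read as a range from the comma to the period, which also covers the hyphen.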
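Typing the class by hand is error-prone, so one possible alternative is to build it from the standard library's string.punctuation constant, escaped with re.escape(). This is a sketch of that idea, not the method used in the example that follows; the names punct_class and punct_pattern are illustrative.

```python
import re
import string

# string.punctuation holds the 32 ASCII punctuation characters.
# re.escape() backslash-escapes the ones that are special in a RegEx,
# including "-", "]", "^", and "\", so the class is safe to build.
punct_class = "[" + re.escape(string.punctuation) + "]"
punct_pattern = re.compile(punct_class)

test_str = "mjfnd234gsd%@$%*}{:)()@#@#$`~"
result = punct_pattern.sub("", test_str)
print(result)  # mjfnd234gsd
```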
Example
import re

regex = r"[!\"#\$%&\'\(\)\*\+,-\./:;<=>\?@\[\\\]\^_`{\|}~]"

test_str = "mjfnd234gsd%@$%*}{:)()@#@#$`~"

subst = ""  # replacement: an empty string

result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

print("String with punctuations - ", test_str)
print("String without punctuations - ", result)
Explanation
- Line 1: We import the re module.
- Line 3: We define the RegEx.
- Line 5: We define the string/text.
- Line 7: We define the replacement string, which is an empty string.
- Line 9: We use the sub method to replace the matching text with the replacement string.
- Line 11: We print the text with punctuations.
- Line 12: We print the text without punctuations.
Replace with negation
In this method, we replace any character other than a whitespace, word, or digit character with an empty string using the re.sub() method in Python. Because \w also matches the underscore, underscores are left in place.
[^\s\w\d]
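- ^: Negation operator (when it is the first character inside the brackets)
- \s: Whitespace characters (spaces, tabs, newlines)
- \w: Word characters (letters, digits, and the underscore)
- \d: Digit characters (already covered by \w, so listing it is optional)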
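Since \w matches the underscore, the negated class leaves underscores untouched. If underscores should count as punctuation too, one possible adjustment is to add the underscore as a second alternative. The input string and variable names below are illustrative, and this is a sketch rather than the article's own pattern.

```python
import re

test_str = "user_name: a-b!"  # illustrative input

# [^\s\w] leaves the underscore in place because \w matches it
kept_underscore = re.sub(r"[^\s\w]", "", test_str)
print(kept_underscore)  # user_name ab

# Adding "|_" as an alternative removes underscores as well
no_underscore = re.sub(r"[^\s\w]|_", "", test_str)
print(no_underscore)  # username ab
```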
Example
import re

regex = r"[^\s\w\d]"  # anything that is not whitespace, word, or digit

test_str = "mjfndgsd%@$%*}{:)()@#@#$`~"

subst = ""  # replacement: an empty string

result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

print("String with punctuations - ", test_str)
print("String without punctuations - ", result)
Explanation
- Line 1: We import the re module.
- Line 3: We define the RegEx.
- Line 5: We define the string/text.
- Line 7: We define the replacement string, which is an empty string.
- Line 9: We use the sub method to replace the matching text with the replacement string.
- Line 11: We print the text with punctuations.
- Line 12: We print the text without punctuations.
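The two replace approaches can give different results on text that contains non-ASCII punctuation. The explicit list only covers ASCII marks, while the negated class removes anything that is not a whitespace or word character. The sketch below compares them on an illustrative Spanish string that is not taken from the original examples.

```python
import re

ascii_punct = r"[!\"#\$%&\'\(\)\*\+,-\./:;<=>\?@\[\\\]\^_`{\|}~]"
negated = r"[^\s\w\d]"

test_str = "¿Qué tal?"  # illustrative input with a non-ASCII punctuation mark

# The explicit list does not contain "¿", so only "?" is removed
result_ascii = re.sub(ascii_punct, "", test_str)
print(result_ascii)  # ¿Qué tal

# The negated class removes both marks; "é" is a word character and stays
result_negated = re.sub(negated, "", test_str)
print(result_negated)  # Qué tal
```

Which behavior is preferable depends on the input: the explicit list is predictable for ASCII text, while the negated class may suit multilingual text better.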