What is remove_punct() in the clean-text library in Python?
In this shot, we will learn about the remove_punct() function in Python.
The remove_punct() function is provided by clean-text, a third-party Python library.
We can use it to clean text data that contains punctuation.
We need to install clean-text with pip before we can use it in our programs:
pip install clean-text
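Note that the package is installed as clean-text but imported as cleantext. A minimal sketch for checking that the module is importable, using only the standard library (the variable names are illustrative):

```python
import importlib.util

# The pip package is named "clean-text", but the importable module is "cleantext".
spec = importlib.util.find_spec("cleantext")
installed = spec is not None

if installed:
    print("cleantext is available")
else:
    print("cleantext is not installed; run: pip install clean-text")
```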
Syntax
from cleantext import clean
clean(text, no_punct=True, replace_with_punct="$")
Where:
- clean is the function that is provided by the cleantext module.
- To remove the punctuation, we set the parameter no_punct to True. The clean function will then call the library's remove_punct() function.
- If we want to replace the punctuation instead of removing it, we can use the parameter replace_with_punct. Here, we pass $ to replace each punctuation mark in the given text.
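To make the difference between removing and replacing concrete, here is a minimal standard-library sketch of the idea. It is not the clean-text source code: the helper names strip_punct and swap_punct are hypothetical, and it assumes that "punctuation" means any character whose Unicode category starts with "P". It also skips the other normalization steps that clean may apply.

```python
import sys
import unicodedata

# Hypothetical helper names; this is not the clean-text source code.
# Assumption: punctuation = every code point whose Unicode category starts with "P".
PUNCT_CODE_POINTS = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith("P")
]

def strip_punct(text):
    """Delete every punctuation character from text."""
    return text.translate(dict.fromkeys(PUNCT_CODE_POINTS, None))

def swap_punct(text, replacement):
    """Replace every punctuation character in text with replacement."""
    return text.translate(dict.fromkeys(PUNCT_CODE_POINTS, replacement))

sample = "This, is a sample text! which [has) punctuations."
removed = strip_punct(sample)
replaced = swap_punct(sample, "$")

print(removed)   # This is a sample text which has punctuations
print(replaced)  # This$ is a sample text$ which $has$ punctuations$
```

Under this assumption, $ itself belongs to the currency-symbol category rather than punctuation, so it is safe to use as the replacement character.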
Let’s look at an example.
Code example
#import the clean function
from cleantext import clean

text = "This, is a sample text! which [has) punctuations."

#display the text after removing the punctuations
print(clean(text, no_punct=True))
Code explanation
- Line 2: We import the clean function, which is provided by the cleantext module.
- Line 4: We provide the text that contains the punctuation.
- Line 7: We remove the punctuation from the given text with the clean function.
We can also replace the punctuation marks with a string instead of removing them. To do this, we pass a character or string to the replace_with_punct parameter. In the following code, we replace each punctuation mark with the character $:
#import the clean function
from cleantext import clean

text = "This, is a sample text! which [has) punctuations."

#display the text after replacing the punctuations with $
print(clean(text, no_punct=True, replace_with_punct="$"))