How to normalize text using the cleantext package in Python
normalize_whitespace is a Boolean parameter of the clean function provided by the cleantext library in Python. When it is enabled, clean performs the following operations:
- Replace one or more consecutive spaces with a single space.
- Replace one or more line breaks with a single newline.
- Strip leading and trailing whitespace.
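To make these three operations concrete, here is a minimal sketch that mimics them with Python's standard re module. The helper name normalize_whitespace_sketch is hypothetical and this is not cleantext's own implementation; it only illustrates the behavior described above.

```python
import re


def normalize_whitespace_sketch(text: str) -> str:
    """Hypothetical helper that illustrates the three operations; not cleantext's code."""
    # Collapse runs of line breaks (plus any whitespace around them) into one newline.
    text = re.sub(r"\s*[\r\n]+\s*", "\n", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Strip leading and trailing whitespace.
    return text.strip()


sample = "  hello    educative\n\n\nhello   edpresso  "
result = normalize_whitespace_sketch(sample)
print(repr(result))  # 'hello educative\nhello edpresso'
```

The library may treat edge cases (for example, tabs or non-breaking spaces) differently, so we should treat this only as a mental model of the parameter.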
Syntax
from cleantext import clean
clean(text, normalize_whitespace=True)
The cleantext package provides us with the clean function.
Parameters
- text: This is the text data to normalize.
- normalize_whitespace: This is a Boolean value indicating whether to normalize whitespace in the text. By default, the value is True.
Return value
The clean function returns the normalized text as a string.
Code
import cleantext

string = """hello    educative

hello edpresso """

normalized_string = cleantext.clean(string, normalize_whitespace=True)

print("Original String - '" + string + "'")
print("\n")
print("Normalized String - '" + normalized_string + "'")
Code explanation
- Line 1: We import the cleantext package.
- Lines 3-5: We define a string that contains newlines and multiple spaces.
- Line 7: We obtain a normalized string, with the extra spaces and line breaks removed, by calling the clean function and passing normalize_whitespace as True.
- Lines 9-11: We print the original and the normalized string.