Trusted answers to developer questions

Related Tags

python
communitycreator
method
normalize
cleantext

How to normalize text using the cleantext package in Python

abhilash

What is the cleantext package?

cleantext is a third-party package that pre-processes text data to obtain a normalized text representation.

The package is published on PyPI as clean-text, although it is imported as cleantext. Run the following command to install it with pip:

pip install clean-text
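Because the install name (clean-text) differs from the import name (cleantext), it can be handy to confirm which distribution is actually present. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of cleantext.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not part of cleantext.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Look up the distribution by its PyPI name, not its import name.
version = installed_version("clean-text")
if version is None:
    print("clean-text is not installed; run: pip install clean-text")
else:
    print(f"clean-text {version} is installed")
```

The lookup uses the PyPI name, so it does not import the package itself.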

normalize_whitespace is an option of the clean function provided by the cleantext library. It is not built into Python. When it is enabled, clean performs the following operations:

  1. Replace a run of one or more spaces with a single space.
  2. Replace a run of one or more line breaks with a single newline.
  3. Strip leading and trailing whitespace.
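To make the three operations concrete, here is a rough sketch of them using only Python's re module. It is an approximation for illustration, not the library's actual implementation, and the function name collapse_whitespace is hypothetical. As an added assumption, it also drops spaces that sit directly next to a line break.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Approximate the three whitespace operations with regular expressions.

    Hypothetical helper for illustration; cleantext may differ in details.
    """
    # Treat Windows and old Mac line endings as plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse any run of whitespace that contains a line break into one
    # newline (this also removes spaces adjacent to the line break).
    text = re.sub(r"\s*\n\s*", "\n", text)
    # Collapse runs of spaces or tabs (whitespace other than newlines)
    # into a single space.
    text = re.sub(r"[^\S\n]+", " ", text)
    # Strip leading and trailing whitespace.
    return text.strip()


sample = "hello educative\n\nhello edpresso    "
print(repr(collapse_whitespace(sample)))
# prints 'hello educative\nhello edpresso'
```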

Syntax

from cleantext import clean

clean(text, normalize_whitespace=True)

The cleantext package provides us with the clean function.

Parameters

  • text: This is the text data to normalize.
  • normalize_whitespace: This is a Boolean value indicating whether to normalize whitespace in the text. The default value is True.

Return value

The clean function returns the normalized text as a string.

Code

import cleantext

string = """hello educative

hello edpresso    """

normalized_string = cleantext.clean(string, normalize_whitespace=True)

print("Original String - '" + string + "'")
print("\n")
print("Normalized String - '" + normalized_string + "'")

Code explanation

  • Line 1: We import the cleantext package.
  • Lines 3-5: We define a string that contains line breaks and multiple consecutive spaces.
  • Line 7: We obtain the normalized string by calling the clean function with normalize_whitespace set to True.
  • Lines 9-11: We print the original and the normalized string.
