clean-text is a third-party package that pre-processes text data to produce a normalized text representation.
The package can be installed via pip with the following command:
pip install clean-text
The fix_bad_unicode() method repairs broken Unicode text with the help of the ftfy package. Fixing bad Unicode includes fixing mojibake, HTML entities, other code cruft, and non-standard forms for display purposes.
fix_bad_unicode(text, normalization="NFC")
text: The text data.
normalization: The type of Unicode normalization. The default is "NFC".

The method returns good Unicode data.
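"NFC" is one of the standard Unicode normalization forms. As a rough illustration of what such a form does, the sketch below uses Python's built-in unicodedata module rather than clean-text itself. The sample strings are made up for this example.

```python
import unicodedata

# "e" followed by a combining acute accent: two code points that render as one character.
decomposed = "e\u0301"

# NFC composes the pair into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == "\u00e9")

# NFKC goes further and also replaces compatibility characters,
# for example the single-character "fi" ligature (U+FB01).
ligature = "\ufb01"
print(unicodedata.normalize("NFC", ligature) == ligature)
print(unicodedata.normalize("NFKC", ligature))
```

NFC is a conservative choice because it only recombines characters and leaves compatibility characters such as ligatures untouched.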
import cleantext

string = "✔ No problems"
fixed_string = cleantext.fix_bad_unicode(string)

print("Original String - '" + string + "'")
print("Fixed String - '" + fixed_string + "'")
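In the code above, the clean-text package is imported, and the string is fixed using the fix_bad_unicode() method. The original and fixed strings are then printed.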
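The kind of damage that fix_bad_unicode() targets can be reproduced with the standard library alone. The sketch below is only an illustration of mojibake and HTML entities under simple assumptions (a single wrong decode as Windows-1252, and a made-up entity string). It is not how ftfy or clean-text work internally, since they detect and handle many more cases automatically.

```python
import html

original = "\u2714 No problems"  # a check mark followed by plain text

# Simulate mojibake: the UTF-8 bytes are wrongly decoded as Windows-1252.
broken = original.encode("utf-8").decode("cp1252")
print(broken)

# Undo the mistake by re-encoding with the wrong codec and decoding with the right one.
repaired = broken.encode("cp1252").decode("utf-8")
print(repaired)

# HTML entities left behind in text can be decoded with html.unescape.
entity_text = "Fish &amp; chips"  # hypothetical sample input
print(html.unescape(entity_text))
```

This manual round trip only works when the wrong codec is known in advance, which is why a dedicated repair function is convenient for real data.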