How to deal with contractions in NLP
Contractions in NLP
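Contractions are combinations of words that are shortened by dropping letters and replacing them with an apostrophe, for example, "I'm" for "I am". In NLP, they are usually expanded back into their full forms during text preprocessing.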
Why is it essential to deal with them?
There are two main reasons why we should deal with contractions in NLP:
A computer doesn't recognize that the contractions are abbreviations for a combination of words. Hence, it recognizes "I'm" and "I am" as two different terms with different meanings.
Contractions increase the dimensionality of the document-term matrix, a mathematical matrix that describes the frequency of terms that occur in a collection of documents. For instance, we'll have a column for the term "I'm" and a column for the term "I am".
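To make the dimensionality point concrete, here is a minimal sketch that builds a tiny document-term matrix by hand, once on the raw text and once after expansion. The EXPANSIONS table, the sample documents, and the helper names are assumptions made up for this illustration. Tokenization is a plain lowercase whitespace split, so "I am" contributes the columns "i" and "am".

```python
from collections import Counter

# Hypothetical two-entry lookup table, just for this illustration.
EXPANSIONS = {"i'm": "i am", "didn't": "did not"}

# Made-up sample documents: each pair says the same thing in two ways.
docs = ["I'm fine", "I am fine", "I didn't go", "I did not go"]


def tokenize(doc, expand=False):
    tokens = doc.lower().split()
    if expand:
        # Replace each contraction, then re-split: one token may become two.
        tokens = " ".join(EXPANSIONS.get(t, t) for t in tokens).split()
    return tokens


def document_term_matrix(docs, expand=False):
    tokenized = [tokenize(d, expand) for d in docs]
    # One column per distinct term across all documents.
    vocab = sorted({t for tokens in tokenized for t in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


raw_vocab, raw_rows = document_term_matrix(docs)
exp_vocab, exp_rows = document_term_matrix(docs, expand=True)

print(len(raw_vocab), raw_vocab)
print(len(exp_vocab), exp_vocab)
```

With the raw text, the vocabulary has 8 columns. After expansion it has 6, and "I'm fine" and "I am fine" end up with identical rows.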
How to deal with contractions
We can use the contractions library of Python to expand the contractions. It can be installed by using the following command:
pip install contractions
The following code snippet demonstrates how to expand the contractions:
import contractions

text = '''Hello mom! Yes, I'm fine. How're you? No, I didn't have lunch. I'm about to go.
Are you coming next weekend? I've been missing you.'''

expanded_text = []
for word in text.split():  # split on whitespace and expand word by word
    expanded_text.append(contractions.fix(word))

expanded_text = ' '.join(expanded_text)  # rebuild a single string
print('Input : ' + text)
print('\n')
print('Output: ' + expanded_text)
Explanation
Lines 7–8: We use contractions.fix() to expand the shortened words, and append them to the expanded_text list in a loop.
Line 10: We join the words in expanded_text into a single string, separated by a space (' ').
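To give a feel for what a lookup-based expander does, here is a rough sketch that uses only the standard library. It is not the contractions library's actual implementation. CONTRACTION_MAP, PATTERN, and expand() are hypothetical names, and the table is deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny lookup table (a real one has many more entries).
CONTRACTION_MAP = {
    "i'm": "I am",
    "how're": "how are",
    "didn't": "did not",
    "i've": "I have",
}

# One alternation over all keys, matched on word boundaries, ignoring case.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTION_MAP) + r")\b",
    flags=re.IGNORECASE,
)


def expand(text):
    def replace(match):
        found = match.group(0)
        expanded = CONTRACTION_MAP[found.lower()]
        # Keep a leading capital, e.g. "How're" becomes "How are".
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return PATTERN.sub(replace, text)


print(expand("Yes, I'm fine. How're you? No, I didn't have lunch."))
```

Because the substitution runs over the whole string, the original spacing and line breaks are preserved, which the split-and-join loop does not do.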
Ambiguity of contractions
It's very easy to use the contractions library to expand the words. However, if we take a closer look, we observe that some contractions represent multiple word combinations. Consider the following for example:
"ain't": "am not / are not / is not / has not / have not"
The contractions library doesn't handle this ambiguity. For the example above, the package always expands to "are not."
This is demonstrated in the code below:
import contractions

text = '''I ain't doing that.'''

expanded_text = []
for word in text.split():  # expand word by word, as before
    expanded_text.append(contractions.fix(word))

expanded_text = ' '.join(expanded_text)
print('Input : ' + text)
print('\n')
print('Output: ' + expanded_text)
The pycontractions library
We can also use the pycontractions library to expand the contractions. It works in the following way:
Case 1: If a contraction corresponds to only one sequence of words, pycontractions replaces the contraction with that word sequence.
Case 2: If a contraction corresponds to many possible expansions, pycontractions produces all the possible expansions and then uses a grammar checker. The grammatically incorrect options are discarded, and the correct choice is selected.
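The generate-then-filter idea in Case 2 can be sketched with a toy example. This is not pycontractions' code: the AGREES table and toy_error_count() are a hypothetical stand-in for a real grammar checker, and they only know about subject-verb agreement and one rule about "-ing" forms.

```python
# Candidate expansions for an ambiguous contraction (from the "ain't" example).
CANDIDATES = {"ain't": ["am not", "are not", "is not", "has not", "have not"]}

# Toy stand-in for a grammar checker: auxiliaries that agree with each subject.
# Purely illustrative; a real checker covers far more than this.
AGREES = {
    "i": {"am", "have"},
    "you": {"are", "have"},
    "we": {"are", "have"},
    "they": {"are", "have"},
    "he": {"is", "has"},
    "she": {"is", "has"},
    "it": {"is", "has"},
}


def toy_error_count(subject, auxiliary, following):
    errors = 0
    if auxiliary not in AGREES.get(subject.lower(), set()):
        errors += 1  # subject-verb disagreement, e.g. "I are"
    next_word = following.lower().rstrip(".,!?")
    if auxiliary in {"has", "have"} and next_word.endswith("ing"):
        errors += 1  # "have not doing" needs a participle, not an -ing form
    return errors


def expand_ambiguous(sentence):
    words = sentence.split()
    for i, word in enumerate(words):
        options = CANDIDATES.get(word.lower())
        if options is None:
            continue
        subject = words[i - 1] if i > 0 else ""
        following = words[i + 1] if i + 1 < len(words) else ""
        # Score every hypothesis and keep the one with the fewest toy errors
        # (ties go to the first candidate in the list).
        words[i] = min(
            options,
            key=lambda opt: toy_error_count(subject, opt.split()[0], following),
        )
    return " ".join(words)


print(expand_ambiguous("I ain't doing that."))
print(expand_ambiguous("She ain't coming."))
```

Here "I ain't doing that." becomes "I am not doing that." rather than the fixed "are not", because the other candidates each trigger at least one rule in the toy checker.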
Because it takes the grammar of the text into account, pycontractions tends to be more accurate than the contractions library of Python when a contraction is ambiguous.