What is the nltk.stem.api module?
Text normalization (i.e., preparing text, words, and documents for further processing) is one of the most fundamental tasks in Natural Language Processing. Two common text normalization techniques are stemming and lemmatization. nltk.stem is the NLTK package that provides them, and it is one of the most widely used Python tools for stemming and lemmatization.
Examples of Stemming and Lemmatization:
The words cars, car’s, CAR, Car, and cars’ are all derived from car. After normalization, all of these words are mapped to car.
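The mapping above can be illustrated with a minimal sketch. The function toy_normalize below is a hypothetical helper written only for this example; it is not part of NLTK and does not reproduce any NLTK algorithm. It simply lowercases the word, drops possessive markers, and strips a trailing plural "s".

```python
import re


def toy_normalize(word: str) -> str:
    """Hypothetical toy normalizer (not an NLTK function)."""
    # Lowercase and unify curly apostrophes with straight ones.
    word = word.lower().replace("’", "'")
    # Drop a possessive marker: "car's" -> "car", "cars'" -> "cars".
    word = re.sub(r"'s$|'$", "", word)
    # Strip a trailing plural "s" from words longer than three letters.
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return word


for w in ["cars", "car’s", "CAR", "Car", "cars’"]:
    print(w, "->", toy_normalize(w))
```

A rule this crude mishandles many words (for example, it would shorten "glass"), which is why purpose-built stemmers such as those in nltk.stem are used in practice.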
nltk.stem.api module
class nltk.stem.api.StemmerI
Bases: object
The class above is the processing interface for removing morphological affixes from words, a process known as stemming. Concrete stemmers implement this interface.
@abstractmethod
stem(token)
stem is the abstract method of this class, so every concrete stemmer must implement it. The method strips the affixes from the token passed as a parameter and returns the stem of the token.
Parameters
token (str) – The string that should be stemmed.
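To show how an interface with a single abstract stem(token) method is typically implemented, here is a minimal sketch that uses only the standard library. StemmerInterface is a hypothetical stand-in that mirrors the shape of nltk.stem.api.StemmerI, and SuffixStemmer with its suffix list is an assumption made for illustration; neither is NLTK code.

```python
from abc import ABC, abstractmethod


class StemmerInterface(ABC):
    """Hypothetical stand-in mirroring the shape of nltk.stem.api.StemmerI."""

    @abstractmethod
    def stem(self, token: str) -> str:
        """Strip affixes from the token and return the stem."""


class SuffixStemmer(StemmerInterface):
    """Toy stemmer: removes the first matching suffix from a fixed list."""

    SUFFIXES = ("ing", "ed", "s")  # illustrative only, not a real rule set

    def stem(self, token: str) -> str:
        for suffix in self.SUFFIXES:
            # Keep at least three characters so short words are left alone.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                return token[: -len(suffix)]
        return token


stemmer = SuffixStemmer()
print(stemmer.stem("cats"))
print(stemmer.stem("jumping"))
```

With NLTK installed, the same pattern should apply by subclassing nltk.stem.api.StemmerI instead of the stand-in. Because stem is abstract, the interface itself cannot be instantiated; only subclasses that define stem can.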
Code
For English text, either PorterStemmer or LancasterStemmer can be used (both are widely used stemming algorithms). The key differences between them are:
- LancasterStemmer is more aggressive in its approach than PorterStemmer.
- PorterStemmer is generally described as computationally more intensive.
- LancasterStemmer is generally described as faster, which reduces computation time when dealing with large datasets.
>>> # import the stemmers
>>> from nltk.stem import PorterStemmer
>>> from nltk.stem import LancasterStemmer
>>> # instance of PorterStemmer
>>> porter = PorterStemmer()
>>> porter.stem("cats")
'cat'
>>> porter.stem("trouble")
'troubl'
>>> # instance of LancasterStemmer
>>> lancaster = LancasterStemmer()
>>> lancaster.stem("cats")
'cat'
>>> lancaster.stem("trouble")
'troubl'
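The differences between the two stemmers are easiest to judge on your own data. The sketch below, which assumes NLTK is installed, prints the stems side by side and times each stemmer with timeit. The helper names compare_stems and time_stemmer and the sample word list are hypothetical and exist only for this example; timings will vary by machine and input.

```python
from timeit import timeit

from nltk.stem import LancasterStemmer, PorterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()


def compare_stems(words):
    """Return (word, porter_stem, lancaster_stem) triples for inspection."""
    return [(w, porter.stem(w), lancaster.stem(w)) for w in words]


def time_stemmer(stemmer, words, repeats=100):
    """Return the seconds taken to stem every word `repeats` times."""
    return timeit(lambda: [stemmer.stem(w) for w in words], number=repeats)


# Hypothetical sample; replace it with words from your own corpus.
sample = ["cats", "trouble", "troubling", "maximum", "running"]

for row in compare_stems(sample):
    print(row)

print("porter seconds:   ", time_stemmer(porter, sample))
print("lancaster seconds:", time_stemmer(lancaster, sample))
```

Rows where the two stems differ show where one algorithm truncates more aggressively than the other.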