Stemming and Lemmatization
Learn about stemming and lemmatization techniques, and how to apply them using Python.
We'll cover the following...
Stemming
Stemming is a text preprocessing technique that simplifies words by stripping prefixes and suffixes, yielding base forms for effective processing and storage. For instance, the word “running” becomes “run” once we’ve performed stemming. However, while performing this technique, it’s important to note that it can result in inaccuracies and semantic loss, as we’ll get to see. Because of this, even though stemming has advantages like reducing vocabulary, it requires careful application.
Let’s review the code line by line:
Lines 1–2: We import the
pandas
library andSnowballStemmer
class fromnltk.stem
.Line 4: We load the
reviews.csv
dataset.Line 5: We create an instance of the
SnowballStemmer
class with the English language as the stemmer’s target language. ...