
Understanding Lemmatization

Discover the concept of lemmatization and how it reduces word forms to their base lemmas using spaCy. Learn why it matters for natural language understanding, especially in applications such as ticket booking, by standardizing word variants and handling special cases.

What is lemmatization?

A lemma is the base form of a token. We can think of a lemma as the form in which the token appears in a dictionary. For instance, the lemma of eating is eat; the lemma of eats is also eat; and ate likewise maps to eat. Lemmatization is the process of reducing word forms to their lemmas. The following code is a quick example of how to perform lemmatization with spaCy:

Python 3.5
import spacy

nlp = spacy.load("en_core_web_md")
doc = nlp("I went for working and worked for 3 years.")
for token in doc:
    print(token.text, token.lemma_)
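
To see why this matters for something like ticket booking, consider matching user utterances against a small set of intent keywords. The snippet below is a minimal sketch under assumptions: the BOOKING_LEMMAS set and the mentions_booking helper are illustrative names invented here, not part of spaCy or any real booking system.

Python
import spacy

nlp = spacy.load("en_core_web_md")

# Illustrative set of lemmas we treat as booking-related
BOOKING_LEMMAS = {"book", "reserve", "cancel"}

def mentions_booking(text):
    # True if any token's lemma matches a booking-related lemma,
    # so inflected forms such as "booked" or "reserving" still match
    doc = nlp(text)
    return any(token.lemma_ in BOOKING_LEMMAS for token in doc)

print(mentions_booking("I booked two tickets yesterday."))    # should print True
print(mentions_booking("She is reserving a seat for me."))    # should print True
print(mentions_booking("What time does the flight leave?"))   # should print False

Because we compare lemmas rather than surface forms, we only need to list one base form per keyword instead of enumerating every inflected variant.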

By now, we should be familiar with what the first three lines of the first snippet do. Recall ...