What is part-of-speech (PoS) tagging?
Part-of-speech (PoS) tagging is the process of labeling each word in a text with its grammatical category (part of speech), such as noun, adjective, adverb, verb, preposition, conjunction, pronoun, or interjection.
How it works
Let's look at how PoS tagging works using the sentence "I work at Educative" as an example.
In this example, "I" is labeled as a personal pronoun (PRP), "work" as a non-third-person singular present verb (VBP), "at" as a preposition (IN), and "Educative" as a singular noun (NN).
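The tags in this example come from the Penn Treebank tag set. As a minimal sketch, the snippet below pairs each word of the example with its tag and looks up a readable description. The `TAG_DESCRIPTIONS` dictionary and the `describe` helper are illustrative names, not part of any library, and the glossary covers only the four tags mentioned here.

```python
# Hypothetical glossary covering only the tags used in the example sentence.
TAG_DESCRIPTIONS = {
    "PRP": "personal pronoun",
    "VBP": "verb, non-third-person singular present",
    "IN": "preposition or subordinating conjunction",
    "NN": "noun, singular or mass",
}

# The example sentence, written by hand as (word, tag) pairs.
tagged_sentence = [("I", "PRP"), ("work", "VBP"), ("at", "IN"), ("Educative", "NN")]


def describe(tagged):
    """Return one readable line per (word, tag) pair."""
    return [
        f"{word}: {tag} ({TAG_DESCRIPTIONS.get(tag, 'unknown tag')})"
        for word, tag in tagged
    ]


for line in describe(tagged_sentence):
    print(line)
```

Tags missing from the glossary are reported as "unknown tag" rather than raising an error, which keeps the sketch usable with longer sentences.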
Implementation
PoS tagging can be implemented using the nltk library. We follow these steps to implement PoS tagging:
Step 1
We first import the relevant libraries. If the tokenizer and tagger data are not already installed, they can be fetched once with nltk.download(). The imports look like this:
import nltk
from nltk import word_tokenize
Step 2
Next, we provide the text to be labeled and tokenize it. The word_tokenize() function in nltk splits the text into individual tokens (words and punctuation marks). We can do this using the following code snippet:
text = "I love reading Educative Answers."
tokens = nltk.word_tokenize(text)
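To make the tokenization step concrete, here is a simplified regex-based stand-in that splits text into word and punctuation tokens. It is not how word_tokenize() is implemented (NLTK handles contractions, abbreviations, and other cases this ignores); `simple_tokenize` is a hypothetical helper used only for illustration.

```python
import re


def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    # \w+ matches a whole word; [^\w\s] matches one character that is
    # neither a word character nor whitespace (for example, a period).
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenize("I love reading Educative Answers.")
print(tokens)
```

Note that the final period becomes a token of its own, so the sentence yields six tokens rather than five.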
Step 3
In this step, we label the words with tags. This can be done by using the pos_tag() function. The following snippet demonstrates this step:
print("Parts of Speech:", nltk.pos_tag(tokens))
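After this step, a list of (word, tag) tuples is printed. The output looks similar to the following; the exact tags depend on the version of the tagger model, and the tokenizer also emits the final period as a separate token, which is omitted here:
Parts of Speech: [('I', 'PRP'), ('love', 'VB'), ('reading', 'VB'), ('Educative', 'NN'), ('Answers', 'NNS')]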
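Because pos_tag() returns plain (word, tag) tuples, the result is easy to post-process. The sketch below groups words by tag and picks out the nouns. It uses a hard-coded copy of the list printed above as input so that it runs without NLTK, and `group_by_tag` is an illustrative helper name.

```python
from collections import defaultdict

# Hard-coded copy of the tagged output shown above (assumed input).
tagged = [
    ("I", "PRP"),
    ("love", "VB"),
    ("reading", "VB"),
    ("Educative", "NN"),
    ("Answers", "NNS"),
]


def group_by_tag(pairs):
    """Map each tag to the list of words that received it, in order."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    return dict(groups)


groups = group_by_tag(tagged)

# Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in tagged if tag.startswith("NN")]

print(groups)
print(nouns)
```

Filtering on a tag prefix like this is a common first step when looking for candidate entities or key terms in a sentence.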
Uses of PoS tagging
PoS tagging finds its uses in the following domains:
Named entity recognition (NER)
Sentiment analysis
Word-sense disambiguation
Question answering
PoS tagging is therefore a core preprocessing step in NLP. For example, it helps distinguish between different meanings of the same word, such as "book" used as a noun and "book" used as a verb.
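As a rough illustration of how a tag can separate two meanings of a word, the sketch below looks up a sense by word and coarse part of speech. The tiny `SENSES` table, the `coarse_pos` and `pick_sense` helpers, and the hand-written tags are all assumptions made for this example; a real system would use a tagger's output and a proper lexical resource.

```python
# Hypothetical mini-lexicon keyed by (lowercased word, coarse part of speech).
SENSES = {
    ("book", "noun"): "a written or printed work",
    ("book", "verb"): "to reserve something in advance",
}


def coarse_pos(tag):
    """Collapse Penn Treebank tags into coarse classes."""
    if tag.startswith("NN"):  # NN, NNS, NNP, NNPS
        return "noun"
    if tag.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ
        return "verb"
    return "other"


def pick_sense(word, tag):
    """Return the sense for this word and tag, or None if there is no entry."""
    return SENSES.get((word.lower(), coarse_pos(tag)))


# Hand-tagged inputs (assumed, not produced by running a tagger here).
print(pick_sense("book", "NN"))  # as in "I read a book"
print(pick_sense("book", "VB"))  # as in "I will book a table"
```

The same word maps to different senses purely because its tag differs, which is the role PoS tagging plays in word-sense disambiguation.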