Using NLTK Part-of-Speech Tagger
Explore how to perform part-of-speech tagging with NLTK using both the default perceptron tagger and a Hidden Markov Model tagger. Understand the use of the universal POS tagset for simplifying tags, and gain hands-on experience training and applying these taggers on text data to support grammar correction.
NLTK taggers
The NLTK library provides several POS tagger implementations, including CRFTagger, StanfordPOSTagger, and BrillTagger. In this lesson, we'll focus on two commonly used ones:
The perceptron model, which is also the default tagger
The HMM tagger
NLTK default classifier
By default, NLTK uses a greedy averaged perceptron tagger. It is a pre-trained linear model (a single-layer perceptron rather than a deep neural network) that tags words one at a time, from left to right. During training, it guesses a tag, adjusts its weights when the guess is wrong, and finally averages the weights over all training steps. This creates a model that is essentially a dictionary of weights associated with the input features (derived from the word and its context), that can then output the associated solution in ...
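To make the guess, adjust, and average cycle concrete, here is a minimal pure-Python sketch of a greedy averaged perceptron tagger. It is an illustration only and not NLTK's implementation: the class `ToyAveragedPerceptron`, the helpers `extract_features`, `train`, and `tag`, the four features, and the three-sentence `TOY_CORPUS` are all hypothetical names and data invented for this example.

```python
from collections import defaultdict


class ToyAveragedPerceptron:
    """Illustrative averaged perceptron (hypothetical, not NLTK's class)."""

    def __init__(self, tags):
        self.tags = sorted(tags)
        # weights[feature][tag] -> float: the "dictionary of weights"
        self.weights = defaultdict(lambda: defaultdict(float))
        # Running sums of every weight after each step, used for averaging
        self._totals = defaultdict(lambda: defaultdict(float))
        self._steps = 0

    def predict(self, features):
        scores = {tag: 0.0 for tag in self.tags}
        for feat in features:
            for tag, weight in self.weights.get(feat, {}).items():
                scores[tag] += weight
        # Highest score wins; ties are broken by tag name for determinism
        return max(self.tags, key=lambda t: (scores[t], t))

    def update(self, features, truth, guess):
        if truth != guess:
            # Reward the correct tag and penalize the wrong guess
            for feat in features:
                self.weights[feat][truth] += 1.0
                self.weights[feat][guess] -= 1.0
        # Naive averaging: accumulate the current weights after every step
        self._steps += 1
        for feat, per_tag in self.weights.items():
            for tag, weight in per_tag.items():
                self._totals[feat][tag] += weight

    def average(self):
        # Replace each weight with its mean value over all training steps
        for feat, per_tag in self._totals.items():
            for tag, total in per_tag.items():
                self.weights[feat][tag] = total / self._steps


def extract_features(words, i, prev_tag):
    """Hypothetical feature set: the word, its suffix, and the previous tag."""
    word = words[i].lower()
    return ["bias", "word=" + word, "suffix=" + word[-2:], "prev_tag=" + prev_tag]


def train(model, sentences, iterations=5):
    for _ in range(iterations):
        for sentence in sentences:
            words = [word for word, _ in sentence]
            prev = "<START>"
            for i, (_, truth) in enumerate(sentence):
                feats = extract_features(words, i, prev)
                guess = model.predict(feats)
                model.update(feats, truth, guess)
                prev = guess  # greedy: commit to the guess and move on
    model.average()


def tag(model, words):
    prev, tagged = "<START>", []
    for i, word in enumerate(words):
        prev = model.predict(extract_features(words, i, prev))
        tagged.append((word, prev))
    return tagged


# Invented toy data using universal-style tags
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]

model = ToyAveragedPerceptron({"DET", "NOUN", "VERB"})
train(model, TOY_CORPUS)
print(tag(model, ["a", "dog", "sleeps"]))
```

On this toy corpus, the sketch tags the unseen combination "a dog sleeps" as DET, NOUN, VERB. The averaging step here accumulates all weights after every token, which is simple to read but slow; practical implementations typically track timestamps per weight so that only changed weights are touched.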