Recognizing Parts of Speech with tidytext

Explore parts of speech analysis with tidytext to extract and study word categories in text data.

We'll cover the following...

Review parts of speech
Identifying POS with tidytext
Result
POS in use
Limits of getAWord()
Summary of parts of speech with tidytext

Review parts of speech

Parts of speech (POS) are the grammatical categories that words belong to in a sentence: nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, and interjections. A word's part of speech is a type of metadata about that word, and it helps clarify the overall intent of a phrase. With tidytext, we can label the words in a corpus with their parts of speech and then explore how those categories are distributed, which gives insight into language usage and patterns in the text.

Identifying POS with `tidytext`

The tidytext package doesn’t provide a dedicated function for POS tagging. Instead, it relies on dplyr and on the parts_of_speech data frame, which is drawn from the Moby Project by Grady Ward. This data frame has 205,985 rows and two variables, word and pos:

word: An English word.
pos: The part of speech of the word, such as noun, adverb, or adjective.
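
To get a feel for this data frame, a minimal sketch like the following can be used. It assumes the dplyr and tidytext packages are installed; the name pos_counts is only illustrative.

```r
library(dplyr)
library(tidytext)

# parts_of_speech is a data frame bundled with tidytext
glimpse(parts_of_speech)

# Count how many entries fall under each part-of-speech label
# (pos_counts is an illustrative name, not part of the lesson)
pos_counts <- parts_of_speech %>%
  count(pos, sort = TRUE)

pos_counts
```

Because the same word can appear under more than one category, the number of rows is larger than the number of distinct words.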

Here’s an example that uses parts_of_speech together with the dplyr function inner_join:
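
The lesson's original interactive snippet is not reproduced here, so what follows is only a sketch of how such a join might look. The sample sentence and the names text_df, tokens, and tagged are assumptions made for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical sample text, not taken from the lesson
text_df <- tibble(text = "The quick brown fox jumps over the lazy dog")

# One lowercase token per row; duplicates are dropped so that
# each word is looked up only once
tokens <- text_df %>%
  unnest_tokens(word, text) %>%
  distinct(word)

# Keep only the words found in parts_of_speech; a word listed under
# several categories yields one row per category
tagged <- tokens %>%
  inner_join(parts_of_speech, by = "word")

tagged
```

Note that this is a dictionary lookup rather than context-aware tagging: a word is returned with every category listed for it, regardless of how it is used in the sentence.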
