Recognizing Parts of Speech with tidytext

Explore parts of speech analysis with tidytext to extract and study word categories in text data.

Review parts of speech

Parts of speech (POS) are the grammatical categories that words belong to in a sentence: nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, and interjections. A POS label is metadata about a word, and it helps us understand the role that word plays in a phrase and, in turn, the phrase’s overall intent. With tidytext, we can look up the parts of speech of the words in a corpus and explore how they are distributed, which gives insight into language usage and patterns within a text dataset.

Identifying POS with tidytext

The tidytext package doesn’t provide a dedicated POS tagger. Instead, it ships the parts_of_speech data frame, built from the Moby Project by Grady Ward, which we can combine with dplyr verbs. This data frame has 205,985 rows and two variables: word and pos.

  • word: An English word.

  • pos: The part of speech of the word, such as noun, adverb, or adjective.
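To get a feel for the lexicon described above, the following minimal sketch prints its first rows and counts the entries under each label. It assumes the dplyr and tidytext packages are installed; pos_counts is a name chosen here for illustration.

```r
library(dplyr)
library(tidytext)

# Peek at the lexicon: each row pairs a word with one part of speech
head(parts_of_speech)

# Count how many entries carry each part-of-speech label, largest first
pos_counts <- parts_of_speech %>%
  count(pos, sort = TRUE)

pos_counts
```

A word that can act as more than one part of speech appears on several rows, once per label.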

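The usual pattern pairs parts_of_speech with the dplyr command inner_join: we tokenize the text into one word per row, then join on the word column so that each token picks up the part of speech listed for it in the lexicon.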
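A minimal sketch of this join might look as follows. The one-sentence corpus and the names text_df, tokens, and tagged are hypothetical and chosen only for illustration; the sketch also assumes that dplyr and tidytext are installed.

```r
library(dplyr)
library(tidytext)

# Hypothetical one-sentence corpus, used only for illustration
text_df <- tibble(
  line = 1,
  text = "A quick brown fox jumps over the lazy dog"
)

# Split the text into one lowercase word per row
tokens <- text_df %>%
  unnest_tokens(word, text)

# Keep only the tokens found in the lexicon and attach their POS labels.
# A word listed under several parts of speech yields several rows.
tagged <- tokens %>%
  inner_join(parts_of_speech, by = "word")

# Tally the POS labels found in the sentence
tagged %>%
  count(pos, sort = TRUE)
```

Because inner_join keeps only matching rows, words missing from the lexicon drop out of the result. The join is also a dictionary lookup rather than a context-aware tagger, so an ambiguous word receives every label the lexicon lists for it, not just the one that fits the sentence.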
