Stemming with tidytext

Learn how tidytext uses SnowballC and Hunspell to accomplish stemming.

tidytext relies on other packages, such as SnowballC and Hunspell, for stemming.

Stemming with SnowballC

The tidytext package doesn’t have dedicated stemming functions. Instead, it relies on SnowballC together with standard tidyverse verbs such as mutate().

The SnowballC package is an R interface to the C libstemmer library from the Snowball project, which provides stemming algorithms for many languages. Snowball was created by Martin Porter, the author of the original Porter stemmer, and its algorithms are widely used in natural language processing.

SnowballC's main function is wordStem(), which takes a character vector of words and returns their stems using the algorithm selected by its language argument (the default is "porter"). Because SnowballC supports many languages, we can choose the algorithm that matches the language of our text data; getStemLanguages() lists the available options.
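A minimal sketch of calling wordStem() directly, assuming only that SnowballC is installed. The word vectors are made-up examples; the point is that wordStem() is vectorised and that the language argument switches algorithms.

```r
library(SnowballC)

# Names of the stemming algorithms bundled with SnowballC
print(getStemLanguages())

# wordStem() takes a character vector and returns a same-length vector of stems
english_words <- c("running", "runs", "connection")
english_stems <- wordStem(english_words, language = "english")
print(english_stems)

# The same function with a different algorithm for French text
french_words <- c("continuation", "continuer")
print(wordStem(french_words, language = "french"))
```

Because "running" and "runs" both reduce to "run", counts over stems group these inflected forms together.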

Here’s R code demonstrating the use of SnowballC with tidytext:
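A minimal sketch of the tidy workflow, assuming dplyr, tidytext, and SnowballC are installed. The two-document corpus in `docs` is hypothetical example data. The text is tokenized with unnest_tokens(), stop words are dropped, and a stem column is added with wordStem() inside mutate().

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Hypothetical toy corpus: one row per document
docs <- tibble(
  doc  = c(1, 2),
  text = c("The runner was running and runs daily.",
           "Connected connections keep connecting.")
)

stemmed <- docs %>%
  # One row per word; unnest_tokens() also lowercases and strips punctuation
  unnest_tokens(word, text) %>%
  # Remove common stop words such as "the" and "was"
  anti_join(stop_words, by = "word") %>%
  # wordStem() is vectorised, so it can stem the whole column at once
  mutate(stem = wordStem(word, language = "english"))

# Count stems rather than raw word forms
stem_counts <- stemmed %>% count(stem, sort = TRUE)
print(stem_counts)
```

Keeping both the original `word` column and the new `stem` column makes it easy to check which surface forms were merged into each stem.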
