Removing Other Unnecessary Terms

Learn to use the data cleaning tools available with the tm package to remove unnecessary terms.

We'll cover the following:

- Handling punctuation and numbers
  - Overview of transformations in the `tm` package
- Removing white space with `stripWhitespace`
- Removing numbers with `removeNumbers`
- Removing punctuation with `removePunctuation`
- Applying transformations when creating a DTM

Handling punctuation and numbers

You may have noticed several instances of newlines (`\n`) in the text. In most cases, punctuation, numbers, and extra white space are unnecessary for NLP analysis: they inflate the word count without adding meaning. In this lesson, we’ll remove them as well.
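To make the effect concrete, here is a minimal sketch that cleans a single string with the `tm` functions named in this lesson. The sample text is invented for illustration, and the sketch assumes the `tm` package is installed and that these functions accept a plain character vector, as they do in recent `tm` releases.

```r
library(tm)

# Hypothetical sample text containing a newline, a digit, punctuation,
# and a run of extra spaces.
raw_text <- "Chapter 1\nIt was the best of times,   it was the worst of times!"

# Apply the three cleaning steps one after another.
no_numbers <- removeNumbers(raw_text)      # drop digits
no_punct   <- removePunctuation(no_numbers) # drop punctuation marks
clean_text <- stripWhitespace(no_punct)     # collapse white space runs to one space

# Compare naive token lists before and after cleaning.
tokens_before <- strsplit(raw_text, " ", fixed = TRUE)[[1]]
tokens_after  <- strsplit(clean_text, " ", fixed = TRUE)[[1]]

print(clean_text)
print(length(unique(tokens_before)))
print(length(unique(tokens_after)))
```

Before cleaning, a naive split treats `times,` and `times!` as different tokens and also produces empty tokens from the repeated spaces. After cleaning, both collapse to the same word, which is why the distinct-token count drops.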

Overview of transformations in the `tm` package

In NLP, stopwords are removed to make significant words more visible. However, stopwords aren’t the only problem when cleaning text data. Text often includes numbers, punctuation, extra white space, and capitalized versions of words. Therefore, it’s ...
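As a rough illustration of how these transformations are typically applied to a corpus, the sketch below uses `tm_map()` on a small in-memory corpus. The two documents and the variable names are hypothetical, and the order of the steps is one reasonable choice rather than a prescribed one.

```r
library(tm)

# Hypothetical two-document corpus, used only for illustration.
docs <- c("The 3 Musketeers:   All for one!",
          "One for all,\nand all for one.")
corpus <- VCorpus(VectorSource(docs))

# List the names of the transformations that ship with tm.
print(getTransformations())

# Base R functions such as tolower() are not tm transformations,
# so they are wrapped in content_transformer() before use with tm_map().
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

# Pull the cleaned text back out as a plain character vector.
cleaned <- unname(sapply(corpus, as.character))
print(cleaned)
```

Each `tm_map()` call returns a new corpus, so the steps can be chained or reordered. Lowercasing first means that capitalized and lowercase versions of a word end up as the same term.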
