Using a Suitable Corpus Class

Learn about the different types of corpora in the tm package and plug-in packages for efficient text mining and NLP analysis in R.

Let’s take a closer look at the corpus classes provided by the tm package and its plug-in packages.

Corpus

Corpus is a convenience alias that creates either a SimpleCorpus or a VCorpus, depending on the arguments provided. For example, a SimpleCorpus can’t contain XML, so if we call Corpus on XML input, it creates a VCorpus instead. Here is an example of Corpus:
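The following is a minimal sketch of that behavior, not the lesson's original listing. It assumes the tm package is installed and uses two made-up sentences as documents (the `texts` vector and the variable names are illustrative only). With plain text from a VectorSource and the default reader, Corpus is expected to hand back a SimpleCorpus. Calling VCorpus directly is shown for contrast.

```r
library(tm)

# Hypothetical sample documents, used only for illustration
texts <- c("Text mining is fun.",
           "R makes text analysis easier.")

# Plain text from a VectorSource with the default reader:
# Corpus() is expected to pick the lightweight SimpleCorpus class
corpus_auto <- Corpus(VectorSource(texts))
print(class(corpus_auto))

# Requesting a VCorpus explicitly for the same input
corpus_v <- VCorpus(VectorSource(texts))
print(class(corpus_v))

# Both objects hold the same number of documents
print(length(corpus_auto))
print(length(corpus_v))
```

Checking `class()` after calling Corpus is a quick way to confirm which concrete class was chosen for a given source and reader.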
