Using the File Folder as Corpus

Explore how to import text documents into corpora using the tm package in R. Learn to create volatile and simple corpora from file folders, inspect their contents, and manage corpus data effectively to support NLP tasks.

The documentation for tm is nearly 60 pages long and dives straight into the mechanics of NLP. Rather than trying to absorb the whole package in one go, let's split it into smaller, related components. The main topics are:

  • Corpora and sources

  • Metadata

  • Preprocessing: Cleaning, stopwords, and stemming

  • Tokenizing: Words, n-grams, weighting

  • Statistics: Term frequency

  • Visualization
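As a preview of the first topic, corpora and sources, here is a minimal sketch of loading a folder of text files into a corpus. The folder name and the two sample files are made up for illustration; the tm functions used (DirSource, VCorpus, SimpleCorpus, inspect, meta) are part of the package, and the sketch assumes tm is already installed.

```r
library(tm)

# Hypothetical sample folder: a temporary directory holding two small text files.
corpus_dir <- file.path(tempdir(), "tm_corpus_demo")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("The quick brown fox jumps over the lazy dog.",
           file.path(corpus_dir, "doc1.txt"))
writeLines("Text mining turns documents into data.",
           file.path(corpus_dir, "doc2.txt"))

# DirSource treats every file in the folder as one document.
# VCorpus builds a volatile corpus, i.e. one held entirely in memory.
volatile_corpus <- VCorpus(DirSource(corpus_dir, encoding = "UTF-8"))

# SimpleCorpus is a lighter-weight corpus class that also accepts a DirSource.
simple_corpus <- SimpleCorpus(DirSource(corpus_dir, encoding = "UTF-8"))

# Inspect the result: number of documents, then one document's text and metadata.
length(volatile_corpus)
inspect(volatile_corpus[[1]])
meta(volatile_corpus[[1]])
```

Pointing DirSource at your own folder instead of the temporary one is the only change needed to try this on real documents.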

In this lesson, ...