Using Metadata

Using metadata for enhanced corpus analysis

Metadata makes it easier to locate the important parts of a corpus. If we identify the key concepts of each document and record them as metadata, our research becomes more valuable and we save time when we return to the corpus later.

There are at least three ways to use metadata:

  • Find a document corresponding to a key decision.

  • Filter or subset a group of documents for further analysis.

  • Sort by importance or frequency.
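The three uses above can be sketched with tm's document-level metadata. This is a minimal illustration, not the lesson's own code: the three texts are made up, and the `decision` and `priority` tags are hypothetical names chosen for this sketch. `meta()`, `content()`, and `tm_filter()` are standard tm functions.

```r
library(tm)

# A tiny made-up corpus for illustration only.
docs <- c("The board approved the merger after a long review.",
          "Quarterly sales rose slightly in the northern region.",
          "The committee rejected the proposed budget cut.")
newVCorpus <- VCorpus(VectorSource(docs))

# Record hypothetical insights as document-level metadata tags.
decisions  <- c("approve", "none", "reject")
priorities <- c(2, 1, 3)
for (i in seq_along(newVCorpus)) {
  meta(newVCorpus[[i]], "decision") <- decisions[i]
  meta(newVCorpus[[i]], "priority") <- priorities[i]
}

# FIND: locate the document tagged with a particular key decision.
decision_tags <- sapply(newVCorpus, function(doc) meta(doc, "decision"))
found <- which(decision_tags == "reject")
content(newVCorpus[[found]])

# FILTER: keep only the documents that record some decision.
decided <- tm_filter(newVCorpus, FUN = function(doc) meta(doc, "decision") != "none")
length(decided)

# SORT: reorder the whole corpus by descending priority.
ord <- order(sapply(newVCorpus, function(doc) meta(doc, "priority")),
             decreasing = TRUE)
sorted <- newVCorpus[ord]
sapply(sorted, function(doc) meta(doc, "priority"))
```

Because single-bracket indexing with a vector of positions returns a corpus, the sorted result can be passed to any later tm step just like the original.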

Note: The sections that follow are labeled FIND, FILTER, or SORT according to which of these uses they illustrate.

How to use [ ] and [[ ]]

All tm corpora support the standard R subsetting operators:

  • A single bracket [x] selects elements of a collection and returns a smaller corpus.

  • Double brackets [[x]] extract a single element, that is, one document, rather than a sub-collection.
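The difference between the two operators shows up in the class of what they return. A minimal sketch, assuming a made-up two-document corpus:

```r
library(tm)

# Made-up documents for illustration.
docs <- c("First document.", "Second document.")
newVCorpus <- VCorpus(VectorSource(docs))

# Single bracket: the result is still a corpus, just a smaller one.
sub <- newVCorpus[1]
inherits(sub, "VCorpus")            # TRUE
length(sub)                         # 1

# Double brackets: the result is the document itself, not a corpus.
doc <- newVCorpus[[1]]
inherits(doc, "PlainTextDocument")  # TRUE
content(doc)                        # "First document."
```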

Example 1: [x]

The first example shows base R single-bracket indexing, the same notation we normally use to subset a vector or a data frame. Here, newVCorpus[1] returns a corpus that contains only the first document of newVCorpus, together with the metadata attached to that document.
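To confirm that the metadata travels with the subset, we can tag a document and then read the tag back from the one-document corpus. This is a sketch under stated assumptions: the texts are made up and `decision` is a hypothetical tag name.

```r
library(tm)

# Made-up documents for illustration.
docs <- c("Budget approved.", "Budget rejected.")
newVCorpus <- VCorpus(VectorSource(docs))

# Attach a hypothetical "decision" tag to the first document.
meta(newVCorpus[[1]], "decision") <- "approve"

# Single-bracket indexing returns a corpus holding just that document...
first <- newVCorpus[1]
inspect(first)

# ...and the document inside it still carries its metadata.
meta(first[[1]], "decision")
```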
