Document-Term Matrix

Explore the creation and interpretation of document-term matrices, a key structured data format for text analysis. Understand how rows represent documents, columns represent terms, and cells hold term frequencies. Discover how to use R's tm package to find frequent terms and gain insights into text corpora.

A document-term matrix (DTM) is fairly simple to understand. It is a matrix in which:

  • Each row represents a document. In our case, there will be one row for Frankenstein and a second row for The Last Man.

  • Each column represents a term. In this case, terms are words, although they can be sentences, lines, paragraphs, or n-grams (more on these in a later lesson).

  • Each cell contains the frequency of the column's term in the row's document, that is, the number of times that term occurs in that document.
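To make this structure concrete, here is a minimal sketch in Python (standard library only) rather than R's tm package. The two documents are made-up placeholder sentences standing in for the novels, not excerpts from them, and the helper names `tokenize`, `document_term_matrix`, and `frequent_terms` are hypothetical. `frequent_terms` is only a rough analogue of tm's `findFreqTerms()`. tm applies its own tokenization and default options, so its matrices can differ from this one.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(docs):
    """Build a DTM: one row per document, one column per term, cells = raw counts.

    `docs` maps a document name to its text. Returns (row_names, terms, matrix).
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    # Columns: every term seen in any document, sorted alphabetically.
    terms = sorted(set().union(*counts.values()))
    row_names = list(docs)
    # A Counter returns 0 for a missing key, so absent terms get a 0 cell.
    matrix = [[counts[name][term] for term in terms] for name in row_names]
    return row_names, terms, matrix


def frequent_terms(terms, matrix, lowfreq):
    """Return terms whose total count across all documents is at least `lowfreq`."""
    totals = [sum(column) for column in zip(*matrix)]
    return [term for term, total in zip(terms, totals) if total >= lowfreq]


# Hypothetical placeholder texts, one per document (row).
docs = {
    "Frankenstein": "the creature saw the light",
    "The Last Man": "the last man saw the sea",
}

rows, terms, dtm = document_term_matrix(docs)
print(terms)  # ['creature', 'last', 'light', 'man', 'saw', 'sea', 'the']
for name, row in zip(rows, dtm):
    print(name, row)
# Frankenstein [1, 0, 1, 0, 1, 0, 2]
# The Last Man [0, 1, 0, 1, 1, 1, 2]
print(frequent_terms(terms, dtm, lowfreq=2))  # ['saw', 'the']
```

Each row lines up with one document and each column with one term, so reading down a column shows how a single word is distributed across the corpus. Most cells in a real DTM are zero, which is why libraries such as tm store it in a sparse format rather than as a plain nested list.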

On the other hand, a term-document matrix (TDM) is a data structure that is essentially the ...