Performing Natural Language Processing with R

Document-Term Matrix

Learn how the document-term matrix serves as a widely used data structure for natural language processing.

We'll cover the following...

Understanding the DTM
Uses of a DTM

A document-term matrix (DTM) is fairly simple to understand. It is a matrix in which the rows, the columns, and the cells each have a specific meaning.

Each row represents a document. In our case, there will be one row for Frankenstein and a second row for The Last Man.
Each column represents a term. In this case, terms are words, although they can be sentences, lines, paragraphs, or n-grams (more on these in a later lesson).
Each cell in the matrix contains the frequency of the term in the document, that is, the number of times the column's term occurs in the row's document.
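
To make the row, column, and cell layout concrete, here is a minimal sketch in base R. It is not the course's own implementation: the two sentences are made-up stand-ins for the novels' text, and the names `docs`, `tokens`, `vocab`, and `dtm` are chosen only for this illustration. In practice, a package such as tm provides a `DocumentTermMatrix()` constructor that handles real corpora.

```r
# Two toy "documents" (invented sentences, NOT the actual text of the novels)
docs <- c(
  frankenstein = "the creature fled and the creator followed",
  last_man     = "the plague spread and the last man wandered"
)

# Tokenize: lowercase, then split on any run of non-letter characters
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Vocabulary: every distinct term across all documents, sorted alphabetically
vocab <- sort(unique(unlist(tokens)))

# For each document, count how often every vocabulary term occurs.
# factor(..., levels = vocab) makes table() report zero for absent terms.
counts <- lapply(tokens, function(words) {
  as.vector(table(factor(words, levels = vocab)))
})

# Stack the per-document counts: one row per document, one column per term
dtm <- do.call(rbind, counts)
colnames(dtm) <- vocab

print(dtm)
```

Each cell can then be looked up by document and term. For example, `dtm["frankenstein", "the"]` is 2 because "the" appears twice in the first toy sentence, while `dtm["frankenstein", "plague"]` is 0 because that term appears only in the second.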

On the other hand, a term-document matrix (TDM) is a data structure that is essentially the ...