Installation and Documentation

Learn how to install the tm text mining package and where to find its documentation.

Introduction to the tm package

R is an excellent programming language for statistics and matrix manipulation. Given data organized in a table or matrix, R can return a wealth of insights and visualizations. However, data from real-life applications is rarely stored in clean tables with well-ordered rows and columns. Real-world data is messy and must be cleaned before analysis, a process often referred to as data wrangling.

Human language is a prime example of messy data. Concepts aren’t tagged, and context is fluid. There are no standardized rules and no reliable indicators to help a computer understand what is being said and how to separate the information from the presentation.

This is where natural language processing (NLP) comes in. NLP is a collection of tools and techniques for converting human language into a format useful to computers. We could do this conversion by hand, but it would be painfully slow. Instead, it’s easier to use a framework. In this part of the course, we’ll use a package called tm, which is short for text mining.

Installation process

To use tm, we’ll need to install it in our copy of R.
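
As a minimal sketch, assuming an internet connection and a configured CRAN mirror, the installation can be done from the R console with the standard install.packages() function. The requireNamespace() check is our own addition so that the package isn't reinstalled when it's already present. The last two calls open the package's help index and list its vignettes, which is one way to reach the documentation.

```r
# Install tm from CRAN only if it is not already installed.
# Assumption: internet access and a configured CRAN mirror.
if (!requireNamespace("tm", quietly = TRUE)) {
  install.packages("tm")
}

# Load the package into the current session.
library(tm)

# Confirm which version was installed.
packageVersion("tm")

# Open the package's help index and list its vignettes.
help(package = "tm")
browseVignettes("tm")
```

If library(tm) runs without an error, the package is installed and ready to use.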
