Introduction to sources

The tm package imports documents through special functions called sources, each of which handles a particular type of input. It ships with a set of general-purpose sources, and developers can add more through plug-ins. In this lesson, we’ll look at the sources included with tm.

The tm package provides getSources(), which returns the names of the sources available in the installed copy of tm.
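
A minimal sketch of that call, assuming tm is installed. The variable name available_sources is only for illustration, and the exact list returned depends on the installed version of tm.

```r
library(tm)

# getSources() returns a character vector with the names of the
# source constructors provided by this installation of tm
available_sources <- getSources()
print(available_sources)
```

Each name in the result corresponds to a constructor function of the same name, such as VectorSource() or DirSource().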

Line 4: This line sets the DataDirectory variable to the string “data/”, the directory where the text files are located.
Line 5: This line creates a character vector, fileList, containing the names of the files in DataDirectory that match the specified pattern. In this case, it looks for files that start with mws_ and end with .txt (such as mws_1.txt or mws_2.txt).
Line 8: This line uses the readtext() function from the readtext package to read the text content of the files specified in fileList.
- The readtext() function returns a data.frame with two columns: text (the content of the text file) and doc_id (the identifier of the document).
- The paste0() function concatenates DataDirectory ...
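
The steps described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the original listing, so its line numbers do not match the description. The regular expression, the variable name textData, and the use of fileList as the second argument to paste0() are guesses, and the sketch expects the readtext package to be installed.

```r
library(readtext)

# Directory that holds the text files (value taken from the description)
DataDirectory <- "data/"

# Names of files that start with "mws_" and end with ".txt"
# (the exact pattern is an assumption)
fileList <- list.files(DataDirectory, pattern = "^mws_.*\\.txt$")

# Read the matching files into a data.frame with doc_id and text columns.
# The read is skipped when no matching files are found.
if (length(fileList) > 0) {
  textData <- readtext(paste0(DataDirectory, fileList))
  str(textData)
}
```

A data.frame with doc_id and text columns is the shape that DataframeSource() in tm expects, which is why readtext() is a convenient way to load the files.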
