Introduction to the Course

The internet changed everything

In a remarkably short time, we built a technology that changed the world. For better or worse, we now have access to more information than any one of us can possibly read. We can’t even grasp how much information we have.

We need a tool that extends our understanding of this pool of information and also understands what we need. Unfortunately, much of that information is represented as human language. It’s messy, ambiguous, sometimes false, sometimes true, and sometimes subjective. It’s never predictable.

If only we could use computers to help us navigate through all this messy data. Unfortunately, computers force information into tidy rows, cells, and values. They are incredibly fast, incredibly accurate, and incredibly inflexible.

If only we could train a computer to understand the meaning of human language.

Significance of natural language processing

This is why natural language processing was created. Taken as a general concept, it provides computers with a framework for understanding human language. Specifically, it’s a set of concepts and tools we use to describe documents, their contents, and their relationship with other information. In the context of this course, we’re going to explore how to use the R programming language to provide computers with instructions on how to help us sort through all this information.

Course objectives

This course is designed to provide a working knowledge of how to perform natural language processing with the R programming language. It aims to:

  • Introduce basic concepts like text mining and sentiment analysis.

  • Explain how to perform natural language processing with the R programming language.

  • Teach machine learning algorithms for text classification in R.

  • Provide proficiency in implementing text mining techniques in R.

  • Teach sentiment analysis techniques, including lexicon-based and machine learning-based approaches.

  • Cover the ethical and legal implications of text mining and sentiment analysis.

  • Develop data visualization and presentation skills for text mining results.

  • Provide hands-on experience with a real-world text mining project.

Intended audience

Anyone performing data science has to juggle three areas of expertise: computer programming, math, and domain knowledge. For example, the problem domain in environmental science requires knowledge of biology, statistical sampling, and data processing.

In this course, we’ll focus on the computer science involved in natural language processing (NLP). Any discussion of NLP concepts will favor implementation over theory.

You’re here to find the quickest path to success in your problem domain. You need to understand the purpose of a tool and how to use it, and then quickly return your attention to the project at hand.

Alternatively, you may be a theorist in the field of natural language processing, and you hope to learn more about implementation or the R language. We intend to show you how to use a hammer, not necessarily how the hammer works.

Prerequisites

This isn’t a beginner’s course in the R programming language. We won’t spend time explaining functional programming or R syntax. If you encounter code that works but doesn’t look familiar, please consider this an opportunity to learn something new by doing your own research.

With that said, we favor simplicity and clear expression of coding intent over compactness or execution speed.

Key takeaways

At the end of this course, you should understand:

  • Commonly used natural language concepts, including term frequency, tf-idf (term frequency-inverse document frequency), visualizing relationships, and sentiment analysis.

  • The strengths and weaknesses of various R-based NLP packages.
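
To give a first taste of the concepts listed above, here is a minimal sketch of term frequency and tf-idf in base R. The three documents are invented for illustration, and the weighting shown (natural log of the number of documents divided by the number of documents containing the term) is only one common variant; packages may define it slightly differently.

```r
# Three tiny example documents (invented for illustration only).
docs <- c(
  doc1 = "the cat sat on the mat",
  doc2 = "the dog sat on the log",
  doc3 = "the cats and dogs are pets"
)

# Tokenize: lowercase, then split on whitespace.
tokens <- strsplit(tolower(docs), "\\s+")
names(tokens) <- names(docs)

# The vocabulary is every distinct term across all documents.
vocab <- sort(unique(unlist(tokens)))

# Term frequency: how often a term occurs in a document,
# divided by the number of terms in that document.
tf <- t(vapply(tokens, function(words) {
  as.numeric(table(factor(words, levels = vocab))) / length(words)
}, numeric(length(vocab))))
colnames(tf) <- vocab

# Inverse document frequency: terms found in fewer documents score higher.
# Assumption: the natural-log variant log(N / document frequency).
idf <- log(nrow(tf) / colSums(tf > 0))

# tf-idf: scale each term's column of tf by that term's idf.
tf_idf <- sweep(tf, 2, idf, "*")

# Inspect the scores for the first document.
round(tf_idf["doc1", ], 3)
```

Because "the" appears in every document, its idf is log(3 / 3) = 0, so its tf-idf score is zero even though it is the most frequent word. Terms unique to one document, such as "cat", score highest.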

Conclusion

Consider this course a shortcut to experience. We’ve provided an overview of tools and techniques used with natural language processing, but there is so much more. The field is evolving at a rapid pace, and what you learn today will be a springboard to new concepts you’ll learn tomorrow.

We’re pleased to be your instructors for this experience with natural language processing!