N-Grams for Text Classification

Learn to extract n-grams for text classification using Python.

Introduction

In text classification, we can use n-grams (contiguous sequences of n tokens, usually words) as features for training a machine learning model. A typical use case is classifying reviews as expressing positive or negative sentiment. Here, bigrams (2-grams) or trigrams (3-grams) help the classifier recognize multiword phrases that convey sentiment. For example, the bigram "not good" signals negative sentiment, while the unigram "good" on its own suggests the opposite. Because n-grams require minimal preprocessing, we can use them in place of text representation techniques such as BoW, TF-IDF, or word embeddings when resources or time are limited. When those constraints don't apply, we can combine n-grams with these representations (for example, by computing TF-IDF weights over n-grams rather than single words) to get better results in later analysis.
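The idea of extracting n-grams as features can be sketched in plain Python. This is a minimal illustration, assuming that lowercasing and whitespace splitting are the only preprocessing steps. The function names `extract_ngrams` and `ngram_features` are hypothetical helpers, not part of any library. The example shows how the bigram "not good" preserves a negation that the unigram "good" loses.

```python
from collections import Counter


def extract_ngrams(text, n):
    """Return the word n-grams of `text` as space-joined strings (hypothetical helper)."""
    # Assumed minimal preprocessing: lowercase, then split on whitespace.
    tokens = text.lower().split()
    # Slide a window of size n over the tokens; yields [] if there are fewer than n tokens.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Count every n-gram for each n in `n_values`, giving a bag-of-n-grams feature map."""
    counts = Counter()
    for n in n_values:
        counts.update(extract_ngrams(text, n))
    return counts


review = "The food was not good"
print(extract_ngrams(review, 2))
# ['the food', 'food was', 'was not', 'not good']
print(extract_ngrams(review, 3))
# ['the food was', 'food was not', 'was not good']

features = ngram_features("not good not bad")
print(features["not"], features["not good"], features["not bad"])
# 2 1 1
```

In practice, scikit-learn's `CountVectorizer` and `TfidfVectorizer` accept an `ngram_range` parameter (for example, `ngram_range=(1, 2)`) that builds unigram and bigram features in a similar way. The TF-IDF variant is one route to combining n-grams with a text representation technique.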

Reasons for choosing n-grams

Here are a few other reasons why we might choose n-grams over other techniques during text preprocessing:
