How to use Naive Bayes for sentiment analysis

Naive Bayes is a popular algorithm for sentiment analysis because it is simple, fast to train, and effective on text. It estimates the probability that a document belongs to each sentiment class from the frequencies of the words (features) the document contains, and assigns the most probable class.
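For intuition, here is a minimal sketch of that probability calculation on made-up word counts (the corpus, counts, and priors below are purely illustrative): each class score is the class prior multiplied by the per-class probability of every word in the document, computed in log space for numerical stability.

```python
import math

# Hypothetical word counts per class from a tiny labeled corpus (illustrative only)
counts = {
    "pos": {"great": 3, "movie": 2, "boring": 0},
    "neg": {"great": 0, "movie": 2, "boring": 3},
}
priors = {"pos": 0.5, "neg": 0.5}
vocab = {"great", "movie", "boring"}

def log_posterior(words, cls):
    total = sum(counts[cls].values())
    score = math.log(priors[cls])
    for w in words:
        # Laplace (add-one) smoothing avoids zero probability for unseen words
        p = (counts[cls].get(w, 0) + 1) / (total + len(vocab))
        score += math.log(p)
    return score

doc = ["great", "movie"]
best = max(priors, key=lambda c: log_posterior(doc, c))
print(best)  # prints "pos": "great" appears often in the positive counts
```

Libraries such as scikit-learn perform exactly this computation, just vectorized over the whole vocabulary.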

Steps to use Naive Bayes for sentiment analysis

Here's a simple guide to using Naive Bayes for sentiment analysis:

  1. Collect a labeled dataset for sentiment analysis, where each data point is paired with a sentiment label. Remove noise from the dataset, such as special characters, punctuation, and stopwords.

  2. Convert the text documents into feature vectors using the bag-of-words model, with words as features and their frequencies or presence as values. N-grams can be employed to capture context if needed.

  3. Split the dataset into training and testing sets.

  4. Train the Naive Bayes model on the training set. The algorithm's assumption that features are independent given the class label keeps the probability calculations simple and fast.

  5. Use the trained model to predict sentiment labels for the testing set.

  6. Measure the classifier's performance on the test set, for example with its accuracy or a classification report.
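The noise removal in step 1 can be sketched as follows. The stop-word list here is a small illustrative set; real pipelines typically use a larger list, such as the one shipped with NLTK:

```python
import re

# A small illustrative stop-word list; real pipelines use a larger one
STOPWORDS = {"the", "a", "an", "is", "was", "this", "it", "and", "to"}

def clean(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation and special characters
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(clean("This movie is GREAT!!!"))  # prints "movie great"
```

Applying such a function to every document before vectorization reduces the vocabulary size and removes features that carry little sentiment signal.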

Coding example

Here is an example of using Naive Bayes for sentiment analysis with scikit-learn:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Step 1: Load the dataset
data = pd.read_csv('data.csv')
text = data['text'].values
labels = data['label'].values

# Step 2: Convert text data into numerical feature vectors
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(text)

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

# Step 4: Train the Naive Bayes model on the training set
nb = MultinomialNB()
nb.fit(X_train, y_train)

# Step 5: Predict sentiment on new data
new_text = ["I love this movie!", "This product is terrible.", "The food was delicious."]
new_features = vectorizer.transform(new_text)
new_predictions = nb.predict(new_features)
print(new_predictions)

# Step 6: Generate the classification report on the test set to evaluate the model
predictions = nb.predict(X_test)
print(classification_report(y_test, predictions))

Code explanation

  • Imports: We import pandas for loading the data and the scikit-learn classes needed for vectorization, modeling, and evaluation.

  • Loading the data: The dataset is loaded from a CSV file (data.csv), which contains two columns: 'text' (the text samples) and 'label' (the sentiment labels).

  • Vectorization: The text data is converted into numerical feature vectors using the CountVectorizer class from scikit-learn.

  • Training: We initialize the MultinomialNB classifier and train it by calling fit.

  • Prediction: The trained model predicts sentiment for new, unseen sentences, and the predictions are printed.

  • Evaluation: The classification_report function from scikit-learn compares the model's predictions against the true labels and prints the precision, recall, F1-score, and support for each class, along with their macro and weighted averages.

Copyright ©2024 Educative, Inc. All rights reserved