Naive Bayes is a popular and efficient algorithm for sentiment analysis because of its simplicity. It uses word (feature) frequencies to estimate the probability that a document belongs to a particular sentiment class.
Here's a simple guide to using Naive Bayes for sentiment analysis:
Collect a labeled dataset for sentiment analysis, where each data point is paired with a sentiment label. Remove noise from the text, such as special characters, punctuation, and stopwords.
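As a rough sketch, the noise-removal step might look like the following (the tiny hard-coded stopword list here is for illustration only; NLTK or scikit-learn provide fuller lists):

import re

# Small stopword list for illustration only; a real project would usually
# use a fuller list (for example, NLTK's or scikit-learn's built-in lists).
STOPWORDS = {"the", "a", "an", "and", "or", "is", "was", "this", "it", "of", "to", "in"}

def clean_text(text):
    # Lowercase, then replace anything that is not a letter or a space
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Drop stopwords
    return " ".join(word for word in text.split() if word not in STOPWORDS)

print(clean_text("This movie was GREAT, and the acting was superb!"))
# Prints: movie great acting superb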
Text documents are converted into feature vectors using the bag-of-words model, with words as features and their frequencies or presence as values. N-grams can be employed to capture context if needed.
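A minimal sketch of this step with scikit-learn's CountVectorizer, using the ngram_range parameter to add bigrams alongside single words:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love this movie", "I really hate this movie"]

# ngram_range=(1, 2) keeps single words and adds two-word phrases such as
# "love this", which preserve a little word order and context
vectorizer = CountVectorizer(ngram_range=(1, 2))
features = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(features.toarray())                  # one count vector per document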
Split the dataset into training and testing sets. Train the Naive Bayes model on the training set and evaluate its performance using the testing set.
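A common way to do this is scikit-learn's train_test_split; the 80/20 ratio below is a convention, not a requirement:

from sklearn.model_selection import train_test_split

# features and labels come from the vectorization step above; hold out 20%
# of the data for testing, with a fixed random_state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)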
Apply the Naive Bayes algorithm to the training set. The algorithm assumes that the features are conditionally independent given the class label, which greatly simplifies the probability calculations.
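Concretely, for a document d with words w_1, ..., w_n, the independence assumption lets the posterior factor into per-word terms, and the predicted class is the one that maximizes it:

P(c \mid d) \;\propto\; P(c) \prod_{i=1}^{n} P(w_i \mid c), \qquad \hat{c} = \arg\max_{c} P(c) \prod_{i=1}^{n} P(w_i \mid c)

The per-word probabilities P(w_i \mid c) are estimated from word frequencies in the training documents of class c, typically with Laplace (add-one) smoothing so that unseen words do not zero out the product.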
Evaluate the trained model on the testing set and compute the classifier's accuracy.
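Assuming the split from the earlier sketch, the held-out accuracy can be computed with scikit-learn's accuracy_score:

from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# X_train, X_test, y_train, y_test come from the train/test split sketch above
nb = MultinomialNB()
nb.fit(X_train, y_train)
test_predictions = nb.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, test_predictions))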
Here is example code that uses Naive Bayes for sentiment analysis:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Step 1: Load and preprocess the dataset
data = pd.read_csv('data.csv')
text = data['text'].values
labels = data['label'].values

# Step 2: Convert text data into numerical feature vectors
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(text)

# Step 3: Train the Naive Bayes model
nb = MultinomialNB()
nb.fit(features, labels)

# Step 4: Predict sentiment on new data
new_text = ["I love this movie!", "This product is terrible.", "The food was delicious."]
new_features = vectorizer.transform(new_text)
new_predictions = nb.predict(new_features)
print(new_predictions)

# Step 5: Generate the classification report to evaluate the model
predictions = nb.predict(features)
print(classification_report(labels, predictions))
Lines 1–4: We import the required modules.
Lines 7–9: The dataset is loaded from a CSV file (data.csv), which contains two columns: 'text' (containing the text samples) and 'label' (containing the sentiment labels).
Lines 12 and 13: The text data is converted into numerical feature vectors using the CountVectorizer class from scikit-learn.
Lines 16 and 17: We initialize the Naive Bayes classifier and then train the model.
Lines 20–23: The trained model makes sentiment predictions on the new text data.
Lines 26 and 27: The model's predictions on the training features are evaluated, and classification metrics are printed using the classification_report function from scikit-learn. The report shows the precision, recall, F1-score, and support for each class, along with their macro and weighted averages. In practice, these metrics should be computed on a held-out test set, as described in the steps above, rather than on the data the model was trained on.