How to use SVM for sentiment analysis

SVM algorithm

A support vector machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. For classification, an SVM looks for a hyperplane in an N-dimensional feature space (where N is the number of features) that separates the data points into classes.
Among all separating boundaries, SVM picks the one that maximizes the margin, that is, the distance between the boundary and the nearest data points of each class. Those nearest points are called support vectors. Through the use of kernels, SVM can also handle data that is not linearly separable.
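
To make the ideas of margin and support vectors concrete, here is a minimal sketch on a made-up two-dimensional dataset. The points, the labels, and the choice of C=1.0 are assumptions for illustration only; the code uses scikit-learn's SVC class with a linear kernel.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters in 2-D (made-up data).
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Training points that define the decision boundary.
support_vectors = clf.support_vectors_

# For a linear kernel the boundary is w . x + b = 0,
# and the margin width is 2 / ||w||.
w = clf.coef_[0]
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)

# Classify one point near each cluster.
toy_predictions = clf.predict(np.array([[1.2, 1.5], [5.8, 5.5]]))

print(support_vectors)
print(margin_width)
print(toy_predictions)
```

Replacing kernel="linear" with a non-linear kernel such as kernel="rbf" lets the same class fit curved boundaries, although the coef_ attribute is then no longer available.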

Note: Several libraries implement SVM, such as scikit-learn in Python and LIBSVM, which provides interfaces for many programming languages.

These libraries provide ready-to-use implementations of SVM algorithms and make it easier to apply SVM to our specific problem domain.

Sentiment analysis

Sentiment analysis, or opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotion expressed in a text. It involves classifying the text as positive, negative, or neutral based on the underlying sentiment.

Here's a high-level overview of the steps involved in using SVM for sentiment analysis:

  1. Prepare or collect the data

  2. Preprocess the data

  3. Feature selection or extraction

  4. Split the dataset

  5. Choose the classification algorithm (here, SVM)

  6. Model training

  7. Predict new or unseen data

  8. Model evaluation
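
Steps 2 and 3 can be sketched in plain Python as follows. This is a hand-rolled illustration, not what scikit-learn does internally: the helper names preprocess and bag_of_words and the two sample sentences are assumptions made for this example, and a library vectorizer tokenizes text somewhat differently.

```python
import re
from collections import Counter

def preprocess(text):
    """Lowercase the text and keep only runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

docs = ["I love this movie!", "This movie is terrible."]

# Step 2: preprocess each document into a list of tokens.
tokenized = [preprocess(doc) for doc in docs]

# Step 3: build a sorted vocabulary and one count vector per document.
vocabulary = sorted({token for tokens in tokenized for token in tokens})
vectors = [bag_of_words(tokens, vocabulary) for tokens in tokenized]

print(vocabulary)
print(vectors)
```

Each document becomes a fixed-length vector of word counts, which is the kind of numerical input an SVM needs.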

Different classes to predict with SVM

Implementation of SVM for sentiment analysis

Here's a basic example that uses the scikit-learn library in Python to apply SVM to sentiment analysis:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Step 1: Load the dataset
data = pd.read_csv('sentiment_dataset.csv')
text = data['text'].values
labels = data['label'].values

# Step 2: Split the dataset into training and testing sets
text_train, text_test, labels_train, labels_test = train_test_split(text, labels, test_size=0.2, random_state=42)

# Step 3: Convert text data into numerical feature vectors
vectorizer = CountVectorizer()
features_train = vectorizer.fit_transform(text_train)
features_test = vectorizer.transform(text_test)

# Step 4: Train the SVM model
svm = SVC(kernel='linear')
svm.fit(features_train, labels_train)

# Step 5: Predict sentiment on the test set and on new data
predictions = svm.predict(features_test)
new_text = ["I love this movie!", "This product is terrible.", "The food was delicious."]
new_features = vectorizer.transform(new_text)
new_predictions = svm.predict(new_features)
print(new_predictions)

# Step 6: Generate the classification report to evaluate the model
print(classification_report(labels_test, predictions))
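
Because sentiment_dataset.csv is not included here, the following variant can be run without it. It is a sketch under stated assumptions: the eight training sentences and their labels are made up, and the vectorizer and classifier are bundled with scikit-learn's make_pipeline helper so that raw strings can be passed directly to fit and predict. With so little data, the predictions are only illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up training data standing in for sentiment_dataset.csv.
train_text = [
    "I love this movie",
    "great film and wonderful acting",
    "what a delicious meal",
    "I really love the food",
    "this product is terrible",
    "awful service and bad food",
    "I hate this movie",
    "the acting was terrible",
]
train_labels = ["positive"] * 4 + ["negative"] * 4

# The pipeline vectorizes the text and then trains the linear SVM.
model = make_pipeline(CountVectorizer(), SVC(kernel="linear"))
model.fit(train_text, train_labels)

# The same pipeline vectorizes new text before predicting.
new_text = ["I love this movie!", "This product is terrible."]
new_predictions = model.predict(new_text)
print(list(zip(new_text, new_predictions)))
```

Bundling the two steps means the vectorizer fitted on the training text is reused automatically at prediction time, which avoids accidentally refitting it on new data.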

Explanation

  • Lines 1–5: Import the required libraries.

  • Lines 8–10: The dataset is loaded from a CSV file (sentiment_dataset.csv), which contains two columns: 'text' (the text samples) and 'label' (the sentiment labels).

  • Line 13: The dataset is split into training and testing sets using the train_test_split function from scikit-learn, with 20% of the samples held out for testing.

  • Lines 16–18: The text data is converted into numerical feature vectors using the CountVectorizer class from scikit-learn. The vectorizer is fitted on the training text only and then reused to transform the test text.

  • Lines 21–22: An SVM model with a linear kernel is created using the SVC class and trained on the training features and labels.

  • Lines 25–29: The trained model predicts sentiment for the test set (line 25) and for three new text samples (lines 26–29).

  • Line 32: The test-set predictions are compared with the true labels, and the classification metrics are printed using the classification_report function from scikit-learn. The report lists the precision, recall, F1-score, and support for each class, along with the overall accuracy and the macro and weighted averages.
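
To show what the numbers in the classification report mean, here is a small sketch that computes them by hand. The helper name precision_recall_f1 and the five true and predicted labels are made up for illustration; this is not how scikit-learn implements the report, only the same definitions written out.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Return precision, recall, F1, and support for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support

y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]

per_class = {label: precision_recall_f1(y_true, y_pred, label)
             for label in ("pos", "neg")}

# Macro average: plain mean of the per-class F1 scores.
macro_f1 = sum(m[2] for m in per_class.values()) / len(per_class)

# Weighted average: per-class F1 weighted by support.
total = sum(m[3] for m in per_class.values())
weighted_f1 = sum(m[2] * m[3] for m in per_class.values()) / total

print(per_class)
print(macro_f1, weighted_f1)
```

The macro average treats every class equally, while the weighted average gives more influence to classes with more samples, which matters when the sentiment labels are imbalanced.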

Copyright ©2024 Educative, Inc. All rights reserved