A support vector machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. For classification, it finds a hyperplane in an N-dimensional feature space that best separates the data points into different classes.
In classification, SVM aims to find the optimal decision boundary, the one that maximizes the margin: the distance between the boundary and the closest data points of each class. These closest points are called support vectors. SVM separates linearly separable data directly and handles non-linearly separable data through the use of kernels, which implicitly map the data into a space where a linear boundary fits.
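To make the margin and the support vectors concrete, here is a minimal sketch that fits scikit-learn's SVC with a linear kernel on six made-up 2-D points. The data, the large C value (used here to approximate a hard margin), and the variable names are assumptions chosen for illustration. For a linear SVM with weight vector w, the margin width is 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (made up for illustration): two linearly separable clusters in 2-D.
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations,
# which approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# For a linear kernel, coef_ holds the weight vector w of the hyperplane.
w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

print("support vectors:")
print(clf.support_vectors_)
print("margin width:", margin_width)
```

On this toy data, the model should keep only the three points nearest the boundary as support vectors; the remaining points could be removed without moving the boundary.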
Note: Several libraries implement SVM, such as scikit-learn for Python and LIBSVM, which offers interfaces for several other programming languages.
These libraries provide ready-to-use implementations of SVM algorithms and make it easier to apply SVM to a specific problem domain.
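As a minimal sketch of using one such ready-made implementation, the example below fits scikit-learn's SVC with a linear kernel and with an RBF kernel on a synthetic two-ring dataset that no straight line can separate. The dataset, its parameters, and the variable names are assumptions chosen for illustration; they are unrelated to any sentiment data.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data: one class forms an inner ring, the other an outer ring.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Same estimator, two different kernels.
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

# Accuracy on the training data, used here only to show the effect of the kernel.
linear_acc = linear_svm.score(X, y)
rbf_acc = rbf_svm.score(X, y)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

The linear kernel is expected to do little better than guessing on this data, while the RBF kernel should separate the rings almost perfectly.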
Sentiment analysis, or opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotion expressed in a text. It involves classifying the text as positive, negative, or neutral based on the underlying sentiment.
Here's a high-level overview of the steps involved in using SVM for sentiment analysis:
Prepare or collect the data
Preprocess the data
Feature selection or extraction
Split the dataset
Apply sentiment analysis algorithm
Model training
Predict new or unseen data
Model evaluation
Here's a basic code example that shows how we can use SVM for sentiment analysis with the scikit-learn library in Python:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Step 1: Load and preprocess the dataset
data = pd.read_csv('sentiment_dataset.csv')
text = data['text'].values
labels = data['label'].values

# Step 2: Split the dataset into training and testing sets
text_train, text_test, labels_train, labels_test = train_test_split(text, labels, test_size=0.2, random_state=42)

# Step 3: Convert text data into numerical feature vectors
vectorizer = CountVectorizer()
features_train = vectorizer.fit_transform(text_train)
features_test = vectorizer.transform(text_test)

# Step 4: Train the SVM model
svm = SVC(kernel='linear')
svm.fit(features_train, labels_train)

# Step 5: Predict sentiment on the test set and on new data
predictions = svm.predict(features_test)
new_text = ["I love this movie!", "This product is terrible.", "The food was delicious."]
new_features = vectorizer.transform(new_text)
new_predictions = svm.predict(new_features)
print(new_predictions)

# Step 6: Generate the classification report to evaluate the model
print(classification_report(labels_test, predictions))
Lines 1–5: The required libraries are imported.
Lines 8–10: The dataset is loaded from a CSV file (sentiment_dataset.csv), which contains two columns: 'text' (containing the text samples) and 'label' (containing the sentiment labels).
Line 13: The dataset is split into training and testing sets using the train_test_split function from scikit-learn.
Lines 16–18: The text data is converted into numerical feature vectors using the CountVectorizer class from scikit-learn.
Lines 21–22: An SVM model with a linear kernel is created using the SVC class and trained on the training features and labels.
Lines 25–29: The trained model predicts the sentiment of the test set (line 25) and then of three new text samples, whose predictions are printed.
Line 32: The model is evaluated by comparing the test-set predictions with the true test labels, and the classification metrics are printed using the classification_report function from scikit-learn. The report lists the precision, recall, F1-score, and support for each class, along with the overall accuracy and the macro and weighted averages.
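To show where the numbers in such a report come from, here is a minimal sketch that computes precision, recall, and F1-score by hand for one class and compares them with classification_report. The eight labels and predictions are hypothetical and not taken from any real dataset.

```python
from sklearn.metrics import classification_report

# Hypothetical true labels and predictions for eight reviews.
y_true = ["pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "pos", "neg", "neg"]

# Count the outcomes for the "pos" class by hand.
tp = sum(t == "pos" and p == "pos" for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == "neg" and p == "pos" for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == "pos" and p == "neg" for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)  # share of "pos" predictions that were correct
recall = tp / (tp + fn)     # share of true "pos" samples that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# output_dict=True returns the same metrics as a nested dictionary.
report = classification_report(y_true, y_pred, output_dict=True)

print(classification_report(y_true, y_pred))
print("by hand (pos):", precision, recall, round(f1, 4))
```

Support is the number of true samples per class. The macro average is the plain mean of the per-class scores, while the weighted average weights each class by its support.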