Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression. For classification, SVM finds a hyperplane that forms a boundary between two classes of data, choosing the one with the largest margin to the nearest points of each class.
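To make the idea of a separating hyperplane concrete, here is a minimal sketch on made-up 2-D points. The data and variable names are assumptions chosen for illustration; only svm.SVC and its coef_ and intercept_ attributes come from scikit-learn.

```python
import numpy as np
from sklearn import svm

# Toy 2-D data (made up for illustration): two well-separated clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# With a linear kernel the hyperplane is the set of points where w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]

# The sign of w . x + b tells us which side of the hyperplane a point is on.
scores = X @ w + b
predicted = clf.predict([[0.5, 0.5], [3.5, 3.5]])
print("w =", w, "b =", b)
print("predicted classes:", predicted)
```

Points with a negative score fall on the class 0 side of the boundary, and points with a positive score fall on the class 1 side.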
Text Classification is the process of labeling or organizing text data into groups – it forms a fundamental part of Natural Language Processing.
In the digital age, we are surrounded by text on social media, in commercials, on websites, in e-books, and so on. Most of this text is unstructured, so classifying it can be extremely useful.
Text Classification has a wide array of applications.
In this shot, we'll learn about text classification using support vector machines (SVMs).
Below is a series of steps that will allow you to perform text classification on a dataset.
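Import the required libraries. If one is not installed, install it with pip, replacing library with the package name (for example, scikit-learn for sklearn):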
pip install library
import pandas as pd
import numpy as np
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.preprocessing import LabelEncoder
from collections import defaultdict
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import model_selection, svm
from sklearn.metrics import accuracy_score
Load the relevant dataset. You can use the read_csv() method of the pandas library, imported above as pd:
pd.read_csv("data.csv")
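As a small sketch of the loading step, the snippet below reads CSV text from memory instead of a file so that it runs on its own. The column names "text" and "label" and the sample rows are hypothetical; a real dataset will have its own schema.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the columns "text" and "label" are hypothetical.
csv_text = io.StringIO(
    "text,label\n"
    "I loved this phone,positive\n"
    "The battery died in a day,negative\n"
    ",positive\n"
)

data = pd.read_csv(csv_text)

# Rows with missing text cannot be classified, so drop them early.
data = data.dropna(subset=["text"]).reset_index(drop=True)
print(data)
```

With a real file, passing the path as a string, as in pd.read_csv("data.csv"), works the same way.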
Pre-process the data. This means transforming the raw text into a cleaner form that an NLP model can work with. Typical steps, matching the NLTK imports above, include tokenization, stop-word removal, part-of-speech tagging, and lemmatization.
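The pre-processing idea can be sketched without any downloads. The version below is a simplified stand-in, not the NLTK-based flow: it uses a regular expression instead of word_tokenize, a tiny hand-written stop-word set instead of nltk.corpus.stopwords, and it skips lemmatization. The function name preprocess and the stop-word list are assumptions for illustration.

```python
import re

# Tiny illustrative stop-word set; a real project would use a fuller list,
# such as the one shipped with NLTK.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "on",
              "of", "and", "to", "it", "this"}

def preprocess(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

cleaned = preprocess("The battery of this phone is AMAZING, and it charges fast!")
print(cleaned)
```

In the NLTK version, each token would additionally be tagged with pos_tag and reduced to its base form with WordNetLemmatizer, which requires downloading the corresponding NLTK corpora first.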
Prepare the training and testing datasets using the train_test_split() method of the sklearn library. Setting test_size = 0.25 holds out a quarter of the data for testing, which is a common starting point.
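A minimal sketch of the split on made-up data follows. The sample texts and labels are hypothetical, and the random_state and stratify arguments are additions not mentioned in the text: the first makes the split reproducible, the second keeps the class proportions the same in both parts.

```python
from sklearn import model_selection

# Hypothetical corpus: eight short texts with balanced labels.
texts = ["good", "great", "nice", "superb", "bad", "awful", "poor", "terrible"]
labels = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

Train_X, Test_X, Train_Y, Test_Y = model_selection.train_test_split(
    texts,
    labels,
    test_size=0.25,     # hold out 25% of the rows for testing
    random_state=42,    # assumption: fixed seed for reproducibility
    stratify=labels,    # assumption: preserve the class balance
)
print(len(Train_X), "training rows,", len(Test_X), "test rows")
```

The variable names Train_X, Test_X, Train_Y, and Test_Y match the ones used in the later steps.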
Encode the target labels as integers so the model can work with them. With two classes, the labels become 0 and 1.
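Encoder = LabelEncoder()
# learn the label-to-integer mapping from the training labels
Train_Y = Encoder.fit_transform(Train_Y)
# reuse the same mapping for the test labels instead of refitting
Test_Y = Encoder.transform(Test_Y)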
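To see what the encoder produces, here is a small sketch with made-up labels. The label values and the names ending in _enc are assumptions for illustration. Fitting on the training labels and calling transform on the test labels keeps the mapping identical across both sets.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels.
Train_Y = ["spam", "ham", "ham", "spam"]
Test_Y = ["ham", "spam"]

Encoder = LabelEncoder()
Train_Y_enc = Encoder.fit_transform(Train_Y)  # learns the mapping, then encodes
Test_Y_enc = Encoder.transform(Test_Y)        # applies the same mapping

# classes_ lists the original labels in the order of their integer codes.
print(list(Encoder.classes_))
print(list(Train_Y_enc), list(Test_Y_enc))

# inverse_transform turns integer codes back into the original labels.
decoded = Encoder.inverse_transform(Test_Y_enc)
```

LabelEncoder assigns codes in sorted order of the labels, so here "ham" maps to 0 and "spam" maps to 1.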
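Convert the text data to numeric vectors that the model can understand.
One common option is TF-IDF (term frequency-inverse document frequency), available through the TfidfVectorizer imported above.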
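The vectorization step might look like the sketch below. The sample sentences are made up, and max_features=5000 is an assumed cap on vocabulary size rather than a value from the text. The vectorizer is fitted on the training text only and then applied to the test text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-cleaned text.
Train_X = ["great phone great battery", "terrible screen", "great screen"]
Test_X = ["terrible battery", "unknown words here"]

Tfidf_vect = TfidfVectorizer(max_features=5000)  # assumption: vocabulary cap
Train_X_Tfidf = Tfidf_vect.fit_transform(Train_X)  # learn vocabulary and IDF weights
Test_X_Tfidf = Tfidf_vect.transform(Test_X)        # reuse them for the test text

vocabulary = sorted(Tfidf_vect.vocabulary_)
print(vocabulary)
print(Train_X_Tfidf.shape, Test_X_Tfidf.shape)
```

Words that never appeared in the training text are ignored at transform time, so a test document made up entirely of unseen words becomes an all-zero row.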
Perform machine learning using SVM.
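# degree and gamma have no effect when kernel='linear'
SVM = svm.SVC(C=1.0, kernel='linear', degree=3, gamma='auto')
SVM.fit(Train_X_Tfidf, Train_Y)
# predict labels for the test set
predictions_SVM = SVM.predict(Test_X_Tfidf)
# get the accuracy as a percentage (true labels first, then predictions)
print("Accuracy: ", accuracy_score(Test_Y, predictions_SVM) * 100)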
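The vectorizing and training steps can also be chained together. The sketch below is one possible arrangement, not the method used above: it wraps TfidfVectorizer and svm.SVC in scikit-learn's make_pipeline helper and runs them on a tiny made-up corpus. All texts, labels, and variable names are assumptions for illustration.

```python
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 1 = positive review, 0 = negative review.
train_texts = [
    "great phone love it",
    "excellent battery great screen",
    "love the camera",
    "terrible phone hate it",
    "awful battery terrible screen",
    "hate the camera",
]
train_labels = [1, 1, 1, 0, 0, 0]

test_texts = ["great camera love it", "terrible awful phone"]
test_labels = [1, 0]

# The pipeline fits the vectorizer and the classifier in one call and
# applies the same fitted vectorizer automatically at prediction time.
model = make_pipeline(TfidfVectorizer(), svm.SVC(kernel="linear", C=1.0))
model.fit(train_texts, train_labels)

predictions = model.predict(test_texts)
accuracy = accuracy_score(test_labels, predictions) * 100
print("Accuracy: ", accuracy)
```

A corpus this small only demonstrates the mechanics. Accuracy on a real dataset depends on the data, the pre-processing, and the choice of C and kernel.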