Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression. For classification, SVM finds a hyperplane that forms a boundary between two classes of data. Of all possible separating hyperplanes, it chooses the one with the largest margin to the nearest points of each class.
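The hyperplane idea can be made concrete with a small sketch. It uses scikit-learn's `SVC` with a linear kernel on a few made-up 2-D points. The data and variable names are illustrative assumptions. For a linear kernel, the learned hyperplane is w·x + b = 0, and scikit-learn exposes w and b as `coef_` and `intercept_`.

```python
import numpy as np
from sklearn import svm

# Made-up 2-D points: two linearly separable classes (illustration only)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel makes the separating hyperplane w.x + b = 0 explicit
clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # offset of the hyperplane

# The sign of w.x + b tells which side of the hyperplane a point lies on;
# dividing by ||w|| would give its distance to the hyperplane
new_points = np.array([[1.2, 1.3], [4.8, 4.2]])
scores = new_points @ w + b
print(scores)
print(clf.predict(new_points))
```

Points with a negative score are assigned to class 0 and points with a positive score to class 1. These scores are the same values that `clf.decision_function` returns.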
Text Classification is the process of labeling or organizing text data into groups – it forms a fundamental part of Natural Language Processing.
In the digital age, we are surrounded by text: social media posts, commercials, websites, e-books, and more. Most of this text is unstructured, so classifying it can be extremely useful.
Text Classification has a wide array of applications. Some popular uses are:
In this shot, we'll learn about text classification using support vector machines (SVMs).
The following steps let you perform text classification on any labeled text dataset.
Import the required libraries. If any of them are not installed, install them with pip, for example:

```shell
pip install pandas numpy nltk scikit-learn
```
```python
import pandas as pd
import numpy as np
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.preprocessing import LabelEncoder
from collections import defaultdict
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import model_selection, svm
from sklearn.metrics import accuracy_score
```
Load the dataset. The read_csv() method of the pandas library, imported above as pd, reads a CSV file into a DataFrame.
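Here is a minimal sketch of the loading step. So that it runs without a file on disk, it reads an in-memory CSV through `io.StringIO`. The column names `text` and `label`, the variable name `Corpus`, and the sample rows are assumptions for illustration. With a real dataset, you would pass a file path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical CSV contents; with a real file use pd.read_csv("path/to/file.csv")
csv_data = io.StringIO(
    "text,label\n"
    "\"Great product, works perfectly\",positive\n"
    "\"Terrible service, never again\",negative\n"
    "\"I love this phone\",positive\n"
)

Corpus = pd.read_csv(csv_data)

# Drop rows with missing text so later steps do not fail on NaN values
Corpus = Corpus.dropna(subset=["text"])
print(Corpus.shape)
print(Corpus.head())
```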
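Pre-process the data, that is, transform the raw text into a form that is easier to work with in an NLP context. The libraries imported above cover the usual pre-processing steps:

- tokenization with word_tokenize
- stop-word removal with stopwords
- lemmatization with WordNetLemmatizer, using pos_tag and wordnet to supply each word's part of speech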
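The sketch below shows the general shape of a pre-processing function. It is a deliberately simplified, dependency-free stand-in rather than the NLTK pipeline. A regular expression replaces word_tokenize, a tiny hand-written stop-word set replaces NLTK's list, and lemmatization is omitted. The function name `preprocess` and the `STOP_WORDS` set are hypothetical.

```python
import re

# Tiny illustrative stop-word set (hypothetical); NLTK's English list is much larger
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "and", "of", "to", "i"}

def preprocess(text):
    """Lowercase, tokenize, and drop stop words (simplified stand-in for NLTK)."""
    text = text.lower()                                   # normalize case
    tokens = re.findall(r"[a-z0-9]+", text)               # simple regex tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    return " ".join(tokens)                               # rejoin for the vectorizer

print(preprocess("This is a GREAT phone, and the battery lasts!"))
```

To use NLTK instead, swap in `word_tokenize(text)`, `stopwords.words('english')`, and `WordNetLemmatizer().lemmatize(word, pos)`, with `pos` derived from `pos_tag`. These functions need NLTK data packages, which you download once with `nltk.download()`. On a DataFrame with a text column named `text`, the function can be applied with `df["text"].map(preprocess)`.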
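Prepare the training and testing datasets using the train_test_split() function from sklearn's model_selection module. Setting test_size = 0.25 holds out 25% of the samples for testing. This is also the default when neither test_size nor train_size is given.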
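Here is a minimal sketch of the split, run on a hypothetical list of eight short, already pre-processed texts. Two optional `train_test_split` parameters are added as suggestions, not as part of the original steps:

- `random_state` makes the split reproducible.
- `stratify` keeps the class proportions the same in both sets.

```python
from sklearn import model_selection

# Hypothetical pre-processed texts and their labels
texts = ["great phone", "terrible service", "love battery", "awful screen",
         "works perfectly", "never again", "excellent value", "broke quickly"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

# Hold out 25% of the samples for testing, keeping both classes represented
Train_X, Test_X, Train_Y, Test_Y = model_selection.train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

print(len(Train_X), len(Test_X))  # 6 training samples, 2 test samples
```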
Encode the labels as integers so that the model can tell the classes apart. LabelEncoder assigns each class an integer from 0 to n_classes - 1, so a binary problem gets the labels 0 and 1.
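```python
# Fit the encoder on the training labels, then reuse the same mapping for the test labels
Encoder = LabelEncoder()
Train_Y = Encoder.fit_transform(Train_Y)
Test_Y = Encoder.transform(Test_Y)
```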
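The sketch below uses hypothetical label lists to show why the encoder should be fitted once on the training labels and then only applied to the test labels. If you fit a fresh encoder on a test split that happens to contain a single class, that class can receive a different code than it had during training.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels; this test split happens to contain only one class
Train_Y = ["pos", "neg", "pos", "neg", "pos", "neg"]
Test_Y = ["pos", "pos"]

Encoder = LabelEncoder()
Train_Y_enc = Encoder.fit_transform(Train_Y)  # learns the sorted classes: neg -> 0, pos -> 1
Test_Y_enc = Encoder.transform(Test_Y)        # reuses that mapping: pos -> 1

print(Encoder.classes_)
print(Test_Y_enc)

# A fresh encoder fitted on the test labels alone maps 'pos' to 0 instead
print(LabelEncoder().fit_transform(Test_Y))
```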
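Convert the text data into numeric vectors that the model can understand. The user can make use of the TfidfVectorizer class from sklearn, imported above, which represents each document by the TF-IDF (term frequency-inverse document frequency) weights of its words.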
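Here is a minimal sketch of vectorization, assuming `TfidfVectorizer` with its default settings and a few hypothetical texts. The vectorizer is fitted on the training texts only and then applied to the test texts, so both matrices share the same columns. The output names `Train_X_Tfidf` and `Test_X_Tfidf` match the SVM step. `Tfidf_vect` is a hypothetical name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical pre-processed training and test texts
Train_X = ["great phone battery", "terrible service", "great service"]
Test_X = ["terrible phone"]

# Learn the vocabulary and IDF weights from the training texts only
Tfidf_vect = TfidfVectorizer()
Train_X_Tfidf = Tfidf_vect.fit_transform(Train_X)

# Apply the fitted vocabulary to the test texts; unseen words are ignored
Test_X_Tfidf = Tfidf_vect.transform(Test_X)

print(sorted(Tfidf_vect.vocabulary_))
print(Train_X_Tfidf.shape, Test_X_Tfidf.shape)  # sparse matrices: rows = documents
```

On a real corpus, the `max_features` parameter (for example, `TfidfVectorizer(max_features=5000)`) caps the vocabulary size to keep the matrices manageable.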
Train the SVM classifier and evaluate it on the test set.
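```python
# Linear-kernel SVM; degree and gamma are ignored when kernel='linear'
SVM = svm.SVC(C=1.0, kernel='linear', degree=3, gamma='auto')
SVM.fit(Train_X_Tfidf, Train_Y)
# predict labels
predictions_SVM = SVM.predict(Test_X_Tfidf)
# get the accuracy (true labels first, then predictions)
print("Accuracy: ", accuracy_score(Test_Y, predictions_SVM) * 100)
```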
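The code above relies on variables produced by earlier steps. The self-contained sketch below runs encoding, vectorization, training, and evaluation end to end on a tiny hypothetical corpus. The data is made up and far too small to say anything about real-world accuracy. It only shows how the pieces connect.

```python
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data, already pre-processed and split
Train_X = ["great phone", "love battery", "works perfectly", "excellent value",
           "terrible service", "awful screen", "never again", "broke quickly"]
Train_Y = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
Test_X = ["great value", "awful service"]
Test_Y = ["pos", "neg"]

# Encode labels: fit on training labels, reuse the mapping for test labels
Encoder = LabelEncoder()
Train_Y = Encoder.fit_transform(Train_Y)
Test_Y = Encoder.transform(Test_Y)

# Vectorize: fit TF-IDF on training texts, reuse it for test texts
Tfidf_vect = TfidfVectorizer()
Train_X_Tfidf = Tfidf_vect.fit_transform(Train_X)
Test_X_Tfidf = Tfidf_vect.transform(Test_X)

# Train a linear SVM and evaluate it
SVM = svm.SVC(C=1.0, kernel="linear")
SVM.fit(Train_X_Tfidf, Train_Y)
predictions_SVM = SVM.predict(Test_X_Tfidf)

print("Accuracy:", accuracy_score(Test_Y, predictions_SVM) * 100)
print(Encoder.inverse_transform(predictions_SVM))  # map codes back to label names
```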