How to implement SVM in Python using Scikit-learn
Overview
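A support vector machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. For classification, an SVM looks for the decision boundary that separates the classes with the widest possible margin.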
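Both uses are available in scikit-learn's `sklearn.svm` module: `SVC` for classification and `SVR` for regression. The following is a minimal sketch on tiny, made-up datasets. The point coordinates and the `y = 2x` relationship are hypothetical values chosen only for illustration.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Hypothetical 2D classification data: two well-separated groups of points
X_cls = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0],
                  [3.0, 3.0], [3.5, 2.5], [4.0, 3.0]])
y_cls = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X_cls, y_cls)
cls_pred = clf.predict([[0.2, 0.1], [3.8, 3.1]])  # one new point near each group
print(cls_pred)
print(clf.support_vectors_)  # the training points that define the margin

# Hypothetical 1D regression data following y = 2x
X_reg = np.arange(10, dtype=float).reshape(-1, 1)
y_reg = 2.0 * X_reg.ravel()

reg = SVR(kernel="linear")
reg.fit(X_reg, y_reg)
reg_pred = reg.predict([[4.5]])
print(reg_pred)  # close to 9.0
```

The same `fit`/`predict` pattern is used for both estimators, which is why the classifier below follows exactly these two steps.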
In this shot, we will implement an SVM classifier using the Scikit-learn toolkit.
We will use the digits dataset, which contains 1,797 8x8 grayscale images of handwritten digits, to train an SVM classifier from scikit-learn. We split the data into train and test sets (a 70-30 split) so that we can check how well the classifier generalizes to data it has not seen during training.
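To see what the classifier works with, the sketch below inspects the digits data and checks the sizes produced by the same 70-30 split (`test_size=0.30`, `random_state=4`, as in the main example). Each 8x8 image is already flattened into a row of 64 pixel features in `dataset.data`.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

dataset = load_digits()
print(dataset.images.shape)  # (1797, 8, 8): the raw 8x8 grayscale images
print(dataset.data.shape)    # (1797, 64): each image flattened into 64 features
print(dataset.target_names)  # the ten class labels, 0 to 9

# Same split as the main example: 30% of the samples are held out for testing
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.30, random_state=4)
print(len(x_train), len(x_test))  # 1257 540
```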
The trained model uses its learned parameters to assign each image to one of ten classes, the digits 0 to 9.
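An SVM is inherently a two-class method. For more than two classes, scikit-learn's `SVC` uses a one-vs-one scheme internally: it trains one binary classifier per pair of classes, which gives 10 * 9 / 2 = 45 classifiers for ten digits. Setting `decision_function_shape="ovo"` exposes those pairwise scores. This is a small sketch that fits on the full dataset only to keep it short.

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

dataset = load_digits()

# "ovo" makes decision_function return one score per pair of classes
clf = SVC(kernel="linear", decision_function_shape="ovo")
clf.fit(dataset.data, dataset.target)

print(clf.classes_)  # the ten digit classes
scores = clf.decision_function(dataset.data[:1])
print(scores.shape)  # (1, 45): one score per pair of classes
print(clf.predict(dataset.data[:1]))  # the single class chosen by pairwise voting
```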
Code example
```python
# Importing the necessary libraries
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Loading the digits dataset from the sklearn library into a local variable called dataset
dataset = load_digits()

# Splitting the dataset into train (70%) and test (30%) sets
# x_train, y_train are the training data and labels respectively
# x_test, y_test are the testing data and labels respectively
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.30, random_state=4)

# Making the SVM classifier
Classifier = SVC(kernel="linear")

# Training the model on the training data and labels
Classifier.fit(x_train, y_train)

# Using the model to predict the labels of the test data
y_pred = Classifier.predict(x_test)

# Evaluating the model using the sklearn metric functions
accuracy = accuracy_score(y_test, y_pred) * 100
confusion_mat = confusion_matrix(y_test, y_pred)

# Printing the results
print("Accuracy for SVM is:", accuracy)
print("Confusion Matrix")
print(confusion_mat)
```
Explanation
- Line 3: We import the `load_digits` function, which loads the digits dataset bundled with `sklearn`.
- Line 4: We import the `train_test_split` function from `sklearn` to split the data into train and test samples.
- Line 5: We import the `SVC` (support vector classifier) class from `sklearn`.
- Line 6: We import the `confusion_matrix` and `accuracy_score` functions provided by `sklearn`.
- Line 9: We load the dataset into a local variable called `dataset`.
- Line 14: We split the data into train and test sets using a 70-30 split: 70% of the samples are used for training and 30% for testing. `x_train` and `y_train` contain the training data and labels respectively, while `x_test` and `y_test` contain the testing data and labels. Setting `random_state=4` makes the shuffled split reproducible.
- Line 17: We define an SVM classifier called `Classifier` that uses a linear kernel.
- Line 20: We train the model on the training data and labels.
- Line 23: We use the parameters learned from the training data to predict the labels of the test data.
- Line 26: We use the `accuracy_score` function to compare the predicted labels with the true test labels. It returns the fraction of correct predictions, so we multiply by 100 to express the accuracy as a percentage.
- Line 27: We compare the predicted labels with the true labels to build the confusion matrix.
- Lines 30 to 32: We print the evaluation scores for the model.
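The two evaluation steps are closely related. The sketch below repeats the same pipeline and checks that `accuracy_score` agrees with a direct element-wise comparison of the labels, and with the diagonal of the confusion matrix. In scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels, so correct predictions sit on the diagonal.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

dataset = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.30, random_state=4)
y_pred = SVC(kernel="linear").fit(x_train, y_train).predict(x_test)

# accuracy_score is the fraction of labels predicted exactly right
accuracy = accuracy_score(y_test, y_pred) * 100
manual_accuracy = np.mean(y_pred == y_test) * 100

# Correct predictions are on the diagonal, so trace / total is also the accuracy
confusion_mat = confusion_matrix(y_test, y_pred)
diagonal_accuracy = np.trace(confusion_mat) / confusion_mat.sum() * 100

print(accuracy, manual_accuracy, diagonal_accuracy)  # all three agree
```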
Model performance
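The SVM classifier defined above reaches about 98% accuracy on the test split of the digits dataset. In the confusion matrix, nearly all test samples fall on the diagonal, meaning they were classified correctly; only a few land in off-diagonal cells, which count misclassifications.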
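To back up the confusion-matrix reading with numbers, one option is to compute per-class recall from it: the diagonal divided by the row sums. You can also print scikit-learn's `classification_report`. This sketch reruns the same pipeline. The exact values depend on the split, so none are reproduced here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report

dataset = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.30, random_state=4)
y_pred = SVC(kernel="linear").fit(x_train, y_train).predict(x_test)

confusion_mat = confusion_matrix(y_test, y_pred)

# Per-class recall: correct predictions for a digit / all test samples of that digit
per_class_recall = np.diag(confusion_mat) / confusion_mat.sum(axis=1)
for digit, recall in enumerate(per_class_recall):
    print(f"digit {digit}: recall {recall:.2f}")

# Off-diagonal entries count misclassifications (true label -> predicted label)
errors = confusion_mat.sum() - np.trace(confusion_mat)
print("misclassified test samples:", errors)

# Precision, recall and F1 score for every class in one table
print(classification_report(y_test, y_pred))
```

Per-class recall shows which digits, if any, account for the remaining errors, which a single accuracy number hides.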