Linear Separability
Understand the concept of linear separability and learn how to change the feature space to make data linearly separable.
In this lesson, we explore the concept of linear separability, which is key to understanding when a hard-margin SVM can perfectly classify data. A dataset is linearly separable if a straight hyperplane can divide the classes without errors. However, many real-world datasets are not linearly separable in their original feature space. We’ll see how this affects the feasibility of the SVM optimization problem and learn how transforming the feature space using techniques such as polynomial feature expansion can make non-linearly separable data linearly separable. This sets the foundation for understanding the kernel trick and more advanced SVM techniques.
Linear separability
If the data isn’t linearly separable, then for every possible choice of the weight vector $\mathbf{w}$ and bias $b$, at least one point $(\mathbf{x}_i, y_i)$ exists in the dataset for which the constraint $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1$ is not satisfied. In this case, the optimization problem is infeasible, as shown in the following example:
Suppose a medical researcher is working on a project to classify patient data as either “healthy” or “diseased.” They have collected data on various health parameters, such as age, blood pressure, and cholesterol levels, for a large number of patients. Their goal is to develop a model that can accurately classify new patients as either healthy or diseased based on these features. When we plot the data against two of these features, healthy and diseased patients form clusters in different areas of the graph. No straight line can separate the two groups completely, indicating that the data isn’t linearly separable, as shown in the figure. Some of the healthy data points violate the constraints and make the optimization problem infeasible.
In cvxpy, we can check for infeasibility in several ways. The following code tests whether a given dataset is linearly separable:
Here is the explanation for the code above:
- Lines 29–32: The highlighted code checks the feasibility of the optimization problem solved in the code above. If the optimal value of `w` is `None`, the optimization problem is infeasible; that is, the data points are not linearly separable in the feature space defined by `phi`. This means we couldn’t find a weight vector that classifies all the data points correctly, so we can’t draw a separating hyperplane in this feature space. Otherwise, if the optimal value of `w` is not `None`, the optimization problem is feasible, and the data points are linearly separable in the feature space defined by `phi`.
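The same feasibility test can be phrased as a linear program: find any $\mathbf{w}, b$ satisfying $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1$ for all $i$. As a minimal stand-alone sketch (using `scipy.optimize.linprog` as a stand-in solver, and a hypothetical helper name `is_linearly_separable` of our own):

```python
import numpy as np
from scipy.optimize import linprog

def is_linearly_separable(X, y):
    """Hard-margin feasibility check: does some (w, b) satisfy
    y_i * (w @ x_i + b) >= 1 for every point?  Feasible -> separable."""
    n, d = X.shape
    # Decision variables: [w (d entries), b]; constant objective (pure feasibility).
    c = np.zeros(d + 1)
    # y_i * (w @ x_i + b) >= 1  <=>  -y_i * [x_i, 1] @ [w, b] <= -1
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.status == 0  # status 0 = feasible, 2 = infeasible

# Two well-separated clusters: a hyperplane exists
X_sep = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
y_sep = np.array([-1, -1, 1, 1])

# XOR-like data: famously not linearly separable
X_xor = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y_xor = np.array([-1, -1, 1, 1])

print(is_linearly_separable(X_sep, y_sep))  # True
print(is_linearly_separable(X_xor, y_xor))  # False
```

As with the cvxpy version, an infeasible solve tells us no separating hyperplane exists in the current feature space.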
Changing the feature space
If the data isn’t linearly separable in a feature space defined by a mapping $\phi$, we can try another feature space in which the data becomes linearly separable. For example, consider the following transformation of the features:
...
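To make this concrete, here is a small illustration with a hypothetical polynomial map $\phi(x_1, x_2) = (x_1, x_2, x_1 x_2)$ (our own choice for this sketch; the lesson's transformation may differ). The XOR pattern is not linearly separable in 2-D, but adding the product feature makes it separable in 3-D:

```python
import numpy as np

# XOR pattern: not linearly separable in the original 2-D space
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1, -1, 1, 1])

def phi(x):
    # Hypothetical polynomial expansion: append the product feature x1 * x2
    return np.array([x[0], x[1], x[0] * x[1]])

X3 = np.array([phi(x) for x in X])

# In the expanded space, w = (1, 1, -2), b = -0.5 separates the classes:
w, b = np.array([1.0, 1.0, -2.0]), -0.5
margins = y * (X3 @ w + b)
print(margins)  # every entry is positive, so the hyperplane separates the data
```

The product term $x_1 x_2$ captures the interaction between the two features that no linear combination of $x_1$ and $x_2$ alone can express, which is exactly why the expanded space admits a separating hyperplane.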