Support Vector Machines (SVMs) are powerful machine learning models that can be used for both classification and regression tasks. In classification, the goal is to find a hyperplane that separates the data points of different classes with maximum margin. This hyperplane is known as the “optimal hyperplane” or “maximum-margin hyperplane”.
To understand the concept of the hyperplane in SVMs, let’s dive deeper into the working of SVMs and how the optimal hyperplane is determined.
The idea behind SVMs is to find a boundary that effectively separates the data points of different classes. For example, in a binary classification problem with 2 classes and 2-dimensional data, the classes can be separated with a line.
Here we can see a line separating the red dots from the blue dots, with the color representing the two different classes.
![[Pasted image 20230619135001.png]]
If the data has 3 or more dimensions, it needs to be separated by a hyperplane, which can be illustrated like this:
![[Pasted image 20230619133752.png]]
The same principle holds in higher dimensions, but those are harder to illustrate.
(Note: technically speaking, a hyperplane is any flat affine subspace whose dimension is one less than that of the surrounding space, so the 1-dimensional line from the first image is also a hyperplane in 2D.)
The data points that lie closest to the hyperplane (the decision boundary) are called support vectors. These support vectors play a crucial role in determining the optimal hyperplane. The distance between the hyperplane and the support vectors is known as the margin. The SVM tries to maximize this margin, as it provides a measure of the confidence of the classification: the larger the distance between the two classes, the more clearly they are separated and the better the model generalizes.
![[Drawing 2023-06-20 12.05.43.excalidraw]]
The hyperplane can be described by the function: $$f(x) = w^T x + b$$ where $w$ and $b$ are the parameters of the hyperplane, and the predicted label $y$ is given by the sign of $f(x)$. For our binary classification problem we only have the classes -1 and 1, so $y \in \{+1, -1\}$.
Since the hyperplane represents the decision boundary, any point on the hyperplane satisfies the equation $w^T x + b = 0$. Such a point lies exactly in the middle between the two classes, which can be interpreted as: we cannot say which class that point belongs to.
The decision rule is based on the sign of $w^T x + b$: if $w^T x + b > 0$, the point $x$ is classified as class 1, and if $w^T x + b < 0$, the point $x$ is classified as class -1.
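As a minimal sketch of this decision rule (the weight vector, bias, and test points below are made-up values, not taken from the examples above):

```python
import numpy as np

def predict(w, b, x):
    """Classify a point x using the hyperplane defined by (w, b):
    +1 if w^T x + b > 0, -1 if w^T x + b < 0 (0 exactly on the hyperplane)."""
    return int(np.sign(w @ x + b))

# Illustrative values only:
w = np.array([1.0, -2.0])
b = 0.5
print(predict(w, b, np.array([3.0, 1.0])))  # w^T x + b = 1.5 > 0  -> class 1
print(predict(w, b, np.array([0.0, 2.0])))  # w^T x + b = -3.5 < 0 -> class -1
```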
The optimal hyperplane is the one that maximizes the margin between the two classes.
The margin is defined as the perpendicular distance between the hyperplane and the closest data points from each class. These closest data points are called support vectors.
Let’s denote the support vectors as $x_+$ for class 1 and $x_-$ for class -1. A support vector satisfies $w^T x_\pm + b = \pm 1$, so its perpendicular distance to the hyperplane is $\frac{1}{\lVert w \rVert}$, and the distance between $x_+$ and $x_-$ measured perpendicular to the hyperplane is: $$\text{margin} = \frac{2}{\lVert w \rVert}$$ where $\lVert w \rVert$ represents the Euclidean norm of the weight vector $w$. The goal of SVM is to find the hyperplane that maximizes this margin, which in turn improves the generalization performance of the classifier.
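For instance, with a hypothetical weight vector $w = (3, 4)^T$ we would have $\lVert w \rVert = 5$ and therefore a margin of $2/5 = 0.4$.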
To find the optimal hyperplane, SVM solves the following maximization problem, or the equivalent minimization problem: $$\max_{w,b} \frac{2}{\lVert w \rVert} \Leftrightarrow \min_{w,b} \frac{1}{2} \lVert w \rVert^2$$ subject to the constraints: $$y_i(w^T x_i + b) \geq 1 \quad \text{for all } i = 1,2,\dots,N$$ Here, $N$ represents the number of data points, and $y_i$ represents the class labels. The constraints ensure that all data points are correctly classified and lie on the correct side of the margin.
The optimization problem can be solved using various algorithms, such as Sequential Minimal Optimization (SMO) or interior point methods. Once the optimization problem is solved, the learned parameters $w$ and $b$ define the optimal hyperplane that separates the two classes with the maximum margin.
Using the above equations, this can be applied in Python (the minimization problem is solved using the function minimize from the package scipy.optimize):
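A minimal sketch of such an implementation (the toy data, variable names, and solver settings are illustrative assumptions, not the original code):

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable 2-D data (illustrative only)
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],   # class +1
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

def objective(params):
    """1/2 * ||w||^2 -- the quantity the hard-margin SVM minimizes."""
    w = params[:-1]
    return 0.5 * np.dot(w, w)

def constraints(params):
    """y_i * (w^T x_i + b) - 1 >= 0 must hold for every training point."""
    w, b = params[:-1], params[-1]
    return y * (X @ w + b) - 1

res = minimize(objective,
               x0=np.zeros(X.shape[1] + 1),  # start from w = 0, b = 0
               constraints={'type': 'ineq', 'fun': constraints})

w_opt, b_opt = res.x[:-1], res.x[-1]
print("w =", w_opt, "b =", b_opt, "margin =", 2 / np.linalg.norm(w_opt))
```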
![[Pasted image 20230619174013.png]]
![[Pasted image 20230619174527.png]]
Comparing to the sklearn SVM function:
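For comparison, a sketch using scikit-learn's SVC with a linear kernel on the same toy data (a very large C is used here to approximate the hard-margin problem above):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e6)  # very large C ~ hard margin
clf.fit(X, y)

print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)
```

The parameters found this way should be close to the ones from the scipy.optimize solution above.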
![[Pasted image 20230619174206.png]]
It is important to note that SVM is a linear classifier, which means it can only separate classes using linear decision boundaries. However, by applying different kernel functions, SVM can also handle non-linear decision boundaries by mapping the data into a higher-dimensional feature space.
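As a small illustration of this idea (the data set and kernel parameters are made up for the example), switching the kernel argument in scikit-learn is enough to obtain a non-linear decision boundary:

```python
import numpy as np
from sklearn.svm import SVC

# Data that is not linearly separable: the class depends on the distance from the origin
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)

linear_clf = SVC(kernel='linear').fit(X, y)      # linear boundary, will struggle here
rbf_clf = SVC(kernel='rbf', gamma=1.0).fit(X, y)  # RBF kernel, non-linear boundary

print("linear kernel accuracy:", linear_clf.score(X, y))
print("RBF kernel accuracy:   ", rbf_clf.score(X, y))
```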
In conclusion, the hyperplane in SVM represents the decision boundary that separates the two classes, and the optimal hyperplane is the one that maximizes the margin between the classes. SVM achieves this by solving an optimization problem and finding the best parameters that define the hyperplane.
SVM is a powerful and widely used algorithm in machine learning due to its ability to handle both linear and non-linear classification problems. It has proven to be effective in various applications, such as image classification, text categorization, and bioinformatics, among others.