Custom implementation of Support Vector Machines using Python and NumPy, as part of the Combinatorial Decision Making and Optimization university course (Master in Artificial Intelligence, Alma Mater Studiorum - University of Bologna).
Mattia Orlandi
Lorenzo Mario Amorosa
This project requires the following libraries:
numpy
for basic operations on matrices;cvxopt
for the quadratic convex optimization;scikit-learn
for generating and splitting the dataset, to assess accuracy, and to confront our implementation with SVC;matplotlib
for plotting graphs.
The repository is structured in the following way:
- the module
src/svm.py
contains the implementation of SVM for binary classification, with support to kernel functions and soft margin. - the module
src/multiclass_svm.py
contains the implementation of SVM for multiclass classification. - the notebook
src/svm_usecase.ipynb
shows the usage of the SVM for many different purposes. - the package
src/data
contains generators and datasets.
We provided also a script version (src/svm_usecase.py
) of the Jupyter notebook which can be run either in a terminal or in Spyder (recommended). For more clarity, it is suggested to at least read the notebook comments.
The Lagrangian problem for SVM is formulated as follows:
To integrate the soft margin in the formulation, for each data point
Every point
By integrating it into the Lagrangian, the following is obtained:
Its dual problem is formulated as follows:
subject to:
It is a quadratic optimization problem that can be solved using the quadratic library cvxopt
in python, so it is necessary to match the solver's API which, according to the documentation, is of the form:
subject to:
Let
We then convert the sums into vector form and multiply both the objective and the constraint by −1, which turns this into a minimization problem and reverses the inequality in constaints. The optimization problem can be written as:
subject to:
It is now necessary to convert the numpy arrays that express the optimization problem accordingly to cvxopt
format. Supposed
-
$\mathbf{P}:=\mathbf{H}$ a matrix of size$m \times m$ ; -
$\mathbf{q}:=-\mathbf{1}$ a vector of size$m \times 1$ ; -
$\mathbf{G}:=$ a matrix of size$2m \times m$ , such that a diagonal matrix of $-1$s of size$m \times m$ is concatenated vertically with another diagonal matrix of $1$s of size$m \times m$ ; -
$\mathbf{h}:=\mathbf{0}$ a vector of size$2m \times 1$ , with $0$s in the first$m$ cells and $C$s in the other$m$ cells; -
$\mathbf{A}:=\mathbf{y}$ the label vector of size$m \times 1$ ; -
$b:=0$ a scalar.
It has to be noticed that in case of hard margin the constraints on the upper bound of the Lagrangian multipliers are not given, hence
Since the hyperplane is a linear function, the SVM model defined so far is suited only to linearly separable datasets, which is not very useful in real-world scenarios.
To enable the correct classification in the non-linear case, the data to classify is mapped by
However, computing the mapping
where
- dot product (linear case);
- polynomial;
- radial basis function;
- sigmoid.
In the python code
the parameters needed by the solver are defined as follows, using the guideline previously provided:
K = np.zeros(shape=(n_samples, n_samples))
for i, j in itertools.product(range(n_samples), range(n_samples)):
K[i, j] = self.kernel_fn(X[i], X[j])
P = cvxopt.matrix(np.outer(y, y) * K)
q = cvxopt.matrix(-np.ones(n_samples))
# Compute G and h matrix according to the type of margin used
if self.C:
G = cvxopt.matrix(np.vstack((-np.eye(n_samples),
np.eye(n_samples))))
h = cvxopt.matrix(np.hstack((np.zeros(n_samples),
np.ones(n_samples) * self.C)))
else:
G = cvxopt.matrix(-np.eye(n_samples))
h = cvxopt.matrix(np.zeros(n_samples))
A = cvxopt.matrix(y.reshape(1, -1).astype(np.double))
b = cvxopt.matrix(np.zeros(1))
sol = cvxopt.solvers.qp(P, q, G, h, A, b)
The support vectors can be get exploiting the variable sol
, which are those with positive Lagrangian multipliers.
lambdas = np.ravel(sol['x'])
is_sv = lambdas > 1e-5
self.sv_X = X[is_sv]
self.sv_y = y[is_sv]
self.lambdas = lambdas[is_sv]
Then, it is possible to compute
And given
In the python code
the computation is made as follows:
self.w = np.zeros(n_features)
for i in range(len(self.lambdas)):
self.w += self.lambdas[i] * self.sv_X[i] * self.sv_y[i]
self.b = 0
for i in range(len(self.lambdas)):
self.b += self.sv_y[i]
self.b -= np.sum(self.lambdas * self.sv_y * K[sv_index[i], is_sv])
self.b /= len(self.lambdas)
An input
In code
:
y_predict = 0
for lamda, sv_X, sv_y in zip(self.lambdas, self.sv_X, self.sv_y):
y_predict += lamda * sv_y * self.kernel_fn(X, sv_X)
y_predict = np.sign(y_predict + self.b)
The module multiclass_svm.py
contains the implementation of Support Vector Machine for multi-classification purposes based on one-vs-one strategy.
It offers full support to kernel functions and soft margin, in fact the signature of its __init__
method is the same of the binary SVM
.
Given svm.py
.
Each classifier is trained to correctly classify 2 of the N given classes. In the training process there are used only the entries in the dataset to which it corresponds a label of the 2 classes.
Given an unseen example, the prediction of the class is computed deploying a voting schema among the binary SVM
classifiers.
The voting process is based on the standard predict
function for binary SVM
classifiers, so the tested entry is assigned to the class which wins the highest number of binary comparisons. In addition, it is available a mechanism to counteract the possible risk of draw in voting, based on the raw values predicted by the binary classifiers before the application of 'sign' function.
- The SVM model is initially created by specifying the type of kernel (
rbf
/poly
/sigmoid
) and the value of the associated parameters (gamma
,deg
andr
); also, the parameterC
regulating the soft margin is specified. - When the
fit
method is called (passing a supervised training set), the model learns the correct parameters of the hyperplane by minimizing the dual lagrangian function discussed in the previous section. - When the
predict
method is called, new instances are classified according to the learnt parameters.
Tristan Fletcher, Support Vector Machines Explained
Humboldt-Universität zu Berlin, Lagrangian formulation of the SVM