by Sergio Theodoridis, Konstantinos Koutroumbas
This note is not intended to be a rigorous mathematical walkthrough; refer to the book for details.
For a task such as classifying cancer regions, the measurements used for the classification are known as features, which together comprise a feature vector.
Keywords: decision line; optimality criterion
- feature generation stage
- feature selection stage: in practice, a larger than necessary number of feature candidates is generated, and the best of them are adopted.
- classifier design stage
- system evaluation stage
supervised, unsupervised, semi-supervised learning
A linear classifier makes a classification decision based on the value of a linear combination of the features. Affine hyperplanes are used to define decision boundaries in many machine learning algorithms, such as linear-combination (oblique) decision trees and perceptrons.
If the input feature vector to the classifier is a real vector $x$, then the output score is $y = f\left(w^{T} x\right)$, where $w$ is a real vector of weights and $f$ is a function that converts the inner product into the desired output, e.g. a simple threshold on its sign. The weight vector $w$ is learned from a set of labeled training samples.
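As a minimal sketch of this decision rule (the weights and sample below are made up for illustration, not taken from the book):

```python
import numpy as np

def linear_classifier(w, x):
    """Decide class omega_1 if the score w^T x is positive, omega_2 otherwise."""
    score = w @ x                  # linear combination of the features
    return 1 if score > 0 else 2

# Made-up weights and sample, purely for illustration.
w = np.array([1.0, -2.0, 0.5])
x = np.array([3.0, 1.0, 1.0])
print(linear_classifier(w, x))     # score = 1.5 > 0 -> class 1
```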
Since the threshold $w_{0}$ can be absorbed into the weight vector by augmenting each feature vector with a constant, i.e. $x \leftarrow \left[x^{T}, 1\right]^{T}$ and $w \leftarrow \left[w^{T}, w_{0}\right]^{T}$, the decision rule reduces to checking the sign of $w^{T} x$.
Assume there exists a hyperplane $w^{*}$ such that

$$ w^{*T} x > 0 \quad \forall x \in \omega_{1}, \qquad w^{*T} x < 0 \quad \forall x \in \omega_{2} $$
The perceptron cost is

$$ J\left(w\right) = \sum_{x \in Y} \delta_{x} w^{T} x $$

where $Y$ is the subset of training vectors misclassified by the hyperplane defined by $w$, and $\delta_{x} = -1$ if $x \in \omega_{1}$, $\delta_{x} = +1$ if $x \in \omega_{2}$. Thus $J\left(w\right)$ is always nonnegative and reaches its minimum value of zero when $Y$ is empty.
Applying the gradient descent method, we have

$$ w\left(t+1\right) = w\left(t\right) - \rho_{t} \sum_{x \in Y} \delta_{x} x $$

where $\rho_{t}$ is the learning rate at iteration $t$.
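A minimal NumPy sketch of the batch update, assuming augmented feature vectors and labels encoded as $y = +1$ for $\omega_{1}$ and $y = -1$ for $\omega_{2}$ (so $\delta_{x} = -y$); the function names here are my own, not the book's:

```python
import numpy as np

def perceptron_cost(w, X, y):
    """J(w) = sum_{x in Y} delta_x * w^T x, with Y the misclassified set, delta_x = -y."""
    scores = X @ w
    mis = y * scores <= 0                  # misclassified (or on the boundary)
    return np.sum(-y[mis] * scores[mis])   # always >= 0; zero iff Y is empty

def batch_perceptron(X, y, rho=0.1, iters=1000):
    """Gradient descent on J(w): w(t+1) = w(t) - rho * sum_{x in Y} delta_x * x."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        mis = y * (X @ w) <= 0             # the current misclassified set Y
        if not mis.any():
            break                          # J(w) = 0: classes linearly separated
        grad = np.sum(-y[mis][:, None] * X[mis], axis=0)
        w -= rho * grad
    return w
```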
The online version, which considers one training vector $x_{\left(t\right)}$ at a time, is simpler and more popular:
$$ \begin{aligned}
w\left(t+1\right) &= w\left(t\right) - \rho\,\delta_{x} x_{\left(t\right)} &&\text{if } \delta_{x} w^{T} x_{\left(t\right)} \geq 0 \\
w\left(t+1\right) &= w\left(t\right) &&\text{otherwise}
\end{aligned} $$
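Under the same $\pm 1$ label encoding as above, a sketch of the online rule (again my own illustrative code, not the book's):

```python
import numpy as np

def online_perceptron(X, y, rho=0.1, epochs=100):
    """Pattern-by-pattern perceptron: update w only on misclassified samples."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x_t, y_t in zip(X, y):
            if y_t * (w @ x_t) <= 0:   # delta_x w^T x_(t) >= 0: misclassified
                w += rho * y_t * x_t   # w(t) - rho * delta_x * x_(t), delta_x = -y_t
                mistakes += 1
        if mistakes == 0:
            break                      # a full clean pass: converged
    return w
```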
A basic requirement for the convergence of the perceptron algorithm is the linear separability of the classes. If this is not true, as is usually the case in practice, the perceptron algorithm does not converge. This leads to the pocket algorithm, which runs the perceptron updates but keeps "in its pocket" the best-performing weight vector seen so far.
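A minimal sketch of the pocket idea under the same assumptions as the previous snippets (counting training errors after each update is a naive but simple choice):

```python
import numpy as np

def pocket_perceptron(X, y, rho=0.1, epochs=100):
    """Online perceptron that keeps ('pockets') the weights with the fewest training errors."""
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), np.inf
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            if y_t * (w @ x_t) <= 0:
                w += rho * y_t * x_t
                err = np.sum(y * (X @ w) <= 0)   # training errors of the new w
                if err < best_err:               # pocket it if it is the best so far
                    best_w, best_err = w.copy(), err
    return best_w
```

Unlike the plain perceptron, this returns a sensible answer even when the classes are not linearly separable, since the pocketed weights can only improve over time.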