Abstract
Kernel-based classification algorithms provide a relatively simple mathematical and algorithmic framework for developing learning algorithms. They separate two classes by adjusting a separating hyperplane, after performing a change of variable that associates a vector Φ(x) with each datum x. Support vector machines optimize the position of the hyperplane by minimizing the empirical risk regularized by a margin criterion. The margin measures the minimum distance between the points of each class and the hyperplane. This minimization can be rewritten as a convex optimization problem under linear constraints, which depends on the data only through the scalar products k(x, x') = ⟨x, x'⟩.
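For reference, this is the standard soft-margin formulation (the labels y_i ∈ {−1, +1}, the regularization parameter C and the dual variables α_i are standard notation, not defined in this abstract):

\[
\min_{w,\,b}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\bigl(0,\ 1 - y_i(\langle w, x_i\rangle + b)\bigr),
\]

whose dual is

\[
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, \langle x_i, x_j\rangle
\quad \text{subject to} \quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{n} \alpha_i y_i = 0,
\]

in which the data appear only through the scalar products ⟨x_i, x_j⟩.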
The same type of result can be obtained with a change of variable Φ(x), by replacing the kernel with k(x, x') = ⟨Φ(x), Φ(x')⟩. The optimization can be carried out directly from the kernel values, by computing the dual variables of the Lagrangian associated with the regularized risk minimization. Mercer's theorem proves that any positive definite kernel can be obtained from such a change of variable Φ(x). The main difficulty is to find a change of variable that reduces the generalization risk. We study the properties of polynomial kernels and Gaussian kernels.
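As an illustrative sketch (not code from the text; the parameters degree, coef0, sigma, C and the toy data are assumptions), the dual problem can be solved from kernel values alone, for the standard polynomial kernel k(x, x') = (⟨x, x'⟩ + c)^d and Gaussian kernel k(x, x') = exp(−‖x − x'‖² / (2σ²)):

# Minimal sketch: kernel SVM dual solved from Gram-matrix values only.
# Kernel parameters, C and the toy data are illustrative, not from the text.
import numpy as np
from scipy.optimize import minimize

def polynomial_gram(X, Xp, degree=3, coef0=1.0):
    """k(x, x') = (<x, x'> + coef0)^degree for all pairs of rows."""
    return (X @ Xp.T + coef0) ** degree

def gaussian_gram(X, Xp, sigma=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for all pairs of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Xp**2, axis=1)[None, :] - 2 * X @ Xp.T
    return np.exp(-sq / (2 * sigma**2))

def svm_dual(K, y, C=1.0):
    """Maximize the dual over alpha using only K[i, j] = k(x_i, x_j)."""
    n = len(y)
    Q = (y[:, None] * y[None, :]) * K              # Q_ij = y_i y_j k(x_i, x_j)

    def neg_dual(a):                               # minimize the negated dual
        return 0.5 * a @ Q @ a - a.sum()

    def neg_dual_grad(a):
        return Q @ a - np.ones(n)

    cons = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
    res = minimize(neg_dual, np.zeros(n), jac=neg_dual_grad,
                   bounds=[(0.0, C)] * n, constraints=[cons], method="SLSQP")
    return res.x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)  # toy non-linear labels
    for name, K in [("gaussian", gaussian_gram(X, X, sigma=1.0)),
                    ("polynomial", polynomial_gram(X, X, degree=3))]:
        alpha = svm_dual(K, y, C=1.0)
        print(name, "support vectors:", int(np.sum(alpha > 1e-6)))

A generic constrained solver is used here only for clarity; changing the kernel amounts to swapping the Gram matrix, which is the point made above.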