Abstract
Most supervised learning methods, including neural networks, are formalized as an optimization problem in which the mean of the errors on the observed data is minimized with respect to the parameters of the prediction model. Statistical learning thus gives rise to optimization problems with a specific structure, since the quantity being minimized is a mean, or more generally an expectation. This structure makes it natural and efficient to use so-called "stochastic gradient" methods, in which the model is updated very frequently, after only a few observations.
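To make this concrete, here is a minimal sketch (not part of the original abstract) of plain stochastic gradient descent on a least-squares problem, using NumPy; the synthetic data, step size, and function names are illustrative assumptions, not details of the talk.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic convex problem: least-squares regression,
    # f(w) = mean over i of (x_i . w - y_i)^2 / 2.
    n, d = 1000, 10
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    def sgd(X, y, n_epochs=20, step=0.01):
        """Plain stochastic gradient descent: one observation per update."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                grad_i = (X[i] @ w - y[i]) * X[i]  # gradient of a single-example loss
                w -= step * grad_i                 # update after a single observation
        return w

    w_hat = sgd(X, y)
    print(np.linalg.norm(w_hat - w_true))

Each update touches a single observation, so its cost is independent of the number of data points; with a constant step size the iterates only reach a neighborhood of the optimum, which is what variance reduction is meant to fix.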
This talk presents some recent advances in stochastic gradient optimization based on "variance reduction". For "convex" problems (corresponding to a neural network with no hidden layer), these advances make it possible, both in theory and in practice, to achieve an exponential rate of convergence (in the number of iterations) towards the global optimum. The presentation also introduces so-called "conditional gradient" methods, which enable incremental learning, in which neurons are added to the model one after the other.
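As a rough illustration of the variance-reduction idea (again not taken from the talk, and using SVRG as one well-known representative rather than the specific algorithms presented), here is a short sketch on the same kind of convex least-squares problem; all names and hyperparameters are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    n, d = 1000, 10
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    def grad_i(w, i):
        """Gradient of the i-th squared-error loss (x_i . w - y_i)^2 / 2."""
        return (X[i] @ w - y[i]) * X[i]

    def full_grad(w):
        """Full gradient, averaged over all n observations."""
        return X.T @ (X @ w - y) / n

    def svrg(n_outer=30, m=1000, step=0.01):
        """SVRG-style variance reduction: each stochastic step is corrected
        by a periodically recomputed full gradient, which shrinks the
        variance of the updates and allows a constant step size."""
        w_snapshot = np.zeros(d)
        for _ in range(n_outer):
            mu = full_grad(w_snapshot)          # full gradient at the snapshot
            w = w_snapshot.copy()
            for _ in range(m):
                i = rng.integers(n)
                # variance-reduced stochastic gradient
                g = grad_i(w, i) - grad_i(w_snapshot, i) + mu
                w -= step * g
            w_snapshot = w
        return w_snapshot

    w_hat = svrg()
    print(np.linalg.norm(w_hat - w_true))

Because the correction term vanishes at the optimum, the iterates can converge exponentially fast (in the number of iterations) on such convex problems, rather than stalling in a noise-dominated neighborhood as plain stochastic gradient does.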