Abstract
A convex function can be minimized with a gradient descent algorithm that iteratively subtracts from the current point a vector proportional to the function's gradient. If the function being minimized is strictly convex with a Lipschitz gradient, then we show that gradient descent converges to the unique minimum. We study the convergence of this algorithm when the convex function is quadratic. In this case, the speed of convergence depends on the ratio between the smallest and largest eigenvalues of the Hessian, the inverse of its condition number.
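This dependence can be illustrated with a minimal sketch of gradient descent on a quadratic function; the 2×2 diagonal Hessian and the step size below are hypothetical values chosen for illustration:

```python
import numpy as np

# Gradient descent on the quadratic f(x) = (1/2) x^T A x, whose unique
# minimum is x = 0. The Hessian A is diagonal with eigenvalues 1 and 10
# (hypothetical values), so the condition number is 10.
A = np.diag([1.0, 10.0])

def gradient_descent(A, x0, step, n_iters):
    x = x0.copy()
    for _ in range(n_iters):
        x = x - step * (A @ x)  # the gradient of f at x is A x
    return x

x0 = np.array([1.0, 1.0])
# A step size of 1/L, with L the largest eigenvalue, ensures convergence;
# the slow direction contracts by a factor 1 - 1/10 per iteration.
x = gradient_descent(A, x0, step=0.1, n_iters=200)
print(np.linalg.norm(x))  # close to 0: the iterates approach the minimum
```

Each iteration contracts the error along an eigendirection of eigenvalue λ by a factor |1 − step·λ|, which is why a large ratio between the extreme eigenvalues slows convergence.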
When the function to be minimized is a risk that can be written as an average of per-sample risks over the training data, this gradient algorithm can be replaced by a stochastic gradient algorithm, which updates the parameters with the gradient of the risk of one training datum at a time. This algorithm is particularly useful for training neural networks.
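A minimal sketch of this stochastic update, on a hypothetical least-squares problem where the risk is the average of per-sample squared errors (the data, step size, and epoch count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noiseless least-squares data (hypothetical example):
# the risk is the average over samples i of (w . x_i - y_i)^2.
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def sgd(X, y, step, n_epochs):
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):          # one datum at a time
            grad_i = 2 * (X[i] @ w - y[i]) * X[i]  # gradient of the i-th risk
            w = w - step * grad_i
    return w

w = sgd(X, y, step=0.01, n_epochs=50)
print(np.linalg.norm(w - w_true))  # small: SGD approaches the minimizer
```

Each update costs one sample's gradient instead of a full pass over the training set, which is what makes the method attractive when the training set is large.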
The lecture ends with a brief presentation of neural networks, as an introduction to next year's lecture, and mentions the universal approximation theorem, which shows that any continuous function can be approximated arbitrarily well by a network with a sufficiently large hidden layer. Deep convolutional neural networks include a large number of layers whose neuron weights are shared across translations. The analysis of these networks will be the focus of next year's lecture.
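The approximation property can be illustrated numerically, though this is a sketch of the statement and not a proof: the hypothetical one-hidden-layer ReLU network below has random hidden weights, only its output layer is fitted (by least squares), and its worst-case error on a sample target shrinks as the hidden layer widens:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target function to approximate on an interval (hypothetical example).
x = np.linspace(-np.pi, np.pi, 400)[:, None]
target = np.sin(3 * x[:, 0])

def fit_error(width):
    # One hidden layer of `width` ReLU units with random weights and
    # biases; only the output layer is fitted, by least squares.
    W = rng.normal(size=(1, width))
    b = rng.uniform(-np.pi, np.pi, size=width)
    H = np.maximum(x @ W + b, 0.0)            # hidden-layer activations
    coef, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.max(np.abs(H @ coef - target))  # worst-case error on the grid

e_small, e_large = fit_error(10), fit_error(500)
print(e_small, e_large)  # the error shrinks as the hidden layer widens
```

Fitting only the output layer keeps the example linear-algebraic; the theorem itself concerns networks whose hidden weights are also chosen, so this construction only illustrates how width increases expressive power.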