Gradient descent and neural networks

Abstract

A convex function can be minimized using a gradient descent algorithm that iteratively adds a collinear vector to the function's gradient. If the function being minimized is Lipschitz and strictly convex, then we show that the gradient descent converges to the unique minimum. We study the convergence of this algorithm when the convex function is quadratic. In this case, the speed of convergence depends on the ratio between the smallest and largest eigenvalues of the Hessian.

When the function to be minimized is a risk that can be evaluated as an average of risks calculated on each training datum, then this gradient algorithm can be replaced by a stochastic gradient algorithm, where the gradient of the risk for each training datum is added one after the other. This algorithm is particularly useful for training neural networks.

The lecture ends with a brief presentation of neural networks, as an introduction to next year's lecture, and mentions the universal approximation theorem, which shows that any function can be approximated by a network with a sufficiently large hidden layer. Deep convolutional neural networks include a large number of layers with neurons whose weights are invariant by translations. Analysis of these networks will be the focus of next year's lecture.

Gradient descent and neural networks

Abstract

Speaker(s)

Stéphane Mallat

Events

Data science mapping

2018 challenges presentation (1)

Bias-Complexity trade-off

Challenges 2018 (2)

The curse of large dimensions

Dimensionality reduction and denoising

Fourier analysis, filtering and sampling

Image denoising in a few formulas

Transforms and wavelet bases

Tackling a machine learning competition : methodology and practic…

Bayesian and linear kernel learning

Kernel regression and convex optimization

Kernel classification and SVM

Federated learning for medical data

Gradient descent and neural networks

Stochastic and conditional gradients for neural networks

See also