Learning and the curse of high dimensionality

Stéphane Mallat presents his lecture of the year in the series les courTs du Collège de France.

Data science aims to "extract knowledge" from digital data using algorithms. Its applications are wide-ranging: storing, analyzing and extracting value from masses of data such as images, sounds, texts, physical measurements or Internet data. There are two types of problem: prediction and modeling. Predictions are made by statistical learning algorithms, which are driving the revival of artificial intelligence. A model describes the variability of the data and makes it possible to generate new data. The role of mathematics here is to understand under what conditions it is possible to learn, and thus to generalize, or to build models, while the role of computer science is to develop algorithms that solve these problems.

The Chair's first lecture sets out the mathematical and algorithmic framework for this field, highlighting the issues and techniques that are important for learning. The main difficulty in prediction or modeling stems from the large number of variables in the data, often more than a million, like the number of pixels in an image. This large number generates a combinatorial explosion of prediction and modeling possibilities. The curse of high dimensionality is countered with algorithms that exploit prior information about certain regularities of the problem. The lecture introduces mathematical and algorithmic tools for specifying and exploiting this regularity, for prediction or modeling.
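The combinatorial explosion mentioned above can be made concrete with a small sketch. It assumes, purely for illustration, that each variable takes values in [0, 1] and that each axis is split into a fixed number of intervals; the function names and the choice of 10 intervals per axis are hypothetical and do not come from the lecture. The number of grid cells, and so the number of samples needed to put one sample in every cell, grows exponentially with the dimension.

```python
import math


def grid_cells(dim: int, cells_per_axis: int = 10) -> int:
    """Number of cells in a regular grid covering the unit cube [0, 1]^dim."""
    return cells_per_axis ** dim


def log10_grid_cells(dim: int, cells_per_axis: int = 10) -> float:
    """Base-10 logarithm of the cell count, usable when dim is very large."""
    return dim * math.log10(cells_per_axis)


if __name__ == "__main__":
    # Small dimensions: the exact count is still printable.
    for dim in (1, 2, 3, 10):
        print(f"dim={dim:>2}: {grid_cells(dim)} cells")

    # An image with a million pixels: report only the order of magnitude.
    pixels = 1_000_000
    print(f"dim={pixels}: about 10^{log10_grid_cells(pixels):.0f} cells")
```

No dataset can populate such a grid, which is why a learning algorithm has to rely on prior regularity of the problem rather than on exhaustive sampling.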

Program

Lecture

09:30 - 11:00

Learning and the curse of high dimensionality


Data science mapping

Presentation of the 2018 challenges (1)

Bias-complexity trade-off

Presentation of the 2018 challenges (2)

The curse of high dimensionality

Dimensionality reduction and denoising

Fourier analysis, filtering and sampling

Image denoising in a few formulas

Transforms and wavelet bases

Tackling a machine learning competition: methodology and practical examples

Bayesian and linear kernel learning

Kernel regression and convex optimization

Kernel classification and SVM

Federated learning for medical data

Gradient descent and neural networks

Stochastic and conditional gradients for neural networks
