The lecture introduces a mathematical approach to statistical learning through maximum likelihood estimation, information theory, and the construction of approximation models. Both unsupervised and supervised learning involve estimating high-dimensional probability distributions from training data. This requires constructing parameterized models defined by a priori information; these models can be deep neural networks with a specified architecture.
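As a minimal sketch of maximum likelihood estimation (an illustration added here, not an example from the lecture; the data are hypothetical and a simple Gaussian model stands in for a deep network):

    import numpy as np

    # Hypothetical training data drawn from an unknown distribution.
    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.5, size=1000)

    # Parameterized model p_theta: a Gaussian with theta = (mu, log_sigma).
    def avg_log_likelihood(mu, log_sigma):
        sigma = np.exp(log_sigma)
        return np.mean(-0.5 * ((x - mu) / sigma) ** 2
                       - log_sigma - 0.5 * np.log(2 * np.pi))

    # Maximum likelihood: gradient ascent on the average log-likelihood.
    mu, log_sigma, lr = 0.0, 0.0, 0.1
    for _ in range(500):
        sigma2 = np.exp(2 * log_sigma)
        grad_mu = np.mean(x - mu) / sigma2                # d/d(mu)
        grad_ls = np.mean((x - mu) ** 2) / sigma2 - 1.0   # d/d(log_sigma)
        mu += lr * grad_mu
        log_sigma += lr * grad_ls

    print(mu, np.exp(log_sigma), avg_log_likelihood(mu, log_sigma))

The estimates converge to the empirical mean and standard deviation of the data, which are the closed-form maximum likelihood solutions for this particular model.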
The lecture raises the fundamental issues of high-dimensional modeling and their mathematical formalization through information measures. It will introduce the notions of Fisher information for model inference by maximum likelihood, and Shannon information for prediction and coding. Shannon information rests on the notion of concentration and measures uncertainty through entropy.
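For reference, the standard textbook definitions behind these two notions (stated here for a scalar parameter and a discrete distribution; they are not quoted from the lecture):

    % Fisher information of a parametric family p_theta
    I(\theta) = \mathbb{E}_{X \sim p_\theta}\!\left[\left(\frac{\partial}{\partial\theta}\log p_\theta(X)\right)^{2}\right]

    % Shannon entropy of a distribution p, measuring its uncertainty
    H(p) = -\sum_{x} p(x)\log p(x)

By the Cramér-Rao bound, the variance of any unbiased estimator of \theta is at least 1/I(\theta), which is one reason Fisher information governs the accuracy of inference by maximum likelihood.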
The construction of model classes is based on assumptions about the structure of distributions and their invariants. Links with statistical physics will be explored. Particular attention will be paid to "complex" data involving many scales of variability, whether images, sounds, time series, or data from physics. Applications to signal and image compression and unsupervised learning will be studied.
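To make the coding connection concrete (a small illustration under assumed data, not an example from the lecture): the Shannon entropy of a quantized signal lower-bounds the average number of bits per sample achievable by any lossless code.

    import numpy as np

    # Hypothetical signal: Gaussian samples uniformly quantized to 16 levels (4 bits).
    rng = np.random.default_rng(0)
    signal = rng.normal(loc=0.5, scale=0.1, size=10_000)
    levels = np.clip((signal * 16).astype(int), 0, 15)

    # Empirical distribution of the quantization levels.
    p = np.bincount(levels, minlength=16) / levels.size

    # Shannon entropy in bits/sample: lower bound on any lossless code's rate.
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    print(f"entropy {entropy:.2f} bits/sample, vs. 4 bits for a fixed-length code")

An entropy coder such as Huffman or arithmetic coding approaches this bound, which is why signals with non-uniform level distributions compress below their fixed-length bit width.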