The aim of data science is to "extract knowledge" from digital data using algorithms. The applications are considerable: storing, analyzing and adding value to masses of data such as images, sounds, texts, physical measurements or Internet data. There are two types of problems: prediction and modeling. Predictions are made by statistical learning algorithms, the driving force behind the revival of artificial intelligence. A model describes the variability of data and enables new data to be generated. The aim of mathematics here is to understand under what conditions it is possible to learn and thus generalize, or to build models, while the aim of computer science is to develop algorithms that solve these problems.
The Chair's first lecture sets out the mathematical and algorithmic framework for this field, highlighting the issues and techniques that are important for learning. The main difficulty in prediction or modeling stems from the large number of variables in the data, often more than a million, like the number of pixels in an image. This large number generates a combinatorial explosion of prediction and modeling possibilities. This curse of high dimensionality is countered with algorithms that use a priori information about certain regularities of the problem. The lecture introduces mathematical and algorithmic tools for specifying and exploiting this regularity, for prediction or modeling.
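The combinatorial explosion can be made concrete with a rough count. The sketch below is only an illustration under stated assumptions, not a method taken from the lecture: it assumes each variable takes values in [0, 1] and that each axis is cut into 10 intervals, then counts the cells a learner would need to sample in order to cover the whole space. The names `grid_cells` and `log10_grid_cells` are hypothetical.

```python
import math


def grid_cells(dimension: int, cells_per_axis: int = 10) -> int:
    """Count the cells of a regular grid on the unit cube [0, 1]^dimension.

    Assumption for illustration: every axis is split into the same
    number of intervals, so the count is cells_per_axis ** dimension.
    """
    return cells_per_axis ** dimension


def log10_grid_cells(dimension: int, cells_per_axis: int = 10) -> float:
    """Base-10 logarithm of the cell count, usable when the count itself is huge."""
    return dimension * math.log10(cells_per_axis)


if __name__ == "__main__":
    # Small dimensions: the count can still be printed directly.
    for d in (1, 2, 3, 10):
        print(f"d = {d:>2}: {grid_cells(d)} cells")

    # An image with a million pixels: report only the order of magnitude.
    d = 1_000_000
    print(f"d = {d}: about 10^{log10_grid_cells(d):.0f} cells")
```

No data set can cover that many cells, which is why a learning algorithm has to rely on prior information about the regularity of the problem rather than on exhaustive sampling.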