Abstract
The first lecture reviews the mathematical principles of supervised and unsupervised learning algorithms, and presents the architectures of deep neural networks together with their applications to image recognition. Supervised learning consists of estimating the answer y = f(x) to a question from a d-dimensional datum x, using a set of training examples x_i for which the value y_i = f(x_i) is known. Unsupervised learning consists of estimating the probability distribution p(x) of the data x from a family of examples x_i, which are regarded as independent realizations of this distribution. The main difficulty of these problems arises from the large dimension d of the data x. Neural networks are computational architectures with a very large number of parameters, used to approximate f(x) in supervised learning or p(x) in unsupervised learning.
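The two estimation problems above can be sketched on synthetic data. This is a minimal illustration, not part of the lecture: supervised learning is shown as least-squares regression on pairs (x_i, y_i), and unsupervised learning as fitting a Gaussian model of p(x) from the samples x_i alone; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200                        # data dimension d, number of examples n

# Supervised: estimate f from training pairs (x_i, y_i).
# Here f is linear, f(x) = <w_true, x>, and is recovered by least squares.
X = rng.normal(size=(n, d))          # examples x_i (one per row)
w_true = rng.normal(size=d)
y = X @ w_true                       # labels y_i = f(x_i)
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # estimate of f

# Unsupervised: estimate p(x) from the samples x_i alone.
# Here p is modeled as a Gaussian, fitted by its empirical moments.
mu_hat = X.mean(axis=0)              # estimated mean of p(x)
cov_hat = np.cov(X, rowvar=False)    # estimated covariance of p(x)
```

With noiseless linear labels and n much larger than d, the least-squares estimate w_hat coincides with w_true up to floating-point error; real problems are far from this regime, which is where the high dimension d becomes the central difficulty.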
Neural networks take the input x and approximate y = f(x) with a cascade of linear operators, each followed by a pointwise non-linearity such as a sigmoid or a rectifier. Neural networks were introduced in the 1950s with a biological motivation, but it was not until around 2010 that they achieved spectacular results, thanks to the massive increase in training data and in computing power, which made it possible to train large-scale networks. Impressive applications have followed in many fields, including computer vision, speech recognition, sound analysis, natural language processing, robot control, prediction of physical quantities, medical diagnostics, and games such as chess and Go. The fact that the same type of architecture can address such different problems indicates that they share forms of regularity that are not yet mathematically understood. The lecture will present network architectures and learning algorithms, but will also attempt to explain the performance of these algorithms, or at least to state the open questions on this subject.
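The cascade structure described above can be written in a few lines. This is a generic sketch of a fully connected network with rectifier non-linearities, using randomly initialized (untrained) weights; the layer sizes and function names are illustrative assumptions.

```python
import numpy as np

def relu(z):
    # Pointwise rectifier non-linearity, applied coordinate by coordinate.
    return np.maximum(z, 0.0)

def network_forward(x, weights, biases):
    # Cascade of linear operators W @ x + b, each followed by a pointwise
    # non-linearity; the final layer is kept linear to produce the output.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)
    return weights[-1] @ x + biases[-1]

rng = np.random.default_rng(0)
dims = [4, 8, 8, 1]                  # input dimension d = 4, scalar output
weights = [rng.normal(size=(m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [np.zeros(m) for m in dims[1:]]

y_hat = network_forward(rng.normal(size=4), weights, biases)
```

Training consists of adjusting the entries of `weights` and `biases`, typically by stochastic gradient descent on a loss measured over the training examples; that procedure is not shown here.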
Computer vision is an important field of application for neural networks. It involves recognizing a scene or an object and its location in an image or video, or segmenting the image into a set of identified structures. Until recently, computer vision algorithms were often based on the extraction of structures such as contours, corners or texture elements, which were then aggregated with rules. However, these approaches only worked on relatively simple images. The performance of deep neural networks from 2012 onwards came as a great surprise, as they achieved remarkable results on problems long thought to be out of reach. These networks can now recognize faces better than humans, perform real-time recognition to guide cars, and recognize objects or segment complex images. However, they need to be trained on very large databases, and they sometimes make significant errors. The properties of these algorithms are still poorly understood.