Amphithéâtre Marguerite de Navarre, Site Marcelin Berthelot
Open to all

Abstract

A neural network transforms input data x through a cascade of linear operators, represented by matrices of coefficients, followed by pointwise nonlinearities such as sigmoids or rectifiers. This implements a class of functions parameterized by the matrices used to compute the successive layers. Learning optimizes these parameters to minimize the approximation error of a function y = f(x), evaluated on training examples. We face two types of problem. The approximation problem consists in showing that there exists a function in the class of neural network functions that accurately approximates f(x). The second problem is to optimize the network parameters in order to compute the best approximation, the one that minimizes the approximation error. This optimization is performed with a gradient descent algorithm that progressively adjusts the parameters to reduce the error at each iteration. This lecture focuses on the approximation problem.
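As a concrete illustration of this setup, the following minimal sketch (not taken from the lecture; the target f(x) = sin(x), the network width, and the learning rate are arbitrary choices) trains a one-hidden-layer network with a rectifier nonlinearity by gradient descent on the empirical error:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=(200, 1))  # training examples
y = np.sin(x)                                  # target function f(x), chosen for illustration

# Two linear operators (matrices of coefficients) with a pointwise rectifier in between
W1 = rng.normal(0, 1, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 1, (16, 1)); b2 = np.zeros(1)
lr = 0.05

loss_history = []
for step in range(500):
    h = np.maximum(x @ W1 + b1, 0)       # first layer + rectifier (ReLU)
    pred = h @ W2 + b2                   # second linear layer
    err = pred - y
    loss_history.append(np.mean(err ** 2))  # approximation error on the examples

    # Gradient descent: backpropagate through both layers, adjust parameters
    gpred = 2 * err / len(x)
    gW2 = h.T @ gpred; gb2 = gpred.sum(0)
    gh = (gpred @ W2.T) * (h > 0)        # rectifier derivative
    gW1 = x.T @ gh; gb1 = gh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

Each iteration reduces the empirical error a little, so the recorded loss decreases over training, which is exactly the progressive adjustment described above.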

The approximation error typically depends on the regularity of the function f(x) being approximated. If this function is only Lipschitz, one can show that reaching an error ε requires a number of examples that grows like ε^{-d}, that is, exponentially in the dimension d. This is the curse of high dimension. To avoid this curse, the function f(x) must be much more regular, and the network must be able to exploit this underlying regularity. A mathematical challenge is to understand the nature of the regularity exploited by deep neural networks.
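The ε^{-d} growth can be made concrete with a back-of-the-envelope count (an illustrative calculation, not from the lecture): approximating a Lipschitz function on [0,1]^d to accuracy ε essentially requires one example per cell of a grid of spacing ε, i.e. about (1/ε)^d examples:

```python
# Number of grid cells of side eps needed to cover [0,1]^d:
# (1/eps)**d, which is eps**(-d) -- exponential in the dimension d.
eps = 0.1
counts = {d: eps ** -d for d in (1, 2, 10, 20)}
for d, n in counts.items():
    print(f"d = {d:2d}: about {n:.0e} examples")
```

At ε = 0.1, one dimension needs about 10 examples, but d = 20 already needs on the order of 10^20, which is why a Lipschitz assumption alone is useless in high dimension.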

In high dimensions, it is necessary to use global regularity constraints. This regularity can be captured by the symmetry group of f(x). A symmetry is an operator g that does not change the value of f: f(g.x) = f(x) for all x. The set of symmetries has a group structure. We often have a priori information about these symmetries. For example, many image recognition problems are invariant to translations, certain rotations, and certain deformations. For sound, these symmetries include frequency transpositions and deformations in the time-frequency plane.