Abstract
This lecture addresses the triangle of regularity, approximation, and parsimony in a nonlinear framework. The optimal nonlinear approximation of a signal x in an orthonormal basis consists of selecting the coefficients of x with the largest amplitudes in that basis. It is shown that the decay rate of the approximation error depends on the decay rate of the ordered coefficients, which can be quantified with $\ell^p$ norms.
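For concreteness, this result can be sketched as follows, with notation that is ours rather than the lecture's. Let $\{g_m\}_m$ be an orthonormal basis and let $c(k)$ denote the coefficients $\langle x, g_m \rangle$ sorted by decreasing amplitude. The $M$-term nonlinear approximation keeps the $M$ largest coefficients,
\[
x_M = \sum_{k \le M} c(k)\, g_{m_k},
\qquad
\|x - x_M\|^2 = \sum_{k > M} |c(k)|^2 .
\]
If the sorted coefficients satisfy $|c(k)| \le C\, k^{-1/p}$ for some $p < 2$ (a weak-$\ell^p$ condition), then comparing the tail sum with an integral gives
\[
\|x - x_M\|^2 \le C^2 \sum_{k > M} k^{-2/p} \le \frac{C^2}{2/p - 1}\, M^{1 - 2/p},
\]
so a faster decay of the ordered coefficients (a smaller $p$) yields a faster decay of the approximation error.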
The lecture applies these nonlinear approximation results to neural networks with a single hidden layer. It shows that training such a network is equivalent to computing a nonlinear approximation, which depends on the pointwise nonlinearity used in the network. When this nonlinearity is a sinusoid, training computes a nonlinear approximation in a Fourier basis. Such approximations are optimal over Barron spaces, which are characterized in terms of the decay rate of the approximation error. However, these spaces give pessimistic bounds because they do not account for the fact that the data x are concentrated on typical sets that are much smaller than the whole space. In high dimensions, capturing this concentration requires defining probabilistic models and approximating the underlying probability distributions. This will be the subject of next year's lecture.
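As a sketch of this correspondence, again with our own notation, a network with one hidden layer of $M$ units computes
\[
f_M(x) = \sum_{m=1}^{M} a_m\, \rho(\langle w_m, x \rangle + b_m),
\]
so training amounts to selecting an $M$-term approximation of the target $f$ from the dictionary of atoms $\{\rho(\langle w, \cdot \rangle + b)\}_{w,b}$; when $\rho$ is a sinusoid these atoms are Fourier atoms. One standard statement of the resulting guarantee, not necessarily in the form used in the lecture, is Barron's theorem (Barron, 1993): if $C_f = \int |\omega|\, |\hat f(\omega)|\, d\omega < \infty$, then for any probability measure $\mu$ supported on a bounded domain there exist $M$ units such that
\[
\|f - f_M\|_{L^2(\mu)}^2 \lesssim \frac{C_f^2}{M},
\]
with a constant depending only on the radius of the domain, so the rate does not deteriorate with the input dimension. Functions satisfying $C_f < \infty$ form what is commonly called the Barron class.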