Amphithéâtre Marguerite de Navarre, Site Marcelin Berthelot
Open to all

Abstract

The universal approximation theorem for neural networks with one hidden layer guarantees that the approximation error of a continuous function f(x) decays towards 0 as the number of neurons grows, but it does not specify the rate of this decay. This rate is linked to the regularity of f(x). We will see that if f(x) is only locally regular, then the error decays very slowly and suffers from the curse of dimensionality.
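As an illustration of this convergence, here is a minimal numerical sketch (not taken from the lecture): it approximates a one-dimensional function with a one-hidden-layer ReLU network whose hidden weights are drawn at random and whose output weights are fitted by least squares, and reports how the error typically shrinks as the number M of neurons grows. All names and parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Smooth one-dimensional test function.
    return np.sin(2 * np.pi * x)

def fit_one_hidden_layer(x_train, y_train, M):
    # One-hidden-layer network sum_k a_k ReLU(w_k x + b_k):
    # hidden weights (w_k, b_k) drawn at random, output weights a_k
    # fitted by least squares (a "random features" approximation).
    w = rng.uniform(-10.0, 10.0, size=M)
    b = rng.uniform(-10.0, 10.0, size=M)
    features = np.maximum(0.0, x_train[:, None] * w + b)
    a, *_ = np.linalg.lstsq(features, y_train, rcond=None)
    return w, b, a

def predict(x, w, b, a):
    return np.maximum(0.0, x[:, None] * w + b) @ a

x_train = np.linspace(0.0, 1.0, 2000)
y_train = target(x_train)
x_test = np.linspace(0.0, 1.0, 5000)

for M in (4, 16, 64, 256):
    w, b, a = fit_one_hidden_layer(x_train, y_train, M)
    err = np.max(np.abs(predict(x_test, w, b, a) - target(x_test)))
    print(f"M = {M:4d} neurons, sup-norm error ~ {err:.3e}")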

The lecture first considers the case of locally regular functions, which are m times differentiable in the sense of Sobolev. We prove upper bounds on the approximation error as a function of the number M of neurons in the hidden layer, and show that an error $\varepsilon$ is reached with $M = O(\varepsilon^{-d/m})$ neurons. This decay is very slow when m is small compared with d, which is always the case in high dimension.
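To make this curse of dimensionality concrete, here is the rate written out together with a numerical instance; the values of d, m and $\varepsilon$ below are illustrative choices, not figures from the lecture.

\[
\text{error}(M) \;=\; O\!\left(M^{-m/d}\right)
\qquad\Longleftrightarrow\qquad
M \;=\; O\!\left(\varepsilon^{-d/m}\right)
\ \text{neurons to guarantee an error } \varepsilon .
\]

For example, with $m = 2$ derivatives in dimension $d = 100$ and a target error $\varepsilon = 0.1$,

\[
M \;\sim\; \varepsilon^{-d/m} \;=\; (0.1)^{-50} \;=\; 10^{50}\ \text{neurons},
\]

which is far beyond anything computable.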

We also consider the case where the Fourier transform of f is sparse, a condition expressed by an L¹ criterion proposed by Barron. In this case, we prove that the error decays much faster, as $O(M^{-1/2})$, so that only $M = O(\varepsilon^{-2})$ neurons are needed to reach an error $\varepsilon$, independently of the dimension. However, this Fourier sparsity property is rarely satisfied in applications.
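For reference, here is a standard form of Barron's result, with $\hat f$ the Fourier transform of f; the notation and constants are the textbook ones and may differ slightly from those used in the lecture.

\[
C_f \;=\; \int_{\mathbb{R}^d} |\omega|\, |\hat f(\omega)|\, d\omega \;<\; \infty
\qquad\Longrightarrow\qquad
\inf_{f_M} \|f - f_M\|_{L^2(\mu)} \;\le\; \frac{2\, r\, C_f}{\sqrt{M}},
\]

where the infimum is over one-hidden-layer networks $f_M$ with M sigmoidal neurons and $\mu$ is any probability measure on a ball of radius r. The bound does not depend on the dimension d, but the weighted L¹ norm $C_f$ is finite only for functions whose Fourier transform decays fast enough, which is the sparsity condition mentioned above.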

Beyond specific examples, there is no general theorem that explains the gain in approximation performance obtained with networks having more hidden layers, for the functions encountered in applications. This problem therefore remains open.