Abstract
The universal approximation theorem for neural networks with one hidden layer guarantees that the approximation error of a continuous function f(x) decays towards 0 as the number of neurons grows, but it does not specify the rate of this decay. The rate is governed by the regularity of f(x). We will see that if f(x) is only locally regular, then the error decays very slowly and suffers from the curse of dimensionality.
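As a purely illustrative sketch (not the construction analyzed in the lecture), the Python snippet below approximates a one-dimensional continuous function with a one-hidden-layer ReLU network of M neurons; the target function, the interval [0, 1], the random hidden weights, and the least-squares fit of the output layer are all assumptions made here to keep the example short, so the observed error decay is only indicative.

```python
# Illustrative sketch: one-hidden-layer ReLU network approximating a
# continuous function. Hidden weights are random; only the output layer
# is fitted (least squares), so this is not an optimal approximation.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Target continuous function on [0, 1] (illustrative choice)."""
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

x = np.linspace(0.0, 1.0, 2000)[:, None]   # sample points, shape (2000, 1)
y = f(x[:, 0])

for M in (10, 100, 1000):                  # number of hidden neurons
    W = rng.normal(size=(1, M)) * 10.0     # hidden-layer weights
    b = rng.uniform(-10.0, 10.0, size=M)   # hidden-layer biases
    H = np.maximum(x @ W + b, 0.0)         # ReLU activations, shape (2000, M)
    a, *_ = np.linalg.lstsq(H, y, rcond=None)  # output-layer coefficients
    err = np.sqrt(np.mean((H @ a - y) ** 2))   # empirical L2 error
    print(f"M = {M:5d}   error = {err:.2e}")
```

Running the loop over increasing M shows the error shrinking, which is what the universal approximation theorem guarantees qualitatively; the theorem itself says nothing about how fast this happens.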
The lecture first considers the case of locally regular functions, which are m-times differentiable in the sense of Sobolev. We establish upper bounds on the approximation error as a function of the number M of neurons in the hidden layer, and show that an error $\epsilon$ is obtained with $M = O(\epsilon^{-d/m})$ neurons. This decay is very slow when m is small compared with the dimension d, which is always the case in high dimensions.
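To make this rate concrete, here is a minimal worked form (constants omitted, assuming only the Sobolev regularity stated above); the numerical instance with d = 100 and m = 2 is an illustrative choice, not a figure from the lecture.

```latex
\epsilon(M) = O\!\bigl(M^{-m/d}\bigr)
\quad\Longleftrightarrow\quad
M(\epsilon) = O\!\bigl(\epsilon^{-d/m}\bigr),
\qquad\text{e.g. } d = 100,\ m = 2:\quad
M(10^{-1}) \sim \bigl(10^{-1}\bigr)^{-100/2} = 10^{50}.
```

In high dimension the exponent d/m explodes, which is exactly the curse of dimensionality mentioned above.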
We also consider the case where the Fourier transform of f is sparse, in the sense of an L1 criterion proposed by Barron. In this case, we prove that the error decays much faster, as $O(M^{-1/2})$, so that only $M = O(\epsilon^{-2})$ neurons are needed to reach an error $\epsilon$, independently of the dimension d. However, this Fourier sparsity property is rarely satisfied in applications.
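A standard statement of this bound (with constants and the approximation domain absorbed into the O notation, and assuming the integral below is finite) reads:

```latex
C_f = \int_{\mathbb{R}^d} |\omega|\,\bigl|\hat f(\omega)\bigr|\, d\omega < \infty
\;\;\Longrightarrow\;\;
\inf_{f_M}\ \| f - f_M \|_{2} = O\!\left(\frac{C_f}{\sqrt{M}}\right)
\;\;\Longrightarrow\;\;
M(\epsilon) = O\!\bigl(\epsilon^{-2}\bigr),
```

where $f_M$ ranges over networks with one hidden layer of M neurons. The rate no longer depends explicitly on d, although the constant $C_f$ may itself grow with the dimension.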
Apart from specific examples, there is no general theorem explaining the improvement in approximation performance obtained with neural networks having more hidden layers, for the functions encountered in applications. This problem therefore remains open.