Amphithéâtre Marguerite de Navarre, Site Marcelin Berthelot
Open to all

Abstract

The universal approximation theorem for neural networks with a single hidden layer is an important first theoretical result. The first part of the lecture presents this theorem in the simplified case where the input x consists of d binary values and the output f(x) is also binary. In this binary case, we show that f(x) can be represented exactly by a neural network with a single hidden layer of size 2^d, using the function sign(t) as the non-linearity.
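To make this construction concrete, the sketch below (in Python; names such as exact_binary_network are assumptions made here, not notation from the lecture) builds one hidden sign unit per binary input pattern, hence 2^d units, followed by a linear readout. Each hidden unit outputs +1 only on its own pattern, so the readout simply returns the stored value f(x_k).

```python
import itertools
import numpy as np

def exact_binary_network(f, d):
    """Weights of a one-hidden-layer sign network representing
    f : {-1,+1}^d -> {-1,+1} exactly, with one hidden unit per
    input pattern (2^d units in total).  Illustrative construction,
    not necessarily the one used in the lecture."""
    patterns = np.array(list(itertools.product([-1, 1], repeat=d)))  # 2^d rows
    W = patterns.astype(float)               # hidden unit k matches pattern k
    b = -(d - 1) * np.ones(len(patterns))    # <x_k, x> - (d-1) > 0  iff  x == x_k
    # linear readout: f(x) = sum_k f(x_k) * (sign(<x_k, x> - (d-1)) + 1) / 2
    a = np.array([f(x) for x in patterns], dtype=float) / 2.0
    c = a.sum()                               # output bias
    def net(x):
        h = np.sign(W @ x + b)                # hidden layer of size 2^d
        return a @ h + c
    return net

# check on a small example: parity of 3 bits encoded as {-1,+1}
f = lambda x: int(np.prod(x))
net = exact_binary_network(f, 3)
assert all(net(np.array(x)) == f(np.array(x))
           for x in itertools.product([-1, 1], repeat=3))
```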

The second part of the lecture deals with the general universal approximation theorem, which shows that any continuous function f(x) can be approximated by a neural network with a single hidden layer. The uniform error (the maximum over x) converges to 0 as the size of the hidden layer tends to infinity. This theorem holds for neural networks implemented with pointwise non-linearities s(t) that are continuous and not polynomial.
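For reference, the statement described above can be written as follows; the notation (K, M, a_k, w_k, b_k) is introduced here for illustration and is not taken from the lecture.

```latex
% Uniform approximation by a single hidden layer: for a compact set K \subset \mathbb{R}^d,
% a continuous f on K, and a continuous non-polynomial s, for every \varepsilon > 0
% there exist M, coefficients a_k, b_k \in \mathbb{R} and weights w_k \in \mathbb{R}^d such that
\[
  \sup_{x \in K} \Big| f(x) - \sum_{k=1}^{M} a_k \, s(\langle w_k, x \rangle + b_k) \Big| \le \varepsilon .
\]
```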

In the lecture, the theorem is proved for the rectifier s(t) = max(t, 0). The proof first shows that any continuous function f(x) can be approximated by a linear combination of sines and cosines, with a uniform error that tends to 0 as the number of terms tends to infinity. It then shows that a sine and a cosine can be approximated by piecewise-linear functions obtained as linear combinations of dilated and translated rectifiers, with an error that also tends to 0 as the number of terms tends to infinity.
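As an illustration of this second step, the sketch below (Python; the name relu_interpolant is an assumption made here, not the lecture's construction) writes the piecewise-linear interpolation of sin on an interval as a constant plus a linear combination of M translated rectifiers max(t - t_k, 0), and prints the uniform error, which decreases roughly like 1/M^2 as M grows.

```python
import numpy as np

def relu_interpolant(g, a, b, M):
    """Piecewise-linear interpolation of g on [a, b], expressed as a
    constant plus a linear combination of M translated rectifiers
    max(t - t_k, 0).  Illustrative sketch of the construction above."""
    knots = np.linspace(a, b, M + 1)
    vals = g(knots)
    slopes = np.diff(vals) / np.diff(knots)      # slope on each sub-interval
    coeffs = np.diff(slopes, prepend=0.0)        # slope change at each knot
    def h(t):
        t = np.asarray(t, dtype=float)
        relus = np.maximum(t[..., None] - knots[:-1], 0.0)   # (..., M)
        return vals[0] + relus @ coeffs
    return h

# uniform error on [0, 2*pi] for sin, decreasing with the number of rectifiers
t = np.linspace(0, 2 * np.pi, 10001)
for M in (8, 16, 32, 64):
    h = relu_interpolant(np.sin, 0.0, 2 * np.pi, M)
    print(M, np.max(np.abs(np.sin(t) - h(t))))
```

A general dilated and translated rectifier max(w t - t_k, 0) reduces to the translated form used here after rescaling the coefficients, so the sketch captures the same family of piecewise-linear approximants.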