Abstract
The universal approximation theorem for neural networks with a single hidden layer is an important first theoretical result. The first part of the lecture presents this theorem in the simplified case where the data x consists of d binary values and the value of f(x) is also binary. In this binary case, we show that the function f(x) can be represented exactly by a neural network with a single hidden layer of size 2^d, using the sign function sign(t) as the non-linearity.
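To make the binary construction concrete, here is a minimal NumPy sketch, assuming a ±1 encoding of the binary values (the helper names build_network and network are mine, not the lecture's notation). It allocates one sign unit per binary pattern b, which outputs +1 only when x = b, and a linear readout with weights f(b)/2 and bias (1/2) Σ_b f(b) then reproduces f exactly.

```python
import itertools
import numpy as np

def sign(t):
    # sign non-linearity; sign(0) = +1 by convention (the argument is never 0 in this construction)
    return np.where(t >= 0, 1.0, -1.0)

def build_network(f, d):
    """Weights of a single-hidden-layer network representing a Boolean
    function f: {-1,+1}^d -> {-1,+1} exactly, with 2^d sign units."""
    patterns = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))  # (2^d, d)
    W = patterns                                      # one hidden unit per pattern b
    b_hidden = -(d - 1.0) * np.ones(len(patterns))    # <x, b> - (d-1) > 0 iff x == b
    w_out = np.array([f(p) for p in patterns]) / 2.0  # output weights f(b)/2
    b_out = w_out.sum()                               # bias (1/2) sum_b f(b)
    return W, b_hidden, w_out, b_out

def network(x, W, b_hidden, w_out, b_out):
    hidden = sign(W @ x + b_hidden)   # +1 on the matching pattern, -1 elsewhere
    return hidden @ w_out + b_out

# check exactness on the parity function for d = 4
d = 4
parity = lambda x: float(np.prod(x))
params = build_network(parity, d)
for x in itertools.product([-1.0, 1.0], repeat=d):
    x = np.array(x)
    assert abs(network(x, *params) - parity(x)) < 1e-9
```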
The second part of the lecture deals with the general universal approximation theorem, which shows that any continuous function f(x) can be approximated by a neural network with a single hidden layer. The uniform error (maximum over x) converges to 0 as the size of the hidden layer tends to infinity. The theorem holds for neural networks implemented with pointwise non-linearities s(t) that are continuous but not polynomial.
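In formulas, for f continuous on a compact set K ⊂ R^d and s continuous and non-polynomial, the theorem asserts that for every ε > 0 there exist a width M, output weights a_m, and hidden weights and biases (w_m, b_m) such that (the notation below is introduced here only to state the result, it is not taken verbatim from the lecture):

```latex
\sup_{x \in K} \Big|\, f(x) \;-\; \sum_{m=1}^{M} a_m \, s\big(\langle w_m, x \rangle + b_m\big) \Big| \;\le\; \varepsilon .
```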
In the lecture, the theorem is proved for the rectifier s(t) = max(t,0). The proof first shows that any continuous function f(x) can be approximated by a linear combination of sines and cosines, with a uniform error that tends to 0 as the number of terms grows. It then shows that a sine or a cosine can be approximated by piecewise-linear functions obtained as linear combinations of dilated and translated rectifiers, again with an error that tends to 0 as the number of terms grows.
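This second step can be checked numerically. The sketch below is a rough illustration, not the lecture's construction: the helper names relu_interpolant and evaluate are mine, and only translated rectifiers appear since the output coefficients can absorb the dilations. It writes the piecewise-linear interpolant of sin on [0, 2π] as a constant plus a linear combination of relu(t - t_k), and prints a uniform error that decreases as the number of rectifiers grows (for a twice-differentiable function, the interpolation error decays like the square of the knot spacing).

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def relu_interpolant(g, a, b, n_knots):
    """Coefficients of the piecewise-linear interpolant of g on [a, b],
    written as a constant plus a combination of translated rectifiers relu(t - t_k)."""
    knots = np.linspace(a, b, n_knots)
    vals = g(knots)
    slopes = np.diff(vals) / np.diff(knots)                   # slope on each sub-interval
    coeffs = np.concatenate([[slopes[0]], np.diff(slopes)])   # slope changes at the knots
    bias = vals[0]
    return knots[:-1], coeffs, bias

def evaluate(t, knots, coeffs, bias):
    # one hidden layer of rectifiers followed by one linear output unit
    return bias + relu(t[:, None] - knots[None, :]) @ coeffs

# uniform error on [0, 2*pi] decreases as the number of rectifiers grows
t = np.linspace(0.0, 2 * np.pi, 10_000)
for n in (8, 32, 128):
    knots, coeffs, bias = relu_interpolant(np.sin, 0.0, 2 * np.pi, n)
    err = np.max(np.abs(evaluate(t, knots, coeffs, bias) - np.sin(t)))
    print(f"{n:4d} rectifiers: uniform error = {err:.2e}")
```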