Amphithéâtre Marguerite de Navarre, Site Marcelin Berthelot
Open to all

Abstract

This lecture reviews the ideas behind neural networks, starting with the theory of cybernetics initiated by Wiener, the importance of hierarchical structures, and Rosenblatt's perceptron. Cybernetics provides a perspective on dynamical systems in which intelligence is defined as the ability to adapt over time, an adaptation that optimizes a trajectory towards a goal. In cybernetics, adaptation takes place through a feedback loop that adjusts control parameters to reduce a measure of the error relative to the goal. Unlike in an open-loop system, there is no need to model the environment: it suffices to react to the disturbances it introduces along the trajectory towards the goal. Gradient descent learning algorithms for neural networks follow this principle: they progressively adjust the weights of the network to reduce the prediction error.
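As a rough illustration of this feedback principle, the sketch below (my own example, not taken from the lecture) runs gradient descent on a least-squares prediction problem: at each step the prediction error is fed back to correct the weights, without any explicit model of how the data were generated.

```python
import numpy as np

# Minimal sketch: gradient descent viewed as a feedback loop.
# The error signal e = y_pred - y is fed back to adjust the weights w.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                # inputs (disturbances on the trajectory)
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)  # targets (the goal)

w = np.zeros(3)                              # control parameters to adapt
lr = 0.1                                     # step size of the feedback correction
for step in range(100):
    y_pred = X @ w                           # current trajectory
    error = y_pred - y                       # deviation from the goal
    grad = X.T @ error / len(y)              # gradient of the mean squared error
    w -= lr * grad                           # feedback: adjust w to reduce the error

print("learned weights:", w)                 # close to w_true
```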

The article "The Architecture of Complexity" by H. Simon (1962) shows that hierarchical structure is another element that simplifies the analysis and control of dynamic systems. Such hierarchies are found in most systems studied in the natural sciences and the humanities, as well as in symbolic systems. They also appear in the architecture of deep convolutional neural networks.
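As a loose illustration of this last point, the sketch below (my own example, with arbitrary random filters rather than learned ones) stacks a few 1-D convolutional layers: each level combines local outputs of the level below, so features become increasingly high-level with depth.

```python
import numpy as np

# Minimal sketch: a hierarchy of convolutional layers, each one aggregating
# local outputs of the layer below, so the receptive field grows with depth.

rng = np.random.default_rng(0)

def conv_layer(x, n_filters, width=3):
    """One layer of 1-D convolutions (random filters) followed by a ReLU."""
    filters = rng.normal(size=(n_filters, x.shape[0], width))
    out = np.stack([
        np.sum([np.convolve(x[c], f[c], mode="valid") for c in range(x.shape[0])], axis=0)
        for f in filters
    ])
    return np.maximum(out, 0.0)              # ReLU non-linearity

signal = rng.normal(size=(1, 64))            # one input channel of length 64
h1 = conv_layer(signal, n_filters=4)         # low-level local features
h2 = conv_layer(h1, n_filters=8)             # features of features
h3 = conv_layer(h2, n_filters=16)            # high-level features
print(h1.shape, h2.shape, h3.shape)          # channels grow, spatial length shrinks
```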

Rosenblatt's perceptron, introduced in 1957, defines the first learning algorithm for a neural network. It has a single layer and a binary output that classifies data into two possible classes. Learning proceeds by a gradient descent that minimizes an average of the deviations of misclassified points from the decision boundary. We show that this gradient descent follows Hebb's rule, observed in biology, which states that the link between two neurons that are excited simultaneously is strengthened. We also show that Rosenblatt's algorithm converges to a solution that depends on the initial conditions if the training data are linearly separable, and does not converge if they are not.
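The sketch below is a toy reconstruction of the classical perceptron rule (not the lecture's exact formulation) on a separable two-class example; the update w += lr * y * x applied to misclassified points is the Hebbian correction, and it is also a stochastic gradient step on the loss max(0, -y * <w, x>).

```python
import numpy as np

# Minimal sketch of the perceptron rule on linearly separable data.

rng = np.random.default_rng(0)

# Two Gaussian clouds centred at (+3,+3) and (-3,-3), labels y in {-1, +1}.
X = np.vstack([rng.normal(loc=+3.0, size=(50, 2)),
               rng.normal(loc=-3.0, size=(50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])
X = np.hstack([X, np.ones((100, 1))])        # constant column for the bias term

w = np.zeros(3)                              # initial condition (affects the limit)
lr = 1.0
for epoch in range(20):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:               # point on the wrong side of the boundary
            w += lr * yi * xi                # Hebbian correction
            errors += 1
    if errors == 0:                          # converged: the data are separated
        break

print("weights:", w, "epochs:", epoch + 1)
```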

To avoid these convergence problems, the cost function optimized by the perceptron must be regularized. Vapnik's support vector machines introduce a margin criterion that selects the boundary that best separates the points of the two classes; this guarantees the uniqueness of the solution and eliminates the non-convergence observed with non-separable data.
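As an illustration of this regularization, the sketch below trains a linear soft-margin SVM by sub-gradient descent on the regularized hinge loss, one common formulation of Vapnik's margin criterion; the data, step size and regularization constant lam are arbitrary choices of mine. The quadratic term makes the objective strictly convex, so the minimizer is unique and training remains well-behaved even when the two classes overlap.

```python
import numpy as np

# Minimal sketch: linear soft-margin SVM by sub-gradient descent on
#   (lam/2) * ||w||^2  +  mean_i max(0, 1 - y_i * <w, x_i>).

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=+1.0, size=(50, 2)),
               rng.normal(loc=-1.0, size=(50, 2))])   # overlapping clouds
y = np.hstack([np.ones(50), -np.ones(50)])

lam, lr = 0.1, 0.1
w = np.zeros(2)
for step in range(500):
    margins = y * (X @ w)
    active = margins < 1.0                   # points inside the margin or misclassified
    grad = lam * w - (y[active].reshape(-1, 1) * X[active]).sum(axis=0) / len(y)
    w -= lr * grad                           # sub-gradient step

print("weights:", w)
```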