Salle 5, Site Marcelin Berthelot
Open to all
-

Abstract

This lecture shows that approximating locally regular functions requires a number of examples that grows exponentially with the dimension of the data: the so-called "curse of dimensionality". If the answer y associated with a data point x is unique, then it is a function y = f(x). Predicting y therefore amounts to approximating the function f(x) by choosing an approximation from a predefined class. The choice of class and the size of the approximation error depend on the regularity of f.
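As a minimal sketch of this idea (the test function and the approximation class are assumptions chosen for illustration, not taken from the lecture), one can approximate a Lipschitz but non-smooth function within a class of polynomials of increasing degree and measure the resulting approximation error:

```python
import numpy as np

# Illustrative example: approximate f within a predefined class
# (here, polynomials of degree 1, 3, 9). Both f and the class are
# assumptions made for this sketch.
x = np.linspace(0.0, 1.0, 200)
f = np.abs(x - 0.5)  # Lipschitz, but not differentiable at 0.5

errors = {}
for deg in (1, 3, 9):
    coeffs = np.polyfit(x, f, deg)  # best least-squares fit in the class
    errors[deg] = np.abs(np.polyval(coeffs, x) - f).max()
    print(f"degree {deg}: max error {errors[deg]:.3f}")
```

Enlarging the class reduces the approximation error, but how fast the error decreases is governed by the regularity of f, here limited by the kink at x = 0.5.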

The nearest-neighbor algorithm approximates f(x) by the value f(x'), where x' is the training example closest to x. The resulting approximation is therefore a piecewise-constant function. A link is established between the approximation error, the regularity of f, the number of training examples, and the dimension d of the data. The lecture focuses on the approximation of Lipschitz functions. Under this assumption of local regularity, for the error to be smaller than c with n examples, the data space must be coverable by n balls of radius c. We show that the number n of balls grows as c^{-d}. This exponential growth in d is the curse of high dimensionality. As soon as d exceeds 10, we typically never have enough examples to reach a small precision c. To approximate high-dimensional functions f precisely, they must have a global regularity that is much stronger than Lipschitz regularity.
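The phenomenon above can be observed numerically with a small experiment (the test function f(x) = ||x|| on [0,1]^d, and the sample sizes, are assumptions chosen for this sketch): with a fixed number n of examples, the mean error of the nearest-neighbor predictor grows with the dimension d, because the distance to the nearest training example grows.

```python
import numpy as np

# Nearest-neighbor approximation of a 1-Lipschitz function,
# f(x) = ||x|| on [0,1]^d (an assumed test function). With n fixed,
# the approximation error degrades as the dimension d increases.
rng = np.random.default_rng(0)

def nn_error(d, n=1000, n_test=2000):
    X = rng.random((n, d))                  # training examples x'
    y = np.linalg.norm(X, axis=1)           # known values f(x')
    Xt = rng.random((n_test, d))            # query points x
    # Squared distances ||x - x'||^2, expanded to avoid a huge
    # (n_test, n, d) intermediate array.
    d2 = (Xt**2).sum(1)[:, None] - 2.0 * Xt @ X.T + (X**2).sum(1)[None, :]
    pred = y[d2.argmin(axis=1)]             # f(x) approximated by f(x')
    return np.abs(pred - np.linalg.norm(Xt, axis=1)).mean()

errs = {d: nn_error(d) for d in (1, 2, 5, 10)}
for d, e in errs.items():
    print(f"d = {d:2d}: mean error {e:.4f}")
```

In d = 1, a thousand examples cover the interval densely and the error is tiny; in d = 10 the same thousand examples leave the nearest neighbor far away, consistent with the c^{-d} covering bound.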