Abstract
Among recent advances in machine learning, one of the most impressive is undoubtedly generative AI, which makes it possible to create ever more realistic samples of sound, images, and video from a finite set of examples. At the heart of this revolution are diffusion models, which use the gradient of the log-probability (the score) as the drift of a stochastic differential equation to generate new samples.
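As a rough sketch of this mechanism (illustrative only, not the setup of the talk): for a one-dimensional mixture of two Gaussians the score is available in closed form, so the reverse-time equation can be integrated directly. The Ornstein-Uhlenbeck forward process and all numerical parameters below are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, T, n_steps, n_samples = 2.0, 5.0, 500, 2000
dt = T / n_steps

def score(x, t):
    # Exact score of p_t for the 1D mixture 0.5*N(mu,1) + 0.5*N(-mu,1)
    # under the forward OU process dx = -x dt + sqrt(2) dW:
    # p_t = 0.5*N(mu*e^{-t}, 1) + 0.5*N(-mu*e^{-t}, 1)
    mt = mu * np.exp(-t)
    return -x + mt * np.tanh(mt * x)

# Start from the stationary law N(0,1) (approximately p_T for large T)
# and integrate the reverse-time SDE
#   dx = [x + 2*score(x, t)] ds + sqrt(2) dW,   s = T - t,
# with Euler-Maruyama.
x = rng.standard_normal(n_samples)
for k in range(n_steps):
    t = T - k * dt
    x += (x + 2 * score(x, t)) * dt + np.sqrt(2 * dt) * rng.standard_normal(n_samples)

print(np.mean(np.abs(x)))  # samples cluster near the two modes, so roughly mu
```

The final samples split roughly evenly between the two modes at ±mu, which is the behavior the high-dimensional analysis makes precise.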
In this talk, we will briefly introduce diffusion models before analyzing the generation dynamics in a well-controlled high-dimensional setting: a mixture of two Gaussians. Using methods from statistical physics, we will show analytically that the generation of new data by a score model based on the empirical law proceeds through a sequence of transitions. First, we will identify a "speciation" transition, at which the sample's fate is sealed: its class can no longer change. Speciation is followed by a collapse (or memorization) transition, after which the trajectory is irrevocably drawn toward one of the points of the training set and reproduces it exactly.
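The collapse phenomenon can be sketched numerically: when the reverse dynamics is driven by the exact score of the empirical law (a finite sum of Gaussians centered on the training points), every trajectory ends essentially on top of a training point. The synthetic training set `A`, the geometric time schedule, and all dimensions below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_samples = 10, 8, 200
A = rng.standard_normal((n_train, d))  # synthetic "training set"
T, t_min, n_steps = 5.0, 1e-3, 2000

def empirical_score(x, t):
    # Score of the empirical law convolved with the OU noise kernel:
    # p_t(x) = (1/n) * sum_i N(x; a_i * e^{-t}, (1 - e^{-2t}) * I)
    delta = 1.0 - np.exp(-2 * t)
    mt = A * np.exp(-t)                        # (n_train, d)
    diff = x[:, None, :] - mt[None, :, :]      # (n_samples, n_train, d)
    logw = -np.sum(diff**2, axis=2) / (2 * delta)
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    mean = w @ mt  # posterior mean of the training point given x_t
    return -(x - mean) / delta

# Reverse SDE driven by the *empirical* score; a geometric time grid
# keeps the steps smaller than delta(t) near t = 0, where the score diverges.
ts = T * (t_min / T) ** np.linspace(0.0, 1.0, n_steps)
x = rng.standard_normal((n_samples, d))
for k in range(n_steps - 1):
    dt = ts[k] - ts[k + 1]
    x += (x + 2 * empirical_score(x, ts[k])) * dt \
         + np.sqrt(2 * dt) * rng.standard_normal((n_samples, d))

# memorization: every generated sample lands next to a training point
dists = np.min(np.linalg.norm(x[:, None, :] - A[None, :, :], axis=2), axis=1)
print(dists.max())
```

All 200 trajectories finish within a small distance (set by the residual noise at `t_min`) of one of the 8 training points, i.e. the model reproduces its training set rather than generalizing.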
The theoretical conclusions established for the Gaussian mixture model will then be generalized to arbitrary data distributions and validated on realistic datasets.