Cluster Trees, Near Neighbor Graphs, and Continuum Percolation

Abstract

What information does the clustering of a finite data set reveal about the underlying distribution from which the data were sampled? This basic question has proved elusive even for the most widely used clustering procedures. One natural criterion is to seek clusters that converge (as the data set grows) to regions of high density. When all possible density levels are considered, this becomes a hierarchical clustering problem in which the sought limit is called the "cluster tree". We give a simple algorithm for estimating this tree that implicitly constructs a multiscale hierarchy of near-neighbor graphs on the data points. We show that the procedure is consistent, answering an open problem of Hartigan. We also obtain rates of convergence, using a percolation argument that gives insight into how near-neighbor graphs should be constructed.
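The abstract does not spell out the construction, so the following is only a minimal sketch of one way a single level of such a near-neighbor graph hierarchy could be built. The assumptions here are not taken from the text above: at scale `r`, a point is kept if its `k`-th nearest-neighbor radius (counting the point itself) is at most `r`, and two kept points are joined if they lie within `alpha * r` of each other. The function names, the parameters `k` and `alpha`, and their default values are all hypothetical choices for illustration.

```python
import math


def kth_neighbor_radius(points, i, k):
    # Smallest radius around points[i] whose closed ball holds k data points
    # (the point itself counts as one of them; this is an assumption).
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[k - 1]


def clusters_at_scale(points, r, k=2, alpha=math.sqrt(2)):
    # Keep only points that sit in a sufficiently dense region at scale r.
    n = len(points)
    active = [i for i in range(n) if kth_neighbor_radius(points, i, k) <= r]

    # Union-find over the kept points.
    parent = {i: i for i in active}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Join kept points that are within alpha * r of each other.
    for pos, a in enumerate(active):
        for b in active[pos + 1:]:
            if math.dist(points[a], points[b]) <= alpha * r:
                parent[find(a)] = find(b)

    # Connected components, as sorted lists of point indices.
    groups = {}
    for i in active:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy 1-D data: two tight groups and one isolated point in between.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (2.5,)]
for r in (0.15, 3.0):
    print(r, clusters_at_scale(points, r))
```

Sweeping `r` from small to large and recording how components appear and merge gives a hierarchy of clusters. On the toy data, the small scale separates the two groups and drops the isolated point, while the large scale merges everything into one component. The brute-force distance computations are quadratic and only suitable for small examples.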

Speaker

Sanjoy Dasgupta

UCSD

See also

Jean-Daniel Boissonnat, Chair of Informatics and Computational Sciences

Geometry Understanding in Higher Dimensions

Events

Previous event in the series

Welcome

Next event in the series

A Statistical Approach to Topological Data Analysis