Data science mapping

Abstract

The first lecture maps out the three main areas of data science: signal processing, data modeling and prediction. The lecture introduces the major issues at stake in each of these fields, as well as the mathematical and computational concepts they call upon.

In signal processing, the goal is to compute an estimate of a signal x with d coefficients from measurements. The dimension d is typically greater than one million, whether the signal is a sound, an image or any other observation. Inverse problems aim to recover a better-quality signal: a measuring instrument applies a transformation to the input signal and adds errors, i.e. noise. Inverting the transformation while reducing the noise requires a priori information about the signal's properties. Signal compression is another application: it aims to reduce the number of bits used to encode a signal, in order to limit storage space or transmission time. Here again, the key is to exploit a priori information about the signal's structure.
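As a rough illustration of how a priori information helps reduce noise, the sketch below assumes the simplest possible setting, which is not taken from the lecture: the instrument's transformation is the identity, the noise is Gaussian, and the prior is that the signal varies slowly. Under those assumptions a moving average (a hypothetical choice of estimator, with an arbitrary window size) already lowers the error.

```python
import math
import random


def moving_average(signal, half_width):
    """Average each sample with its neighbours (window truncated at the edges)."""
    d = len(signal)
    out = []
    for i in range(d):
        lo = max(0, i - half_width)
        hi = min(d, i + half_width + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def mean_squared_error(a, b):
    """Mean squared difference between two signals of equal length."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


random.seed(0)
d = 1000  # illustrative only; real signals are typically far larger

# Assumed ground truth: a slowly varying signal (two sine periods).
x = [math.sin(2 * math.pi * 2 * n / d) for n in range(d)]

# Assumed measurement model: identity transformation plus Gaussian noise.
y = [v + random.gauss(0.0, 0.5) for v in x]

# The smoothness prior justifies local averaging as an estimator of x.
x_hat = moving_average(y, half_width=10)

mse_noisy = mean_squared_error(y, x)
mse_denoised = mean_squared_error(x_hat, x)
print(f"MSE before: {mse_noisy:.4f}, after: {mse_denoised:.4f}")
```

A wider window removes more noise but blurs fast variations, so the right choice depends on how well the smoothness assumption actually holds.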

Modeling involves capturing the nature and variability of the data. This is done by estimating the distribution of the observed data. The data are described by a random model assumed to have a probability density, which is a function of the large number d of variables in each data item. The main difficulty arises from this high dimension. Such models are needed to optimize signal processing algorithms, in statistical physics, and to synthesize new data. They are also useful for prediction.
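To make the idea of estimating a density and then synthesizing new data concrete, here is a minimal sketch. It assumes a deliberately simple model that the lecture does not prescribe: a Gaussian density with independent coordinates, fitted by maximum likelihood on toy data with d = 3. The function names are hypothetical.

```python
import math
import random


def fit_diagonal_gaussian(samples):
    """Maximum-likelihood mean and variance of each coordinate."""
    n = len(samples)
    d = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    variances = [
        sum((s[j] - means[j]) ** 2 for s in samples) / n for j in range(d)
    ]
    return means, variances


def log_density(x, means, variances):
    """Log of the fitted density at x (sum over independent coordinates)."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def synthesize(means, variances, rng):
    """Draw one new data item from the fitted model."""
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)]


rng = random.Random(0)

# Toy "observed" data: assumed parameters, used only to generate examples.
true_means = [0.0, 2.0, -1.0]
true_stds = [1.0, 0.5, 2.0]
data = [
    [rng.gauss(m, s) for m, s in zip(true_means, true_stds)]
    for _ in range(5000)
]

means, variances = fit_diagonal_gaussian(data)
new_sample = synthesize(means, variances, rng)
print("estimated means:", [round(m, 2) for m in means])
print("synthesized item:", [round(v, 2) for v in new_sample])
```

This model has only 2d parameters, which is why it remains tractable. A model-free estimate such as a histogram with 10 bins per variable would need 10^d cells, which is one way to see why high dimension is the main difficulty.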

A prediction computes an estimate of the answer y to a question from data x, which can include many variables. For example, y could be the name of an animal that appears in an image x, or a diagnosis estimated from medical data x. Supervised learning optimizes the parameters of prediction algorithms, using numerous examples composed of data x for which the answer y is known.
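The sketch below illustrates what optimizing the parameters of a predictor from labelled examples can look like. It assumes a toy setting chosen for brevity rather than taken from the lecture: a scalar input x, a linear predictor w * x + b, a squared-error loss, and plain gradient descent with an arbitrary step size.

```python
import random


def predict(x, w, b):
    """Linear predictor: estimate of the answer y from the data x."""
    return w * x + b


def train(examples, learning_rate=0.5, steps=2000):
    """Fit (w, b) by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in examples:
            err = predict(x, w, b) - y
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


rng = random.Random(0)

# Toy training set: inputs x with known answers y (assumed rule y = 3x + 1 + noise).
examples = []
for _ in range(200):
    x = rng.random()
    examples.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.05)))

w, b = train(examples)
print(f"learned w = {w:.3f}, b = {b:.3f}")
print(f"prediction for x = 0.5: {predict(0.5, w, b):.3f}")
```

With an image or medical record as x, the predictor would have far more parameters and the loss would differ, but the principle of adjusting parameters to fit known answers is the same.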

Speaker(s)

Stéphane Mallat

Events

Data science mapping

2018 challenges presentation (1)

Bias-Complexity trade-off

Challenges 2018 (2)

The curse of large dimensions

Dimensionality reduction and denoising

Fourier analysis, filtering and sampling

Image denoising in a few formulas

Transforms and wavelet bases

Tackling a machine learning competition: methodology and practic…

Bayesian and linear kernel learning

Kernel regression and convex optimization

Kernel classification and SVM

Federated learning for medical data

Gradient descent and neural networks

Stochastic and conditional gradients for neural networks
