Tackling a machine learning competition: methodology and practical examples

Abstract

Participating in a machine learning competition requires both advanced computer science skills and a mathematical and algorithmic understanding of machine learning models. This presentation explains the iterative process that leads to good results in such a competition.

The proposed methodology consists of five phases, repeated until the end of the competition. The first is a review of the state of the art on the subject, covering both scientific publications and similar past competitions. The second is an exploration of the data, to understand its structure and get an initial idea of which features have predictive power. The third builds a representation of the data that makes the most of these features: this is what we call feature engineering. In the fourth, once a model evaluation procedure is in place (k-fold cross-validation, for example), one builds a battery of models, then compares and combines them to obtain the best possible predictive model. Finally, the data scientist hypothesizes new features that could give a more relevant representation of the data, integrates them, and repeats the cycle to improve the results until the end of the competition.
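As an illustration of the fourth phase, here is a minimal sketch, in plain Python with only the standard library, of k-fold cross-validation used to compare a small battery of models and one combination of them. The two base models (a constant predictor and a one-feature least-squares line), the equal-weight blend, the mean-squared-error metric and the synthetic data are all assumptions made for the example, not the models or data discussed in the talk.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Hypothetical base models: each fit function returns a predict(row) closure.
def fit_mean(X, y):
    """Baseline: always predict the training mean of the target."""
    m = statistics.fmean(y)
    return lambda row: m

def fit_linear(X, y):
    """Least-squares line y ~ a * x + b on the first feature only."""
    xs = [row[0] for row in X]
    mx, my = statistics.fmean(xs), statistics.fmean(y)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / var if var else 0.0
    b = my - a * mx
    return lambda row: a * row[0] + b

def fit_blend(X, y):
    """Equal-weight average of the two base models, refit on each training split."""
    models = [fit_mean(X, y), fit_linear(X, y)]
    return lambda row: statistics.fmean(m(row) for m in models)

def cv_mse(fit, X, y, k=5, seed=0):
    """Mean squared error averaged over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(statistics.fmean((model(X[i]) - y[i]) ** 2 for i in held_out))
    return statistics.fmean(scores)

# Synthetic data (an assumption for the example): y = 3x + 1 + Gaussian noise.
rng = random.Random(42)
X = [[rng.uniform(0, 10)] for _ in range(200)]
y = [3.0 * row[0] + 1.0 + rng.gauss(0, 1) for row in X]

# The same seed gives the same folds, so every model is scored on identical splits.
for name, fit in [("mean", fit_mean), ("linear", fit_linear), ("blend", fit_blend)]:
    print(f"{name:>6}: CV MSE = {cv_mse(fit, X, y):.3f}")
```

On this synthetic data the linear model should score best, and the equal-weight blend should land between its two components: a combination is not automatically better than its best member, which is why the blend is scored on the same folds as the models it combines.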

Achieving excellent rankings in machine learning competitions therefore requires precise knowledge of the models, in order to tune their parameters as well as possible and to be aware of their limits, but also creativity, to build a representation of the data that captures as much relevant information as possible.
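To make the point about parameterization concrete, the sketch below applies the same k-fold idea to choosing a hyperparameter: the number of neighbours of a one-dimensional k-nearest-neighbours regressor, selected by cross-validated error over a small grid. The model, the helper names, the grid and the noisy sine data are hypothetical choices for illustration only.

```python
import math
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_knn(X, y, n_neighbors):
    """Hypothetical 1-D k-nearest-neighbours regressor: mean target of the closest points."""
    data = list(zip(X, y))
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:n_neighbors]
        return statistics.fmean(t for _, t in nearest)
    return predict

def cv_mse(n_neighbors, X, y, k=5, seed=0):
    """Cross-validated mean squared error for one value of the hyperparameter."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_knn([X[i] for i in train], [y[i] for i in train], n_neighbors)
        scores.append(statistics.fmean((model(X[i]) - y[i]) ** 2 for i in held_out))
    return statistics.fmean(scores)

# Synthetic data (an assumption for the example): a noisy sine wave.
rng = random.Random(0)
X = [rng.uniform(0, 2 * math.pi) for _ in range(150)]
y = [math.sin(x) + rng.gauss(0, 0.3) for x in X]

# Same folds for every candidate, so the scores are directly comparable.
grid = [1, 3, 5, 10, 25, 60]
scores = {n: cv_mse(n, X, y) for n in grid}
best = min(scores, key=scores.get)
for n in grid:
    print(f"n_neighbors={n:>2}: CV MSE = {scores[n]:.3f}")
print("selected n_neighbors:", best)
```

Very few neighbours fit the noise, while very many smooth away the sine wave; the cross-validated error makes both failure modes visible before a submission is made.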

Speaker(s)

Pierre Courtiol

Events

Data science mapping

2018 challenges presentation (1)

Bias-Complexity trade-off

Challenges 2018 (2)

The curse of large dimensions

Dimensionality reduction and denoising

Fourier analysis, filtering and sampling

Image denoising in a few formulas

Transforms and wavelet bases

Bayesian and linear kernel learning

Kernel regression and convex optimization

Kernel classification and SVM

Federated learning for medical data

Gradient descent and neural networks

Stochastic and conditional gradients for neural networks
