Room 5, Site Marcelin Berthelot
Open to all

Data challenges

The Challenge data website hosts data processing challenges based on supervised learning. This seminar introduces some of the challenges used in the lecture. The challenges are proposed by companies or scientists and stem from concrete problems they encounter in their work. They are offered in a spirit of scientific exchange, with data and algorithms shared openly. The data made available is non-confidential, and participants who wish to do so can make their algorithm reports available to all after the close of the season.

The challenges cover a wide spectrum of applications, including images, sounds, texts, medical data, physical measurements and Internet data. Each challenge provides labelled training data as well as test data. Participants compute predictions on the test data and submit them to the website. The site scores each submission with a specified error metric and ranks the participants, so that results can be compared within a wider community. Challenges begin on January 1, 2018. An intermediate deadline falls in June, when predictions are evaluated on new test data. The final deadline is in December, followed by a prize-giving ceremony in January 2019.
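The submit, score and rank cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mean absolute error stands in for whatever metric a given challenge specifies, and the function names, team names and data are hypothetical rather than taken from the website.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Example error metric (an assumption; each challenge specifies its own)."""
    if len(y_true) != len(y_pred):
        raise ValueError("one prediction is required per test sample")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rank_submissions(
    hidden_labels: Sequence[float],
    submissions: Dict[str, Sequence[float]],
    metric: Callable[[Sequence[float], Sequence[float]], float] = mean_absolute_error,
) -> List[Tuple[str, float]]:
    """Score each submission against the hidden test labels, lowest error first."""
    scores = [(name, metric(hidden_labels, preds)) for name, preds in submissions.items()]
    return sorted(scores, key=lambda item: item[1])


# Hypothetical test labels, known only to the organizers.
hidden_labels = [1.0, 2.0, 3.0, 4.0]

# Hypothetical predictions submitted by three participants.
submissions = {
    "team_a": [1.0, 2.0, 3.0, 5.0],
    "team_b": [2.0, 3.0, 4.0, 5.0],
    "team_c": [1.0, 2.0, 3.0, 4.0],
}

leaderboard = rank_submissions(hidden_labels, submissions)
for position, (name, score) in enumerate(leaderboard, start=1):
    print(f"{position}. {name}: error = {score:.2f}")
```

Keeping the test labels on the organizers' side means participants only ever see their score, which is what makes the ranking a fair comparison between methods.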

Challenges 2018

This year's challenges were organized and supervised at ENS by Mathieu Andreux, Tomas Anglès, Georgios Exarchakis, Louis Thiry, John Zarka and Sixin Zhang. The organization of these data challenges is supported by the CFM Chair at the École normale supérieure and by the Fondation des sciences mathématiques de Paris.

During this first session, the following six challenges were presented:

  • "Predicting volatility on financial markets", presented by Éric Lebigot of Capital Fund Management. The aim of the challenge is to predict the end-of-day volatility of US equities based on their historical beginning-of-day returns.
  • "Celebrity identification", presented by Antoine Chassang from Reminiz. The aim is to identify faces appearing in videos from a reference dictionary of celebrity faces.
  • "Prediction of hourly electricity production by production unit in France", presented by Alexis Bergès from Wattstrat. The aim is to predict the hourly electricity production of each production unit in France, based on regional curves of demand and of renewable production, as well as the hourly availability of production units.
  • "Prediction of complaints during e-commerce transactions", presented by Vincent Michel of PriceMinister - Rakuten France. The aim is to predict whether an e-commerce transaction will give rise to a complaint, and if so, of what type, based on the characteristics of the transaction.
  • "Prediction of expected response to pharmaceutical questions", presented by Emmanuel Bilbault from Posos. The aim is to categorize pharmaceutical questions according to the type of response expected.
  • "Predicting the energy performance of buildings", presented by Sylvain Le Corff from Oze-Energies. The aim is to predict energy consumption and indoor building temperatures based on outdoor temperatures and a reduced number of parameters describing the building's structure and settings.