Data challenges

Challenges 2023

Challenges are proposed by public services, companies or scientific laboratories, and are based on real-life problems. Participants submit the results of their classification or prediction algorithms, which are then ranked against one another on the website. The challenges are integrated into Prof. Stéphane Mallat's course at the Collège de France, and are offered in many data science courses in France and the French-speaking world.

The 2023 edition has now been launched with thirteen new challenges, organized in partnership with the École Normale Supérieure and the Institut Louis Bachelier.

Challenges

Learning radiological anatomy with few-shot learning

Presented by Raidium
The aim of this challenge is to segment structures using their shape, but without exhaustive annotations.

Reinforcement learning for the carbon footprint of buildings

Presented by Accenta
The subject of this challenge is the reduction of greenhouse gas emissions caused by the heating and cooling of buildings. The aim is to minimize these emissions by optimizing the control of thermal systems in collective buildings.

Biosonar - Odontocete click detection

Presented by Université de Toulon
The aim of the challenge is to determine whether audio extracts contain biosonar (delphinid clicks) or transient noise (such as shrimp or reef noise).

Colloids: how big can you get?

Presented by ESPCI Paris
Can you detect small particles in a noisy 3D scan?

How to unmask fraudsters

Presented by BNP Paribas PR
The aim of this challenge is to find the best method of transforming and aggregating the customer basket data of one of our partners in order to detect cases of fraud. Using this basket data, fraudsters can be detected and their future applications refused.
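To give an idea of what aggregating basket data can mean in practice, here is a minimal sketch that turns a variable-length basket into a fixed set of numeric features a classifier could use. It is only an illustration: the field names (`category`, `price`, `quantity`), the chosen features and the helper `basket_features` are assumptions, not the format of the challenge data.

```python
def basket_features(items):
    """Aggregate a variable-length basket into a fixed set of numeric features.

    `items` is a list of dicts with hypothetical keys
    "category", "price" (unit price) and "quantity".
    """
    amounts = [item["price"] * item["quantity"] for item in items]
    return {
        "n_items": sum(item["quantity"] for item in items),
        "total_amount": sum(amounts),
        # default=0.0 keeps the function usable on an empty basket
        "max_unit_price": max((item["price"] for item in items), default=0.0),
        "n_categories": len({item["category"] for item in items}),
    }


# Hypothetical toy basket, not taken from the challenge
basket = [
    {"category": "phone", "price": 800.0, "quantity": 1},
    {"category": "case", "price": 20.0, "quantity": 2},
]
features = basket_features(basket)
```

Other aggregates (share of the most expensive item, price spread, and so on) could be added in the same way; which ones help is precisely what the challenge asks participants to find out.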

What explains the price of electricity?

Presented by QRT
The aim is to model the price of electricity using meteorological, energy (raw materials) and commercial data for two European countries, France and Germany. It should be stressed that this is a problem of explaining prices by other concomitant variables, not a forecasting problem.

Detection of the PIK3CA mutation in breast cancer

Presented by OWKIN
The challenge proposed by Owkin is a weakly supervised binary classification problem: the aim is to predict, from a high-resolution digitized histological slide, whether a patient has a PIK3CA gene mutation.

Estimating missing values in ESG indicators

Presented by Pladifes and Impactfull
The aim of the challenge is to predict missing values for 15 corporate extra-financial indicators (up to 96% of values are missing). These indicators are available over three years (2018, 2019, 2020) and come from companies' extra-financial reports.
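As a point of reference for this kind of task, a very simple baseline fills each missing value with the median of the observed values of the same indicator. The sketch below is only an illustration under assumptions: the row layout, the indicator names (`co2`, `water`) and the helper `impute_median` are hypothetical and do not come from the challenge data.

```python
from statistics import median


def impute_median(rows, indicators):
    """Fill None entries with the per-indicator median of the observed values."""
    medians = {}
    for name in indicators:
        observed = [row[name] for row in rows if row[name] is not None]
        # An indicator that is never observed stays missing.
        medians[name] = median(observed) if observed else None
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for name in indicators:
            if new_row[name] is None:
                new_row[name] = medians[name]
        filled.append(new_row)
    return filled


# Hypothetical toy data, not taken from the challenge
rows = [
    {"company": "A", "co2": 10.0, "water": None},
    {"company": "B", "co2": None, "water": 4.0},
    {"company": "C", "co2": 30.0, "water": 6.0},
]
filled = impute_median(rows, ["co2", "water"])
```

With such a high share of missing values, a baseline like this mainly serves as a yardstick against which models that exploit relations between indicators or between years can be compared.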

The curse of the table of contents

Presented by Autorité des Marchés Financiers
The aim of the challenge is to reconstruct the table of contents of the annual financial reports of listed French companies, based on the document's text blocks and their metadata (positions, font, text size, etc.).

Predicting end-of-session returns in the US equity market

Presented by Capital Fund Management (CFM)
The aim of this challenge is to estimate the direction of a stock's price during the last two hours of trading, knowing how it behaved at the start of the day.

Short-term precipitation forecasting

Presented by PlumeLabs
The aim of this challenge is to forecast future rainfall rates (estimated via radar echo measurements) using past rainfall rates.

Real-time prediction of train occupancy

Presented by SNCF-Transilien
The aim of this challenge for SNCF-Transilien is to explore the possibility of making short-term, real-time predictions of the occupancy rate on board trains. This would enable SNCF-Transilien to provide passengers with real-time information on on-board load via its digital media.

Robustness to distribution changes and ambiguity

Presented by EffiSciences
What happens if misleading correlations are present in the training dataset?