Abstract
Recent successes in visual recognition are largely due to learned image representations, made possible by supervised learning techniques and the existence of large annotated image databases.
This presentation argues that, in order to build algorithms able to understand the changing visual world around us, the main challenge is now to develop visual representations that generalize to environments different from those seen in the training data, and that can be learned under weak supervision from noisy, partially annotated data. Several factors point in this direction, including the availability of multimodal data, which allows visual, auditory, and textual information to be cross-referenced without manual annotation, and the use of physical models learned from data. The talk presents research directions addressing these problems, with applications to understanding video content and finding visual correspondences.
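To make the idea of annotation-free cross-modal supervision concrete, the sketch below shows a symmetric InfoNCE-style contrastive loss between video and audio embeddings, where the only "label" is the natural co-occurrence of an image and its soundtrack in the same clip. This is a minimal illustration of the general technique, not the speaker's specific method; the function names, batch layout, and temperature value are illustrative assumptions.

```python
import numpy as np

def logsumexp(x, axis=None, keepdims=False):
    """Numerically stable log-sum-exp."""
    m = x.max(axis=axis, keepdims=True)
    s = np.log(np.exp(x - m).sum(axis=axis, keepdims=True)) + m
    return s if keepdims else np.squeeze(s, axis=axis)

def cross_modal_contrastive_loss(video_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss between two modalities.

    video_emb, audio_emb: (N, D) arrays of L2-normalized embeddings,
    where row i of each array comes from the same clip. The pairing
    itself is the supervision; no manual annotation is required.
    """
    # Cosine similarities between every video/audio pair in the batch.
    logits = video_emb @ audio_emb.T / temperature            # (N, N)
    targets = np.arange(len(video_emb))                       # matches on the diagonal

    # Cross-entropy in both directions (video -> audio and audio -> video).
    log_probs_v = logits - logsumexp(logits, axis=1, keepdims=True)
    log_probs_a = logits.T - logsumexp(logits.T, axis=1, keepdims=True)
    loss_v = -log_probs_v[targets, targets].mean()
    loss_a = -log_probs_a[targets, targets].mean()
    return 0.5 * (loss_v + loss_a)

# Toy usage with random, normalized embeddings.
rng = np.random.default_rng(0)
v = rng.normal(size=(8, 64)); v /= np.linalg.norm(v, axis=1, keepdims=True)
a = rng.normal(size=(8, 64)); a /= np.linalg.norm(a, axis=1, keepdims=True)
print(cross_modal_contrastive_loss(v, a))
```

Minimizing such a loss pulls embeddings of co-occurring visual and auditory signals together while pushing mismatched pairs apart, yielding representations without any human annotation; the same scheme extends to video-text pairs.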