Abstract
How can machines learn as effectively as humans and animals? How could machines learn how the world works and acquire common sense? How could machines learn to reason and plan?
Current AI architectures, such as large-scale auto-regressive language models, are not sufficient to achieve this. I will propose a modular cognitive architecture that could be a path towards answering these questions. The centerpiece of the architecture is a predictive model of the world that enables the system to predict the consequences of its actions and to plan a sequence of actions that optimizes a set of objectives. The objectives include safeguards that guarantee the system's controllability and safety. The world model uses a Hierarchical Joint Embedding Predictive Architecture (H-JEPA) trained by self-supervised learning. JEPA learns abstract representations of perceptions that are simultaneously maximally informative and maximally predictable.
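
As a rough illustration of what planning with a predictive world model can look like, the sketch below samples candidate action sequences, rolls each one out through a model, and keeps the sequence with the lowest total cost. It is a toy under stated assumptions, not the proposed architecture: the one-dimensional dynamics in `predict`, the `cost` function with its penalty term standing in for a safeguard, and the random-shooting search in `plan` are all hypothetical choices made for this example.

```python
import random


def predict(state, action):
    # Hypothetical world model: toy 1-D dynamics in which the action
    # simply shifts the state. A learned model would replace this.
    return state + action


def cost(state, goal, limit=10.0):
    # Task objective (squared distance to the goal) plus a penalty term,
    # assumed here as a stand-in for a safeguard, that is very large
    # whenever the predicted state leaves the interval [-limit, limit].
    penalty = 1e6 if abs(state) > limit else 0.0
    return (state - goal) ** 2 + penalty


def plan(state, goal, horizon=5, candidates=200, seed=0):
    # Random-shooting planner: sample action sequences, roll each one out
    # through the world model, and keep the one with the lowest total cost.
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = predict(s, a)
            total += cost(s, goal)
        if total < best_cost:
            best_actions, best_cost = actions, total
    return best_actions, best_cost


if __name__ == "__main__":
    actions, total = plan(state=0.0, goal=2.0)
    print(actions, total)
```

The point of the sketch is only the division of labor: the model predicts consequences, the cost encodes objectives and constraints, and the planner searches over actions. A practical system would replace random sampling with a more efficient optimizer.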