Abstract
The oral modality is the most natural channel for linguistic interaction, yet current natural language processing (NLP) technologies are built mainly on the written word, requiring large quantities of text to train language models. Even voice assistants and speech translation systems use text as an intermediary, which is inefficient and restricts the technology to languages with substantial textual resources. Moreover, it neglects characteristics of speech such as rhythm and intonation. Yet children learn their mother tongue(s) long before they learn to read or write.
In this presentation, we will discuss recent advances in learning audio representations that pave the way for NLP applications operating directly on speech, without any text. These models can capture the nuances of spoken language, including dialogue. We will also discuss the technical challenges that remain to be overcome before such learning can approach that of a human infant.