Amphithéâtre Marguerite de Navarre, Site Marcelin Berthelot
Open to all

Abstract

As in many other fields, deep neural networks have enabled major advances in the processing of musical audio signals. This seminar presents the specificities of these signals and the adaptations required of deep neural networks for their modeling.

In the first part, we recall some elements of audio signal processing (the Fourier transform, the constant-Q transform (CQT), the harmonic sinusoidal model, the source-filter model). In the traditional machine-learning approach, these elements are used to build "hand-crafted features" that are fed as input to classification algorithms.
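As a minimal sketch of this traditional pipeline, the example below (numpy only; the frame size, hop, and the choice of spectral centroid as the feature are illustrative assumptions, not part of the seminar) computes a short-time Fourier transform and one classic hand-crafted feature from it:

```python
import numpy as np

def stft(signal, frame_size=1024, hop=512):
    """Short-time Fourier transform: Hann-windowed frames -> magnitude spectra."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_size] * window
                       for i in range(n_frames)])
    # shape: (n_frames, frame_size // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_centroid(mag, sr):
    """A classic hand-crafted feature: the spectral 'center of mass' per frame."""
    freqs = np.fft.rfftfreq(2 * (mag.shape[1] - 1), d=1.0 / sr)
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)

sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)   # one second of a 440 Hz sine
mag = stft(sig)
centroid = spectral_centroid(mag, sr)
# for a pure tone, the centroid stays close to the tone's frequency
```

Features such as this (or MFCCs, chroma, etc.) would then be stacked into a vector and passed to a conventional classifier.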

In the second part, we show how deep neural networks (in particular convolutional neural networks) can be used to perform "feature learning". We first recall the fundamental differences between 2D images and time-frequency representations. We then discuss the choice of input (spectrogram, CQT, or raw waveform), the choice of convolution filter shape, autoregressive neural models, and the different ways of injecting a priori knowledge (harmonicity, source-filter structure) into these networks.
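One way to see why filter shape matters: in images a small square filter (e.g. 3x3) can be translated in both directions, but on a spectrogram the frequency axis carries absolute meaning (pitch, timbre), so one common design spans the entire frequency axis and slides only along time. The numpy sketch below illustrates that design choice; the dimensions and random weights are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
spec = rng.standard_normal((513, 100))   # hypothetical (freq bins, time frames) input
kernel = rng.standard_normal((513, 5))   # full-frequency-height, 5-frame-wide filter

def conv_time(spec, kernel):
    """Valid correlation along the time axis only (stride 1).

    Because the filter covers all frequency bins, it is NOT translated in
    frequency -- unlike a square image filter, which slides in both axes.
    """
    f, t = spec.shape
    kf, kt = kernel.shape
    assert kf == f, "filter must span the whole frequency axis"
    return np.array([np.sum(spec[:, i:i + kt] * kernel)
                     for i in range(t - kt + 1)])   # one activation per time step

act = conv_time(spec, kernel)
```

A real network would learn many such filters (and intermediate shapes, e.g. tall-and-narrow or harmonic-stacked kernels, are also used); this only shows the geometry of the operation.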

In the third part, we present the different learning paradigms used in the music audio domain: classification, encoder-decoder (source separation, constraints on the latent space), metric learning (triplet loss), and semi-supervised learning.
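Of these paradigms, the triplet loss is compact enough to write out directly. A minimal numpy sketch (the margin value and toy 2-D embeddings are illustrative assumptions): the loss is zero when the anchor is already closer to the positive than to the negative by at least the margin, and positive otherwise:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on Euclidean distances:
    max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# toy embeddings: anchor and positive are similar, negative is far away
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([-1.0, 0.0])

satisfied = triplet_loss(a, p, n)   # triplet already well ordered -> loss is 0
violated = triplet_loss(a, n, p)    # roles swapped -> positive loss
```

Minimizing this loss over many triplets pulls same-class (or same-song, same-cover) embeddings together and pushes others apart, which is the basis of metric learning for music similarity.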

Speaker(s)

Geoffroy Peeters

Professor at Télécom ParisTech