In the video I am improvising with sound artist dedosmuertos at La Villette, Paris. I am interacting in real time with the latent space representations of several models trained on raw audio with the RAVE neural network architecture (Caillon and Esling, 2021). To interact with the latent space, I am using a Gametrak controller. By means of Wekinator (Fiebrink and Cook, 2009), an application for supervised, interactive machine learning, I am mapping the performance space to the latent space representation of the model.
MOTIVATION
I am very much interested in new ways of synthesizing and interacting with sound. Recent advances in neural audio synthesis systems, such as OpenAI’s Jukebox, Google Magenta’s DDSP, and IRCAM’s RAVE, offer novel ways of generating sound in raw audio form from a model learned directly from audio.
Among these three architectures, DDSP and RAVE can be steered in real time at inference, enabling much-wanted control over the sound generation process.
DIRECT CONTROL OF THE LATENT SPACE
DDSP can be conditioned on pitch and level. RAVE can be conditioned on the content of an incoming audio signal (its spectral characteristics and level). DDSP was designed to model monophonic sounds, whereas RAVE was conceived to model full audio textures.
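As an illustration, here is a minimal sketch of this kind of real-time steering in Python, assuming a RAVE model exported to TorchScript with `rave export` (the file name `percussion.ts` is hypothetical; exported RAVE models expose `encode` and `decode` methods):

```python
import torch

# Load a RAVE model exported to TorchScript ("percussion.ts" is a
# hypothetical file name; any model exported with `rave export` works).
model = torch.jit.load("percussion.ts").eval()

with torch.no_grad():
    # Condition on audio content: encode a buffer of raw audio
    # (batch, channels, samples) into a latent trajectory of shape
    # (batch, latent_dims, frames).
    x = torch.randn(1, 1, 2 ** 16)  # placeholder input buffer
    z = model.encode(x)

    # Steer the generation by perturbing the latent trajectory
    # before decoding it back to raw audio.
    z = z + 0.5 * torch.randn_like(z)
    y = model.decode(z)
```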
SOUND CORPORA AND NEURAL AUDIO MODELS
I have been training RAVE on several corpora of full audio textures.
CONTROLLING GENERATIVE MODELS WITH INTERACTIVE MACHINE LEARNING
An interactive machine learning environment such as Wekinator makes it easy to map the low-dimensional performance space to the high-dimensional latent space of the neural audio model.
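A minimal sketch of the receiving end of such a mapping, assuming Wekinator’s default OSC output (messages to `/wek/outputs` on port 12000), the python-osc library, and the hypothetical exported model from the sketch above; the number of Wekinator outputs is assumed to match the model’s latent dimensionality:

```python
import torch
from pythonosc import dispatcher, osc_server

# Hypothetical exported RAVE model, as in the previous sketch.
model = torch.jit.load("percussion.ts").eval()

def on_wekinator_outputs(address, *values):
    # Wekinator's continuous outputs become coordinates in the latent
    # space; here we assume one Wekinator output per latent dimension.
    z = torch.tensor(values).reshape(1, len(values), 1)
    with torch.no_grad():
        audio = model.decode(z)  # one buffer of raw audio
    # ... send `audio` to the sound card (e.g. via sounddevice)

# By default, Wekinator sends its outputs as OSC messages
# to /wek/outputs on port 12000.
d = dispatcher.Dispatcher()
d.map("/wek/outputs", on_wekinator_outputs)
osc_server.BlockingOSCUDPServer(("127.0.0.1", 12000), d).serve_forever()
```

In this setup, Wekinator learns the regression from controller input to latent coordinates from a handful of demonstrated examples, so the mapping can be retrained on the fly during rehearsal.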
THOUGHTS