DAFx Paper Archive - Search for machine learning in papers from 2021, page 1 of 2

Exposure Bias and State Matching in Recurrent Neural Network Virtual Analog Models

Aleksi Peussa; Eero-Pekka Damskägg; Thomas Sherson; Stylianos I. Mimilakis; Lauri Juvela; Athanasios Gotsopoulos; Vesa Välimäki

DAFx-2021 - Vienna (virtual)

Virtual analog (VA) modeling using neural networks (NNs) has great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due to their connection with discrete nodal analysis. Furthermore, VA models based on NNs can be trained efficiently by directly exposing them to the circuit states in a gray-box fashion. However, exposure to ground truth information during training can leave the models susceptible to error accumulation in a free-running mode, also known as “exposure bias” in machine learning literature. This paper presents a unified framework for treating the previously proposed state trajectory network (STN) and gated recurrent unit (GRU) networks as special cases of discrete nodal analysis. We propose a novel circuit state-matching mechanism for the GRU and experimentally compare the previously mentioned networks for their performance in state matching, during training, and in exposure bias, during inference. Experimental results from modeling a diode clipper show that all the tested models exhibit some exposure bias, which can be mitigated by truncated backpropagation through time. Furthermore, the proposed state matching mechanism improves the GRU modeling performance of an overdrive pedal and a phaser pedal, especially in the presence of external modulation, apparent in a phaser circuit.
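
As a rough illustration of the exposure bias described in the abstract above, the following toy sketch compares a teacher-forced rollout (the model sees the true previous state at every step) with a free-running rollout (the model feeds back its own prediction). The one-pole system, the coefficient values, and the function names are assumptions made for illustration only; they are not the STN or GRU models from the paper.

```python
import numpy as np

def simulate_true(a_true, x):
    """Ground-truth states of the toy system s[t+1] = a_true * s[t] + x[t]."""
    s = np.zeros(len(x) + 1)
    for t in range(len(x)):
        s[t + 1] = a_true * s[t] + x[t]
    return s

def rollout(a_model, x, true_states, teacher_forced):
    """Predict next states with a slightly wrong coefficient a_model.

    teacher_forced=True  -> each step starts from the true previous state.
    teacher_forced=False -> each step starts from the model's own last output.
    """
    h = true_states[0]
    preds = np.zeros(len(x))
    for t in range(len(x)):
        prev = true_states[t] if teacher_forced else h
        h = a_model * prev + x[t]
        preds[t] = h
    return preds

x = np.ones(50)                      # constant input (hypothetical test signal)
states = simulate_true(0.9, x)       # "measured" circuit state
err_tf = np.abs(rollout(0.95, x, states, True) - states[1:])
err_free = np.abs(rollout(0.95, x, states, False) - states[1:])
print("teacher-forced error:", err_tf[-1])
print("free-running error:  ", err_free[-1])
```

In this toy case the per-step error stays small under teacher forcing but accumulates in free-running mode, which is the kind of gap the abstract attributes to training on ground-truth states.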

A Generative Model for Raw Audio Using Transformer Architectures

Prateek Verma; Chris Chafe

DAFx-2021 - Vienna (virtual)

This paper proposes a novel approach to waveform-level audio synthesis using Transformer architectures. We propose a deep neural network for generating waveforms, similar to WaveNet. The model is fully probabilistic, auto-regressive, and causal, i.e., each generated sample depends only on previously observed samples. Our approach outperforms a widely used WaveNet architecture by up to 9% on a similar dataset for predicting the next step. Using the attention mechanism, we enable the architecture to learn which audio samples are important for predicting the future sample. We show how causal transformer generative models can be used for raw waveform synthesis. We also show that this performance can be improved by another 2% by conditioning samples over a wider context. The flexibility of the current model to synthesize audio from latent representations suggests a large number of potential applications. Like WaveNet, however, the novel approach of using generative transformer architectures for raw audio synthesis is still far from generating any meaningful music without using latent codes/meta-data to aid the generation process.

One Billion Audio Sounds From GPU-Enabled Modular Synthesis

Joseph Turian; Jordie Shier; George Tzanetakis; Kirk McNally; Max Henry

DAFx-2021 - Vienna (virtual)

We release synth1B1, a multi-modal audio corpus consisting of 1 billion 4-second synthesized sounds, paired with the synthesis parameters used to generate them. The dataset is 100x larger than any audio dataset in the literature. We also introduce torchsynth, an open-source modular synthesizer that generates the synth1B1 samples on-the-fly at 16200x faster than real-time (714 MHz) on a single GPU. In addition, we release two new audio datasets: FM synth timbre and subtractive synth pitch. Using these datasets, we demonstrate new rank-based evaluation criteria for existing audio representations. Finally, we propose a novel approach to synthesizer hyperparameter optimization.

Differentiable White-Box Virtual Analog Modeling

Fabián Esqueda; Boris Kuznetsov; Julian D. Parker

DAFx-2021 - Vienna (virtual)

Component-wise circuit modeling, also known as “white-box” modeling, is a well-established and much-discussed technique in virtual analog modeling. This approach is generally limited in accuracy by the lack of access to the exact component values present in a real example of the circuit. In this paper we show how this problem can be addressed by implementing the white-box model in a differentiable form and allowing approximate component values to be learned from raw input–output audio measured from a real device.

One-to-Many Conversion for Percussive Samples

Jon Fagerström; Sebastian J. Schlecht; Vesa Välimäki

DAFx-2021 - Vienna (virtual)

A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hi-hat, a snare, and a tom-tom. The proposed method uses a short pseudo-random velvet-noise filter and a low-shelf filter to produce timbral variations targeted at appropriate spectral regions, potentially yielding an endless number of new realistic versions of a single percussive sampled sound. The realism of the resulting processed sounds is studied in a listening test. The results show that the sound quality obtained with the proposed algorithm is at least as good as that of a previous method while using 77% fewer computational operations. The algorithm is widely applicable to computer-generated music and game audio.
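
To make the velvet-noise idea in the abstract above more concrete, here is a minimal sketch of generating a velvet-noise sequence (one +1 or -1 impulse at a random position within each grid period) and using it as a short sparse filter on a sample. The filter length, pulse density, the `wet` gain, and the unit direct-path tap are assumptions chosen for illustration, and the low-shelf stage mentioned in the abstract is omitted; this is not the paper's exact algorithm.

```python
import numpy as np

def velvet_noise(length, density, sample_rate, rng):
    """Sparse sequence with one +/-1 impulse per grid period of sample_rate/density samples."""
    grid = sample_rate / density              # average impulse spacing in samples
    n_pulses = int(length / grid)
    seq = np.zeros(length)
    for m in range(n_pulses):
        # Random position inside the m-th grid period.
        pos = int(m * grid + rng.random() * (grid - 1))
        seq[pos] = rng.choice([-1.0, 1.0])
    return seq

def vary_sample(x, rng, filter_len=64, density=2000.0, sample_rate=44100, wet=0.2):
    """Filter x with a unit direct tap plus low-gain velvet-noise taps (assumed design)."""
    h = wet * velvet_noise(filter_len, density, sample_rate, rng)
    h[0] = 1.0                                # keep the original attack intact
    return np.convolve(x, h)[: len(x)]

rng = np.random.default_rng(0)
hit = np.exp(-np.arange(2048) / 200.0)        # stand-in for a percussive sample
variation = vary_sample(hit, rng)
```

Calling `vary_sample` again with the same generator draws a new sparse filter, so each playback of the sample can receive a different variation.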

Amp-Space: A Large-Scale Dataset for Fine-Grained Timbre Transformation

Jason Naradowsky

DAFx-2021 - Vienna (virtual)

We release Amp-Space, a large-scale dataset of paired audio samples: a source audio signal and an output signal that is the result of a timbre transformation. The transformations we study come from black-box musical tools (amplifiers, stompboxes, studio effects) traditionally used to shape the sound of guitar, bass, or synthesizer. For each sample of transformed audio, the set of parameters used to create it is given. Samples are from both real and simulated devices, the latter allowing for orders of magnitude more data than found in comparable datasets. We demonstrate potential use cases of this data by (a) pre-training a conditional WaveNet model on synthetic data and showing that it reduces the number of samples necessary to digitally reproduce a real musical device, and (b) training a variational autoencoder to shape a continuous space of timbre transformations for creating new sounds through interpolation.

Improving Synthesizer Programming From Variational Autoencoders Latent Space

Gwendal Le Vaillant; Thierry Dutoit; Sébastien Dekeyser

DAFx-2021 - Vienna (virtual)

Deep neural networks have recently been applied to the task of automatic synthesizer programming, i.e., finding optimal values of sound synthesis parameters in order to reproduce a given input sound. This paper focuses on generative models, which can infer parameters as well as generate new sets of parameters or perform smooth morphing effects between sounds. We introduce new models to ensure scalability and to increase performance by using heterogeneous representations of parameters as numerical and categorical random variables. Moreover, a spectral variational autoencoder architecture with multi-channel input is proposed in order to improve inference of parameters related to the pitch and intensity of input sounds. Model performance was evaluated according to several criteria such as parameter estimation error and audio reconstruction accuracy. Training and evaluation were performed using a dataset of 30k presets, which is published with this paper. The results demonstrate significant improvements in terms of parameter inference and audio accuracy and show that the presented models can be used with subsets or full sets of synthesizer parameters.

Transition-Aware: A More Robust Approach for Piano Transcription

Xianke Wang; Wei Xu; Juanting Liu; Weiming Yang; Wenqing Cheng

DAFx-2021 - Vienna (virtual)

Piano transcription is a classic problem in music information retrieval, and a growing number of transcription methods based on deep learning have been proposed in recent years. In 2019, Google Brain published a larger piano transcription dataset, MAESTRO. On this dataset, the Onsets and Frames transcription approach proposed by Hawthorne achieved a stunning onset F1 score of 94.73%. Unlike the annotation method of Onsets and Frames, the Transition-aware model presented in this paper annotates the attack process of piano signals, called the attack transition, across multiple frames instead of marking only the onset frame. In this way, the piano signal around the onset time is taken into account, making piano onset detection more stable and robust. Transition-aware achieves a higher transcription F1 score than Onsets and Frames on the MAESTRO and MAPS datasets, reducing many extra-note detection errors. This indicates that the Transition-aware approach generalizes better across different datasets.
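
The difference between marking only the onset frame and marking several frames of the attack can be sketched as a label-building function. The abstract does not state how many frames the attack transition covers or where they are placed, so the `span` parameter, the 88-key layout starting at MIDI note 21, and the frame rate below are hypothetical choices for illustration.

```python
import numpy as np

def onset_labels(onsets_sec, pitches, n_frames, frame_rate,
                 span=1, n_pitches=88, lowest=21):
    """Build a (frames x pitches) target matrix.

    span=1 marks only the onset frame; span>1 marks `span` consecutive
    frames starting at the onset (assumed stand-in for an attack transition).
    """
    labels = np.zeros((n_frames, n_pitches), dtype=np.float32)
    for t, p in zip(onsets_sec, pitches):
        first = int(round(t * frame_rate))
        labels[first:first + span, p - lowest] = 1.0
    return labels

# One note (MIDI 60) starting at 0.5 s, with 100 label frames per second.
single = onset_labels([0.5], [60], n_frames=200, frame_rate=100, span=1)
multi = onset_labels([0.5], [60], n_frames=200, frame_rate=100, span=3)
```

With `span=3`, the target for each note covers three frames rather than one, so a small timing offset in the prediction still overlaps the positive label region.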

Identification of Nonlinear Circuits as Port-Hamiltonian Systems

Judy Najnudel; Rémy Müller; Thomas Hélie; David Roze

DAFx-2021 - Vienna (virtual)

This paper addresses identification of nonlinear circuits for power-balanced virtual analog modeling and simulation. The proposed method combines a port-Hamiltonian system formulation with kernel-based methods to retrieve model laws from measurements. This combination allows for the estimated model to retain physical properties that are crucial for the accuracy of simulations, while representing a variety of nonlinear behaviors. As an illustration, the method is used to identify a nonlinear passive peaking EQ.

Alloy Sounds: Non-Repeating Sound Textures With Probabilistic Cellular Automata

Tiago Fernandes Tavares; Thales Roel P. Pessanha; Gustavo Nishihara; Guilherme Zanchetta L. Avila

DAFx-2021 - Vienna (virtual)

Contemporary musicians commonly face the challenge of finding new, characteristic sounds that can make their compositions more distinct. They often resort to computers and algorithms, which can significantly aid creative processes by generating unexpected material in controlled probabilistic processes. In particular, algorithms that present emergent behaviors, like genetic algorithms and cellular automata, have fostered a broad diversity of musical explorations. This article proposes an original technique for the computer-assisted creation and manipulation of sound textures. The technique uses Probabilistic Cellular Automata, which are still seldom explored in the music domain, to blend two audio tracks into a third, different one. The proposed blending process works by dividing the source tracks into frequency bands and then associating each of the automaton’s cells with a frequency band. Only one source, chosen by the cell’s state, is active within each band. The resulting track has a non-repeating textural pattern that follows the changes in the Cellular Automata. This blending process allows the musician to choose the original material and the blend granularity, significantly changing the resulting blends. We demonstrate how to use the proposed blending process in sound design and its application in experimental and popular music.
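
The band-wise blending described in the abstract above can be sketched roughly as follows: each automaton cell owns one frequency band, and its binary state selects which of the two sources is heard in that band for the current frame. The update rule (copy the left neighbour with probability `p_follow`), the equal-width bands, and the non-overlapping FFT frames are simplifying assumptions for illustration, not the rule or filter bank used in the article.

```python
import numpy as np

def step_pca(cells, p_follow, rng):
    """Toy probabilistic CA step: each cell copies its left neighbour
    with probability p_follow, otherwise keeps its state (hypothetical rule)."""
    left = np.roll(cells, 1)
    follow = rng.random(cells.size) < p_follow
    return np.where(follow, left, cells)

def blend(a, b, n_bands=8, frame=1024, p_follow=0.3, seed=0):
    """Blend two tracks: in each band, the cell state picks source a (0) or b (1)."""
    rng = np.random.default_rng(seed)
    n = min(len(a), len(b)) // frame * frame      # use whole frames only
    cells = rng.integers(0, 2, n_bands)           # random initial states
    n_bins = frame // 2 + 1
    band_of_bin = np.arange(n_bins) * n_bands // n_bins   # bin -> band index
    out = np.zeros(n)
    for start in range(0, n, frame):
        spec_a = np.fft.rfft(a[start:start + frame])
        spec_b = np.fft.rfft(b[start:start + frame])
        use_b = cells[band_of_bin] == 1
        out[start:start + frame] = np.fft.irfft(np.where(use_b, spec_b, spec_a), n=frame)
        cells = step_pca(cells, p_follow, rng)    # texture evolves frame by frame
    return out

rng = np.random.default_rng(1)
track_a = rng.standard_normal(4096)
track_b = rng.standard_normal(4096)
texture = blend(track_a, track_b)
```

Here `n_bands` plays the role of the blend granularity mentioned in the abstract. A practical version would likely use overlapping windowed frames to avoid discontinuities at frame boundaries.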

Proceedings of the International Conference on Digital Audio Effects (DAFx)

Proc. Int. Conf. Digital Audio Effects (DAFx)
