Exposure Bias and State Matching in Recurrent Neural Network Virtual Analog Models
Virtual analog (VA) modeling using neural networks (NNs) has
great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due
to their connection with discrete nodal analysis. Furthermore, VA
models based on NNs can be trained efficiently by directly exposing them to the circuit states in a gray-box fashion. However,
exposure to ground truth information during training can leave the
models susceptible to error accumulation in a free-running mode,
also known as “exposure bias” in machine learning literature. This
paper presents a unified framework for treating the previously
proposed state trajectory network (STN) and gated recurrent unit
(GRU) networks as special cases of discrete nodal analysis. We
propose a novel circuit state-matching mechanism for the GRU
and experimentally compare the previously mentioned networks
for their performance in state matching during training and in exposure bias during inference. Experimental results from modeling
a diode clipper show that all the tested models exhibit some exposure bias, which can be mitigated by truncated backpropagation
through time. Furthermore, the proposed state matching mechanism improves the GRU modeling performance of an overdrive
pedal and a phaser pedal, especially in the presence of external
modulation, apparent in a phaser circuit.
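As a rough illustration of the exposure bias described above, the sketch below compares a teacher-forced evaluation (each step starts from the ground-truth previous state) with a free-running one (each step starts from the model's own previous output). The one-pole recursion and the coefficients `A_TRUE` and `A_MODEL` are hypothetical stand-ins chosen for this example; they are not the STN or GRU models from the paper.

```python
# Toy illustration only: a one-pole recursion stands in for a recurrent model.
A_TRUE = 0.95   # coefficient of the hypothetical reference system
A_MODEL = 0.97  # slightly mis-estimated coefficient of the hypothetical model


def reference(x):
    """Ground-truth state trajectory of the reference system."""
    y, state = [], 0.0
    for sample in x:
        state = A_TRUE * state + sample
        y.append(state)
    return y


def teacher_forced(x, y_ref):
    """One-step predictions, each started from the ground-truth previous state."""
    y, prev = [], 0.0
    for n, sample in enumerate(x):
        y.append(A_MODEL * prev + sample)
        prev = y_ref[n]  # the model is "exposed" to the true state
    return y


def free_running(x):
    """Predictions where the model feeds back its own previous output."""
    y, state = [], 0.0
    for sample in x:
        state = A_MODEL * state + sample
        y.append(state)
    return y


def mean_abs_error(y, y_ref):
    return sum(abs(a - b) for a, b in zip(y, y_ref)) / len(y_ref)


x = [1.0] * 200  # step input
y_ref = reference(x)
tf_error = mean_abs_error(teacher_forced(x, y_ref), y_ref)
fr_error = mean_abs_error(free_running(x), y_ref)
print(f"teacher-forced: {tf_error:.3f}, free-running: {fr_error:.3f}")
```

In this toy setting the same small coefficient error stays bounded per step under teacher forcing but accumulates through the feedback path when the model runs freely, which is the effect the abstract refers to.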
A Generative Model for Raw Audio Using Transformer Architectures
This paper proposes a novel way of doing audio synthesis at the
waveform level using Transformer architectures. We propose a
deep neural network for generating waveforms, similar to WaveNet. This is fully probabilistic, auto-regressive, and causal, i.e.,
each sample generated depends on only the previously observed
samples. Our approach outperforms a widely used WaveNet architecture by up to 9% on a similar dataset for predicting the next
step. Using the attention mechanism, we enable the architecture
to learn which audio samples are important for the prediction of
the future sample. We show how causal transformer generative
models can be used for raw waveform synthesis. We also show
that this performance can be improved by another 2% by conditioning samples over a wider context. The flexibility of the current
model to synthesize audio from latent representations suggests a
large number of potential applications. The novel approach of using generative transformer architectures for raw audio synthesis
is, however, still far away from generating any meaningful music
similar to WaveNet, without using latent codes/meta-data to aid the
generation process.
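The causal constraint mentioned above (each sample depends only on previously observed samples) is commonly enforced in attention by masking out future positions before the softmax. The following is a minimal sketch of that masking on a small, made-up score matrix; the function name `causal_attention_weights` and the score values are assumptions for illustration and do not come from the paper.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j], restricted to positions j <= i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]              # position i may only look at 0..i
        peak = max(visible)                 # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        future = [0.0] * (len(row) - i - 1)  # future positions get zero weight
        weights.append([e / total for e in exps] + future)
    return weights


# Hypothetical attention scores for a sequence of four samples.
scores = [
    [0.1, 0.9, 0.3, 0.5],
    [0.4, 0.2, 0.8, 0.1],
    [0.7, 0.6, 0.5, 0.4],
    [0.2, 0.3, 0.1, 0.9],
]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of the result sums to one and is zero above the diagonal, so the prediction for a given step cannot draw on later samples.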
One Billion Audio Sounds From GPU-Enabled Modular Synthesis
We release synth1B1, a multi-modal audio corpus consisting of 1
billion 4-second synthesized sounds, paired with the synthesis parameters used to generate them. The dataset is 100x larger than
any audio dataset in the literature. We also introduce torchsynth,
an open source modular synthesizer that generates the synth1B1
samples on-the-fly 16200x faster than real-time (714 MHz) on
a single GPU. Additionally, we release two new audio datasets: FM
synth timbre and subtractive synth pitch. Using these datasets, we
demonstrate new rank-based evaluation criteria for existing audio
representations. Finally, we propose a novel approach to synthesizer hyperparameter optimization.
Differentiable White-Box Virtual Analog Modeling
Component-wise circuit modeling, also known as “white-box”
modeling, is a well established and much discussed technique in
virtual analog modeling. This approach is generally limited in accuracy by lack of access to the exact component values present in
a real example of the circuit. In this paper we show how this problem can be addressed by implementing the white-box model in a
differentiable form, and allowing approximate component values
to be learned from raw input–output audio measured from a real
device.
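To make the idea of learning component values from input–output data concrete, here is a deliberately small sketch. It assumes a resistive voltage divider with a known `R1_KOHM` and an unknown second resistor, and it recovers that resistor by gradient descent using a hand-derived gradient. The circuit, the values, the step size, and the iteration count are all assumptions for this example; the paper's circuits and optimization setup are not reproduced here.

```python
import math

R1_KOHM = 10.0      # assumed known resistor, in kilo-ohms
R2_TRUE_KOHM = 4.7  # "unknown" value, used only to synthesize the measurements


def divider(v_in, r2):
    """White-box model of the divider: v_out = v_in * R2 / (R1 + R2)."""
    return v_in * r2 / (R1_KOHM + r2)


# Synthetic "measured" input and output signals (one sine period).
v_in = [math.sin(2.0 * math.pi * k / 64) for k in range(64)]
v_meas = [divider(v, R2_TRUE_KOHM) for v in v_in]


def loss_gradient(r2):
    """Derivative of the mean squared output error with respect to r2."""
    dgain = R1_KOHM / (R1_KOHM + r2) ** 2  # d/dr2 of r2 / (R1 + r2)
    total = 0.0
    for v, target in zip(v_in, v_meas):
        total += 2.0 * (divider(v, r2) - target) * v * dgain
    return total / len(v_in)


r2 = 1.0  # deliberately poor initial guess, in kilo-ohms
for _ in range(1000):
    r2 -= 20.0 * loss_gradient(r2)  # plain gradient descent, hand-tuned step size
print(f"estimated R2: {r2:.4f} kOhm")
```

In practice an automatic differentiation framework would supply the gradient, and the model would be a dynamic circuit rather than a static divider; the sketch only shows that a component value can be treated as a trainable parameter.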
One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in
sampled sounds is proposed. Using only one recording for impact
sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral
variations in repeated knocking sounds and in three drum sounds:
a hihat, a snare, and a tomtom. The proposed method uses a short
pseudo-random velvet-noise filter and a low-shelf filter to produce
timbral variations targeted at appropriate spectral regions, yielding potentially an endless number of new realistic versions of a
single percussive sampled sound.
The realism of the resulting processed sounds is studied in a listening test. The results show
that the sound quality obtained with the proposed algorithm is at
least as good as that of a previous method while using 77% fewer
computational operations. The algorithm is widely applicable to
computer-generated music and game audio.
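For readers unfamiliar with velvet noise, the sketch below generates a short velvet-noise sequence (one randomly placed impulse of value +1 or -1 per grid period) and applies it as a sparse filter. The pulse count, grid length, and seed are arbitrary choices for illustration, and the low-shelf filter and the spectral targeting used in the paper are omitted.

```python
import random


def velvet_noise(num_pulses, grid, seed=0):
    """Return (position, sign) taps: one +/-1 impulse per grid period."""
    rng = random.Random(seed)
    taps = []
    for m in range(num_pulses):
        position = m * grid + rng.randrange(grid)  # random offset inside period m
        sign = rng.choice((-1.0, 1.0))
        taps.append((position, sign))
    return taps


def sparse_convolve(x, taps, length):
    """Convolve x with the sparse filter; only additions and sign flips are needed."""
    y = [0.0] * (len(x) + length - 1)
    for position, sign in taps:
        for n, sample in enumerate(x):
            y[n + position] += sign * sample
    return y


# Hypothetical settings: 8 pulses on a 16-sample grid, i.e. a 128-tap sparse filter.
taps = velvet_noise(num_pulses=8, grid=16, seed=1)
impulse_response = sparse_convolve([1.0], taps, length=8 * 16)
print(taps)
```

Because nearly all filter coefficients are zero and the rest are +1 or -1, such a filter needs no multiplications, which is why velvet noise is attractive for low-cost processing. Drawing a new seed for each playback would give a different variation of the same sample.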
Amp-Space: A Large-Scale Dataset for Fine-Grained Timbre Transformation
We release Amp-Space, a large-scale dataset of paired audio
samples: a source audio signal, and an output signal, the result of
a timbre transformation. The types of transformations we study
are from black-box musical tools (amplifiers, stompboxes, studio
effects) traditionally used to shape the sound of guitar, bass, or
synthesizer sounds. For each sample of transformed audio, the
set of parameters used to create it is given. Samples are from
both real and simulated devices, the latter allowing for orders of
magnitude greater data than found in comparable datasets. We
demonstrate potential use cases of this data by (a) pre-training a
conditional WaveNet model on synthetic data and showing that it reduces the number of samples necessary to digitally reproduce a
real musical device, and (b) training a variational autoencoder to
shape a continuous space of timbre transformations for creating
new sounds through interpolation.
Improving Synthesizer Programming From Variational Autoencoders Latent Space
Deep neural networks have been recently applied to the task of
automatic synthesizer programming, i.e., finding optimal values
of sound synthesis parameters in order to reproduce a given input
sound. This paper focuses on generative models, which can infer
parameters as well as generate new sets of parameters or perform
smooth morphing effects between sounds.
We introduce new models to ensure scalability and to increase
performance by using heterogeneous representations of parameters as numerical and categorical random variables.
Moreover, a spectral variational autoencoder architecture with multi-channel
input is proposed in order to improve inference of parameters related to the pitch and intensity of input sounds.
Model performance was evaluated according to several criteria
such as parameter estimation error and audio reconstruction accuracy. Training and evaluation were performed using a dataset of 30k presets,
which is published with this paper. The results demonstrate significant improvements in terms of parameter inference and audio
accuracy and show that the presented models can be used with subsets
or full sets of synthesizer parameters.
Transition-Aware: A More Robust Approach for Piano Transcription
Piano transcription is a classic problem in music information retrieval. A growing number of transcription methods based on deep learning have been proposed in recent years. In 2019, Google Brain
published a larger piano transcription dataset, MAESTRO. On this
dataset, the Onsets and Frames transcription approach proposed by
Hawthorne et al. achieved a stunning onset F1 score of 94.73%. Unlike
the annotation method of Onsets and Frames, the Transition-aware
model presented in this paper annotates the attack process of piano
signals, called the attack transition, over multiple frames instead of only
marking the onset frame. In this way, the piano signals around
the onset time are taken into account, making the detection of piano onsets more stable and robust. Transition-aware achieves a
higher transcription F1 score than Onsets and Frames on the MAESTRO and MAPS datasets, reducing many extra note detection errors. This indicates that the Transition-aware approach
generalizes better across different datasets.
Identification of Nonlinear Circuits as Port-Hamiltonian Systems
This paper addresses identification of nonlinear circuits for
power-balanced virtual analog modeling and simulation. The proposed method combines a port-Hamiltonian system formulation
with kernel-based methods to retrieve model laws from measurements. This combination allows for the estimated model to retain
physical properties that are crucial for the accuracy of simulations,
while representing a variety of nonlinear behaviors. As an illustration, the method is used to identify a nonlinear passive peaking
EQ.
Alloy Sounds: Non-Repeating Sound Textures With Probabilistic Cellular Automata
Contemporary musicians commonly face the challenge of finding
new, characteristic sounds that can make their compositions more
distinct. They often resort to computers and algorithms, which can
significantly aid in creative processes by generating unexpected
material in controlled probabilistic processes. In particular, algorithms that present emergent behaviors, like genetic algorithms
and cellular automata, have fostered a broad diversity of musical explorations. This article proposes an original technique for
the computer-assisted creation and manipulation of sound textures.
The technique uses Probabilistic Cellular Automata, which are still
seldom explored in the music domain, to blend two audio tracks
into a third, different one. The proposed blending process works
by dividing the source tracks into frequency bands and then associating each of the automaton’s cells with a frequency band. Only one
source, chosen by the cell’s state, is active within each band. The
resulting track has a non-repeating textural pattern that follows the
changes in the Cellular Automata. This blending process allows
the musician to choose the original material and the blend granularity, significantly changing the resulting blends. We demonstrate
how to use the proposed blending process in sound design and its
application in experimental and popular music.
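The band-wise selection described above can be sketched as follows. Each cell of a one-dimensional probabilistic automaton is tied to one frequency band, and its binary state picks which source is active in that band for the current frame. The update rule used here (keep the state with some probability, otherwise copy a random neighbour) is a hypothetical placeholder, since the abstract does not specify the automaton's rule, and the band values are dummy numbers rather than real filter-bank outputs.

```python
import random


def step_automaton(cells, stay_probability, rng):
    """Hypothetical rule: keep the state, or copy a random neighbour (circular)."""
    n = len(cells)
    new_cells = []
    for i, state in enumerate(cells):
        if rng.random() < stay_probability:
            new_cells.append(state)
        else:
            new_cells.append(rng.choice((cells[(i - 1) % n], cells[(i + 1) % n])))
    return new_cells


def blend(frames_a, frames_b, stay_probability=0.8, seed=0):
    """Per frame and band, output source A (state 0) or source B (state 1)."""
    rng = random.Random(seed)
    num_bands = len(frames_a[0])
    cells = [rng.randrange(2) for _ in range(num_bands)]
    blended, history = [], []
    for bands_a, bands_b in zip(frames_a, frames_b):
        history.append(cells)
        blended.append([b if state else a
                        for a, b, state in zip(bands_a, bands_b, cells)])
        cells = step_automaton(cells, stay_probability, rng)
    return blended, history


# Dummy band signals: source A is +1 in every band, source B is -1.
frames_a = [[1.0] * 4 for _ in range(6)]
frames_b = [[-1.0] * 4 for _ in range(6)]
blended, history = blend(frames_a, frames_b)
for frame in blended:
    print(frame)
```

The number of bands corresponds to the blend granularity mentioned in the abstract: more cells give finer spectral interleaving of the two sources.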