A Generative Model for Raw Audio Using Transformer Architectures
This paper proposes a novel way of doing audio synthesis at the
waveform level using Transformer architectures. We propose a
deep neural network for generating waveforms, similar to WaveNet. The model is fully probabilistic, auto-regressive, and causal, i.e.
each sample generated depends only on the previously observed
samples. Our approach outperforms a widely used WaveNet architecture by up to 9% on a similar dataset for predicting the next
step. Using the attention mechanism, we enable the architecture
to learn which audio samples are important for the prediction of
the future sample. We show how causal transformer generative
models can be used for raw waveform synthesis. We also show
that this performance can be improved by another 2% by conditioning samples over a wider context. The flexibility of the current
model to synthesize audio from latent representations suggests a
large number of potential applications. The novel approach of using generative transformer architectures for raw audio synthesis
is, however, still far away from generating any meaningful music
similar to WaveNet, without using latent codes/meta-data to aid the
generation process.
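The causality constraint described above is commonly enforced with a mask on the attention scores. The following is a minimal single-head sketch in NumPy, not the paper's architecture; the function name, the random projection matrices, and the sizes `T` and `d` are assumptions made for illustration.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention in which step t attends only to steps <= t."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # shape (T, T)
    # Mask future positions (strict upper triangle) before the softmax.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; the diagonal is never masked, so each row max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical toy input: T time steps, each embedded in d dimensions.
rng = np.random.default_rng(0)
T, d = 8, 4
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` shows how strongly one output step draws on each earlier sample, which is the sense in which attention can indicate which past samples matter for the prediction.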
Non-Iterative Schemes for the Simulation of Nonlinear Audio Circuits
In this work, a number of numerical schemes are presented in the
context of virtual-analog simulation. The schemes are linearly implicit in character, and hence directly solvable without iterative
methods. Schemes of increasing order of accuracy are constructed,
and convergence and stability conditions are proven formally. The
schemes are able to handle stiff problems very efficiently, because
of their fast update, and can be run at higher sample rates to reduce
aliasing. The cases of the diode clipper and ring modulator are
investigated in detail, including several numerical examples.
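To illustrate what "linearly implicit" means in practice, here is a minimal sketch of a first-order linearised backward Euler step applied to a common diode clipper model, dv/dt = (u - v)/(RC) - (2 I_s/C) sinh(v/V_t). This is a generic textbook-style scheme and not necessarily one of the schemes constructed in the paper; the component values, sample rate, and test signal are assumptions.

```python
import numpy as np

# Hypothetical component values (assumed for illustration, not from the paper).
R, C = 1e3, 33e-9            # resistance (ohm), capacitance (farad)
I_S, V_T = 2.52e-9, 25.83e-3 # diode saturation current (A), thermal voltage (V)

def f(v, u):
    """Right-hand side of the clipper ODE dv/dt = f(v, u)."""
    return (u - v) / (R * C) - (2.0 * I_S / C) * np.sinh(v / V_T)

def dfdv(v):
    """Partial derivative of f with respect to v (always negative)."""
    return -1.0 / (R * C) - (2.0 * I_S / (C * V_T)) * np.cosh(v / V_T)

def simulate(u, fs):
    """Advance the state with one linear solve per sample and no iteration."""
    h = 1.0 / fs
    v = np.zeros_like(u)
    for n in range(len(u) - 1):
        # Solve (1 - h * dfdv) * delta = h * f; scalar here, so a division.
        v[n + 1] = v[n] + h * f(v[n], u[n + 1]) / (1.0 - h * dfdv(v[n]))
    return v

fs = 44100.0
t = np.arange(2048) / fs
u = 2.0 * np.sin(2.0 * np.pi * 220.0 * t)  # assumed 2 V, 220 Hz input
v = simulate(u, fs)
```

Because `dfdv` is negative everywhere, the denominator is always greater than one, so the update is well defined at every sample without a Newton loop.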
Real-Time Implementation of a Friction Drum Inspired Instrument Using Finite Difference Schemes
Physical modelling sound synthesis is a powerful method for constructing virtual instruments aiming to mimic the sound of real-world counterparts, while allowing for the possibility of engaging
with these instruments in ways which may be impossible in person.
Such a case is explored in this paper, namely the simulation
of a friction drum inspired instrument. A friction drum is played
by causing the membrane of a drum head to vibrate via friction.
This involves rubbing the membrane with a stick or a cord attached
to its center, with the induced vibrations being transferred to the
air inside a sound box.
This paper describes the development of a real-time audio application which models such an instrument as a bowed membrane
connected to an acoustic tube. This is done by means of a numerical simulation using finite-difference time-domain (FDTD) methods in which the excitation, whose position is free to change in
real-time, is modelled by a highly non-linear elasto-plastic friction
model. Additionally, the virtual instrument allows for dynamically
modifying physical parameters of the model, thereby allowing the
user to generate new and interesting sounds that go beyond a real-world friction drum.
Applications of Port Hamiltonian Methods to Non-Iterative Stable Simulations of the Korg35 and Moog 4-Pole VCF
This paper presents an application of the port Hamiltonian formalism to the nonlinear simulation of the OTA-based Korg35 filter circuit and the Moog 4-pole ladder filter circuit. Lyapunov analysis is
used with their state-space representations to guarantee zero-input
stability over the range of parameters consistent with the actual
circuits. A zero-input stable non-iterative discrete-time scheme
based on a discrete gradient and a change of state variables is
shown along with numerical simulations. Simulations show behavior consistent with the actual operation of the circuits, e.g.,
self-oscillation, and are found to be stable and have lower computational cost compared to iterative methods.
Adaptive Pitch-Shifting With Applications to Intonation Adjustment in a Cappella Recordings
A central challenge for a cappella singers is to adjust their intonation and to stay in tune relative to their fellow singers. During
editing of a cappella recordings, one may want to adjust local intonation of individual singers or account for global intonation drifts
over time. This requires applying a time-varying pitch-shift to the
audio recording, which we refer to as adaptive pitch-shifting. In
this context, existing (semi-)automatic approaches are either labor-intensive or face technical and musical limitations. In this work,
we present automatic methods and tools for adaptive pitch-shifting
with applications to intonation adjustment in a cappella recordings. To this end, we show how to incorporate time-varying information into existing pitch-shifting algorithms that are based on
resampling and time-scale modification (TSM). Furthermore, we
release an open-source Python toolbox, which includes a variety
of TSM algorithms and an implementation of our method. Finally,
we show the potential of our tools by two case studies on global
and local intonation adjustment in a cappella recordings using a
publicly available multitrack dataset of amateur choral singing.
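The resampling half of such a pipeline can be sketched as reading the input at positions that advance by a time-varying rate. The code below is a minimal illustration under stated assumptions and is not the toolbox's API: the function name is hypothetical, the shift is given in cents per output sample, linear interpolation stands in for a proper resampler, and the TSM stage that would restore the original timing is omitted.

```python
import numpy as np

def varying_resample(x, cents):
    """Read x at positions that advance by 2**(cents/1200) per output sample."""
    rate = 2.0 ** (np.asarray(cents, dtype=float) / 1200.0)
    # Read position of each output sample: cumulative sum of earlier rates.
    pos = np.concatenate(([0.0], np.cumsum(rate)[:-1]))
    pos = pos[pos <= len(x) - 1]  # stop once the input is exhausted
    return np.interp(pos, np.arange(len(x)), x)

fs = 44100
t = np.arange(fs) / fs
x = np.sin(2.0 * np.pi * 220.0 * t)                   # assumed 220 Hz test tone
y_same = varying_resample(x, np.zeros(len(x)))        # 0 cents: unchanged
y_up = varying_resample(x, np.full(len(x), 1200.0))   # +1200 cents: octave up
```

Reading faster raises the pitch but also shortens the signal, which is why a time-scale modification step with the inverse, time-varying stretch factor is needed afterwards.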
One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in
sampled sounds is proposed. Using only one recording for impact
sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral
variations in repeated knocking sounds and in three drum sounds:
a hihat, a snare, and a tomtom. The proposed method uses a short
pseudo-random velvet-noise filter and a low-shelf filter to produce
timbral variations targeted at appropriate spectral regions, potentially yielding an endless number of new realistic versions of a
single percussive sampled sound. The realism of the resulting
processed sounds is studied in a listening test. The results show
that the sound quality obtained with the proposed algorithm is at
least as good as that of a previous method while using 77% fewer
computational operations. The algorithm is widely applicable to
computer-generated music and game audio.
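A velvet-noise filter is sparse, so filtering reduces to a few shifted, sign-flipped additions. The sketch below generates a short velvet-noise sequence and applies it by sparse convolution. It illustrates the general idea only: the sequence length, pulse count, and synthetic test sound are assumptions, and the low-shelf filter and any spectral targeting from the paper are omitted.

```python
import numpy as np

def velvet_noise(length, num_pulses, rng):
    """One +/-1 impulse at a random position in each of num_pulses segments."""
    seq = np.zeros(length)
    grid = length / num_pulses
    for m in range(num_pulses):
        start = int(np.floor(m * grid))
        stop = max(int(np.floor((m + 1) * grid)), start + 1)
        seq[rng.integers(start, stop)] = rng.choice([-1.0, 1.0])
    return seq

def sparse_convolve(x, seq):
    """Convolve x with seq using only the nonzero taps of seq."""
    y = np.zeros(len(x) + len(seq) - 1)
    for k in np.flatnonzero(seq):
        y[k:k + len(x)] += seq[k] * x
    return y

rng = np.random.default_rng(1)
n = np.arange(2000)
# Synthetic decaying tone as a stand-in for a recorded percussive sample.
x = np.exp(-n / 300.0) * np.sin(2.0 * np.pi * 0.05 * n)
seq = velvet_noise(length=64, num_pulses=8, rng=rng)
y = sparse_convolve(x, seq)
```

Drawing a new `seq` for each playback gives a different variant of the same sample, and the cost per variant scales with the number of pulses rather than the filter length.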
Spherical Decomposition of Arbitrary Scattering Geometries for Virtual Acoustic Environments
A method is proposed to encode the acoustic scattering of objects for virtual acoustic applications through a multiple-input and
multiple-output framework. The scattering is encoded as a matrix in the spherical harmonic domain, and can be re-used and
manipulated (rotated, scaled and translated) to synthesize various
sound scenes. The proposed method is applied and validated using
Boundary Element Method simulations, which show close agreement between the reference and synthesized results. The method is compatible
with existing frameworks such as Ambisonics and image source
methods.
Combining Zeroth and First-Order Analysis With Lagrange Polynomials to Reduce Artefacts in Live Concatenative Granulation
This paper presents a technique addressing signal discontinuity and concatenation artefacts in real-time granular processing
with rectangular windowing. By combining zero-crossing synchronicity, first-order derivative analysis, and Lagrange polynomials, we can generate streams of uncorrelated and non-overlapping
sonic fragments with minimal discontinuities in the low-order derivatives. The resulting open-source algorithm, implemented in the
Faust language, provides versatile real-time software for dynamical looping, wavetable oscillation, and granulation with reduced artefacts due to rectangular windowing and no artefacts
from overlap-add-to-one techniques commonly deployed in granular processing.
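The zero-crossing part of this idea can be sketched offline as follows: cut grains only at zero crossings where the signal is rising, so that concatenated grains match in both value (near zero) and slope sign. This is a simplified illustration in Python rather than the paper's Faust implementation; the function name and test signal are assumptions, and the Lagrange-polynomial stage is omitted.

```python
import numpy as np

def rising_zero_crossings(x):
    """Indices n where x[n] <= 0 < x[n + 1], i.e. upward zero crossings."""
    return np.flatnonzero((x[:-1] <= 0.0) & (x[1:] > 0.0))

fs = 1000
t = np.arange(fs) / fs
x = np.sin(2.0 * np.pi * 5.0 * t + 0.3)  # assumed 5 Hz test tone with a phase offset

cuts = rising_zero_crossings(x)
# Rectangular-windowed grains that each start just before an upward crossing.
grains = [x[a:b] for a, b in zip(cuts[:-1], cuts[1:])]
```

Any two of these grains can be joined end to start with a jump no larger than one sample step near zero, which is the property that makes rectangular windowing tolerable here.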
A Physical Model of the Trombone Using Dynamic Grids for Finite-Difference Schemes
In this paper, a complete simulation of a trombone using finite-difference time-domain (FDTD) methods is proposed. In particular, we propose the use of a novel method to dynamically vary the
number of grid points associated with the FDTD method, to simulate
the fact that the physical dimension of the trombone’s resonator
dynamically varies over time. We describe the different elements
of the model and present the results of a real-time simulation.
Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical
events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These
automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection
where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm
should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for
automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm
to replace manual parameter tuning, followed by the computation
of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods
of the Aubio library, using a dataset of monophonic acoustic guitar
recordings. Results show that the proposed solution is effective in
reducing the human effort required in the optimization process: it
replaced more than two days of manual parameter tuning with 13
hours and 34 minutes of automated computation. Moreover, the
resulting performance was comparable to that obtained by manual
optimization.
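The Pareto-frontier step can be sketched as a simple dominance filter over candidate parameter sets. In the sketch below, each candidate is scored by a detection accuracy (higher is better) and a latency (lower is better); the function name and all numbers are hypothetical placeholders, not results from the paper, and the evolutionary search that would produce the candidates is omitted.

```python
def pareto_front(candidates):
    """Keep (accuracy, latency) pairs that no other candidate dominates.

    A candidate is dominated if another one is at least as accurate and at
    least as fast, and strictly better in one of the two.
    """
    front = []
    for i, (acc_i, lat_i) in enumerate(candidates):
        dominated = any(
            (acc_j >= acc_i and lat_j <= lat_i)
            and (acc_j > acc_i or lat_j < lat_i)
            for j, (acc_j, lat_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((acc_i, lat_i))
    return front

# Hypothetical (accuracy, latency in ms) scores, for illustration only.
candidates = [(0.90, 12.0), (0.85, 6.0), (0.80, 8.0), (0.92, 20.0), (0.85, 9.0)]
front = pareto_front(candidates)
```

The remaining points are the trade-offs among which a user would choose, for example a lower-latency setting for real-time use or a more accurate one for offline analysis.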