Adaptive Pitch-Shifting With Applications to Intonation Adjustment in a Cappella Recordings
A central challenge for a cappella singers is to adjust their intonation and to stay in tune relative to their fellow singers. During
editing of a cappella recordings, one may want to adjust local intonation of individual singers or account for global intonation drifts
over time. This requires applying a time-varying pitch-shift to the
audio recording, which we refer to as adaptive pitch-shifting. In
this context, existing (semi-)automatic approaches are either labor-intensive or face technical and musical limitations. In this work,
we present automatic methods and tools for adaptive pitch-shifting
with applications to intonation adjustment in a cappella recordings. To this end, we show how to incorporate time-varying information into existing pitch-shifting algorithms that are based on
resampling and time-scale modification (TSM). Furthermore, we
release an open-source Python toolbox, which includes a variety
of TSM algorithms and an implementation of our method. Finally,
we show the potential of our tools by two case studies on global
and local intonation adjustment in a cappella recordings using a
publicly available multitrack dataset of amateur choral singing.
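As a rough illustration of the resampling half of such a scheme, the sketch below reads a signal at a time-varying rate derived from a pitch-shift curve in cents. It is not the toolbox's API: the function name `variable_rate_resample`, the per-input-sample `cents` curve, and the use of linear interpolation (no anti-aliasing) are assumptions made for this example. The resampled signal changes duration, so a TSM step, omitted here, would still be needed to restore the original timeline.

```python
import numpy as np

def variable_rate_resample(x, cents):
    """Resample x with a time-varying rate (hypothetical helper).

    cents[m] is the desired pitch shift, in cents, at input sample m.
    A shift of +1200 cents doubles the read rate and halves the duration.
    """
    x = np.asarray(x, dtype=float)
    rate = 2.0 ** (np.asarray(cents, dtype=float) / 1200.0)
    in_idx = np.arange(len(x))
    # Output time (in samples) at which each input sample is played back:
    # every input sample m occupies 1 / rate[m] output samples.
    out_time = np.concatenate(([0.0], np.cumsum(1.0 / rate[:-1])))
    n_out = int(np.floor(out_time[-1])) + 1
    # Invert the time map to get the fractional read position
    # for every integer output sample.
    read_pos = np.interp(np.arange(n_out), out_time, in_idx)
    # Linear interpolation of the waveform at the read positions.
    return np.interp(read_pos, in_idx, x)

if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)
    # Glide from 0 to +50 cents over one second (illustrative values).
    shifted = variable_rate_resample(tone, np.linspace(0.0, 50.0, sr))
    print(len(tone), len(shifted))
```

Defining the shift curve on the input timeline keeps it aligned with the recording being edited, which seems the more natural choice for intonation correction; defining it on the output timeline would be equally valid.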

One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in
sampled sounds is proposed. Using only one recording for impact
sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral
variations in repeated knocking sounds and in three drum sounds:
a hi-hat, a snare, and a tom-tom. The proposed method uses a short
pseudo-random velvet-noise filter and a low-shelf filter to produce
timbral variations targeted at appropriate spectral regions, potentially yielding an endless number of new realistic versions of a
single percussive sampled sound.
The realism of the resulting
processed sounds is studied in a listening test. The results show
that the sound quality obtained with the proposed algorithm is at
least as good as that of a previous method while using 77% fewer
computational operations. The algorithm is widely applicable to
computer-generated music and game audio.
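The general idea of a short velvet-noise filter can be sketched as follows. Velvet noise is a sparse sequence with one randomly placed impulse of random sign per grid period, so each new random draw yields a slightly different filter. The parameter values, the added dry path, and the function names are assumptions made for illustration; the low-shelf stage mentioned in the abstract is omitted, and this is not the authors' implementation.

```python
import numpy as np

def velvet_noise(length, grid, rng):
    """Sparse sequence with one +/-1 impulse per grid period of `grid` samples."""
    v = np.zeros(length)
    for start in range(0, length, grid):
        width = min(grid, length - start)
        # Random position inside the current grid period, random sign.
        v[start + rng.integers(width)] = rng.choice([-1.0, 1.0])
    return v

def vary_sample(x, rng, length=64, grid=8, depth=0.2):
    """Return a randomly coloured variation of x (hypothetical helper).

    The filter is a unit dry path plus a scaled velvet-noise sequence;
    `length`, `grid` and `depth` are illustrative values, not from the paper.
    """
    h = depth * velvet_noise(length, grid, rng)
    h[0] += 1.0  # keep the direct sound
    return np.convolve(x, h)[: len(x)]

if __name__ == "__main__":
    rng = np.random.default_rng()
    hit = np.exp(-np.arange(2048) / 200.0) * rng.standard_normal(2048)
    variations = [vary_sample(hit, rng) for _ in range(4)]
    print([float(np.max(np.abs(v - hit))) for v in variations])
```

Because the filter has only a handful of non-zero taps, a sparse implementation needs very few multiply-adds per output sample; `np.convolve` is used here only for brevity.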

Higher-Order Anti-Derivatives of Band Limited Step Functions for the Design of Radial Filters in Spherical Harmonics Expansions
This paper presents a discrete-time model of the spherical harmonics expansion describing a sound field. The so-called radial functions are realized as digital filters, which characterize the spatial
impulse responses of the individual harmonic orders. The filter
coefficients are derived from the analytical expressions of the time-domain radial functions, which have a finite extent in time. Due
to the varying degrees of discontinuities occurring at their edges, a
time-domain sampling of the radial functions gives rise to aliasing.
In order to reduce the aliasing distortion, the discontinuities are replaced with the higher-order anti-derivatives of a band-limited step
function. The improved spectral accuracy is demonstrated by numerical evaluation. The proposed discrete-time sound field model
is applicable in broadband applications such as spatial sound reproduction and active noise control.

Parametric Spatial Audio Effects Based on the Multi-Directional Decomposition of Ambisonic Sound Scenes
Decomposing a sound-field into its individual components and respective parameters can represent a convenient first step towards
offering the user an intuitive means of controlling spatial audio
effects and sound-field modification tools. The majority of such
tools available today, however, are instead limited to linear combinations of signals or employ a basic single-source parametric
model. Therefore, the purpose of this paper is to present a parametric framework, which seeks to overcome these limitations by first
dividing the sound-field into its multi-source and ambient components based on estimated spatial parameters. It is then demonstrated that by manipulating the spatial parameters prior to reproducing the scene, a number of sound-field modification and spatial
audio effects may be realised, including directional warping, listener translation, sound source tracking, spatial editing workflows
and spatial side-chaining. Many of the effects described have also
been implemented as real-time audio plug-ins, in order to demonstrate how a user may interact with such tools in practice.

The Role of Modal Excitation in Colorless Reverberation
A perceptual study revealing a novel connection between modal
properties of feedback delay networks (FDNs) and colorless reverberation is presented. The coloration of the reverberation tail
is quantified by the modal excitation distribution derived from the
modal decomposition of the FDN. A homogeneously decaying allpass FDN is designed to be colorless such that the corresponding narrow modal excitation distribution leads to a high perceived
modal density. Synthetic modal excitation distributions are generated to match modal excitations of FDNs. Three listening tests
were conducted to demonstrate the correlation between the modal
excitation distribution and the perceived degree of coloration. A
fourth test shows a significant reduction of coloration by the colorless FDN compared to other FDN designs. The novel connection of modal excitation, allpass FDNs, and perceived coloration
presents a beneficial design criterion for colorless artificial reverberation.

A Generative Model for Raw Audio Using Transformer Architectures
This paper proposes a novel way of doing audio synthesis at the
waveform level using Transformer architectures. We propose a
deep neural network for generating waveforms, similar to WaveNet. This is fully probabilistic, auto-regressive, and causal, i.e.
each sample generated depends on only the previously observed
samples. Our approach outperforms a widely used WaveNet architecture by up to 9% on a similar dataset for predicting the next
step. Using the attention mechanism, we enable the architecture
to learn which audio samples are important for the prediction of
the future sample. We show how causal transformer generative
models can be used for raw waveform synthesis. We also show
that this performance can be improved by another 2% by conditioning samples over a wider context. The flexibility of the current
model to synthesize audio from latent representations suggests a
large number of potential applications. The novel approach of using generative transformer architectures for raw audio synthesis
is, however, like WaveNet, still far from generating any meaningful music
without latent codes or metadata to aid the
generation process.
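The causality property described above, where each output depends only on current and earlier samples, is commonly obtained in Transformer models by masking the attention scores. The following is a minimal single-head sketch in NumPy, assuming generic feature vectors per time step and random projection matrices; it is not the paper's architecture, and all names are chosen for illustration.

```python
import numpy as np

def causal_self_attention(x, wq, wk, wv):
    """Single-head self-attention with a causal mask (illustrative sketch).

    x has shape (time, features); wq, wk, wv are projection matrices.
    Returns the attended outputs and the attention weights.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    t = x.shape[0]
    # True strictly above the diagonal marks "future" positions.
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the allowed (past and present) positions.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((6, 4))
    wq, wk, wv = (rng.standard_normal((4, 4)) for _ in range(3))
    out, weights = causal_self_attention(x, wq, wk, wv)
    print(np.round(weights, 2))  # upper triangle is zero
```

The learned attention weights are what let such a model decide which past samples matter for predicting the next one; the mask only guarantees that no future sample can contribute.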

Conformal Maps for the Discretization of Analog Filters Near the Nyquist Limit
We propose a new analog filter discretization method that is useful
for discretizing systems with features near or above the Nyquist
limit. A conformal mapping approach is taken, and we introduce
the peaking conformal map and shelving conformal map. The proposed method provides a close match to the original analog frequency response below half the sampling rate and is parameterizable, order preserving, and agnostic to the original filter’s order
or type. The proposed method should have applications to discretizing filters that have time-varying parameters or need to be
implemented across many different sampling rates.

On the Equivalence of Integrator- and Differentiator-Based Continuous- and Discrete-Time Systems
The article performs a generic comparison of integrator- and differentiator-based continuous-time systems as well as their discrete-time models, aiming to answer the recurring question in the
music DSP community of whether there are any benefits in using differentiators instead of conventionally employed integrators.
It is found that both kinds of models are practically equivalent, but
there are certain reservations about differentiator-based models.

Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical
events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These
automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection
where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm
should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for
automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm
to replace manual parameter tuning, followed by the computation
of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods
of the Aubio library, using a dataset of monophonic acoustic guitar
recordings. Results show that the proposed solution is effective in
reducing the human effort required in the optimization process: it
replaced more than two days of manual parameter tuning with 13
hours and 34 minutes of automated computation. Moreover, the
resulting performance was comparable to that obtained by manual
optimization.
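The Pareto-frontier step can be illustrated with a small sketch for the two objectives named above, detection accuracy (to be maximized) and latency (to be minimized). The function name and the candidate values are hypothetical and do not come from the paper; the evolutionary search that would produce such candidates is not shown.

```python
def pareto_front(points):
    """Return the non-dominated (accuracy, latency) pairs.

    A candidate is dominated if another one is at least as accurate and
    at least as fast, and strictly better in one of the two objectives.
    """
    front = []
    for i, (acc_i, lat_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and lat_j <= lat_i and (acc_j > acc_i or lat_j < lat_i)
            for j, (acc_j, lat_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((acc_i, lat_i))
    return front

if __name__ == "__main__":
    # Hypothetical (F-measure, latency in ms) results for five parameter sets.
    candidates = [(0.90, 20), (0.85, 10), (0.80, 15), (0.95, 40), (0.90, 25)]
    print(pareto_front(candidates))
    # [(0.9, 20), (0.85, 10), (0.95, 40)]
```

The frontier leaves the final trade-off to the user: a real-time application might pick the low-latency end, while offline analysis might pick the most accurate configuration.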