Download Adaptive Pitch-Shifting With Applications to Intonation Adjustment in a Cappella Recordings
A central challenge for a cappella singers is to adjust their intonation and to stay in tune relative to their fellow singers. During editing of a cappella recordings, one may want to adjust local intonation of individual singers or account for global intonation drifts over time. This requires applying a time-varying pitch-shift to the audio recording, which we refer to as adaptive pitch-shifting. In this context, existing (semi-)automatic approaches are either labor-intensive or face technical and musical limitations. In this work, we present automatic methods and tools for adaptive pitch-shifting with applications to intonation adjustment in a cappella recordings. To this end, we show how to incorporate time-varying information into existing pitch-shifting algorithms that are based on resampling and time-scale modification (TSM). Furthermore, we release an open-source Python toolbox, which includes a variety of TSM algorithms and an implementation of our method. Finally, we show the potential of our tools by two case studies on global and local intonation adjustment in a cappella recordings using a publicly available multitrack dataset of amateur choral singing.
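The resampling half of such a pipeline can be illustrated with a minimal sketch. It is not the toolbox's implementation: the function name `adaptive_resample` and the use of linear interpolation are assumptions made for illustration. The sketch applies a per-sample pitch shift given in cents by warping the time axis, which also changes the duration; a TSM stage (not shown) would then restore the original timeline.

```python
import numpy as np

def adaptive_resample(x, cents):
    """Time-varying resampling (hypothetical helper, linear interpolation).

    cents may be a scalar or an array with one value per input sample.
    The output duration differs from the input; TSM would undo that.
    """
    x = np.asarray(x, dtype=float)
    cents = np.broadcast_to(np.asarray(cents, dtype=float), x.shape)
    factor = 2.0 ** (cents / 1200.0)  # frequency scaling per input sample
    # out_pos[n]: output time (in samples) at which input sample n is played.
    # Playing at rate `factor` makes each input sample last 1/factor samples.
    out_pos = np.concatenate(([0.0], np.cumsum(1.0 / factor)[:-1]))
    n_out = int(np.floor(out_pos[-1])) + 1
    # Invert the time map on a uniform output grid, then read the input there.
    in_pos = np.interp(np.arange(n_out), out_pos, np.arange(len(x)))
    return np.interp(in_pos, np.arange(len(x)), x)
```

Passing a slowly varying `cents` curve (for example, the negated intonation drift of a recording) instead of a constant gives the time-varying behavior described above.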
Download One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hihat, a snare, and a tomtom. The proposed method uses a short pseudo-random velvet-noise filter and a low-shelf filter to produce timbral variations targeted at appropriate spectral regions, potentially yielding an endless number of new realistic versions of a single percussive sampled sound. The realism of the resulting processed sounds is studied in a listening test. The results show that the sound quality obtained with the proposed algorithm is at least as good as that of a previous method while using 77% fewer computational operations. The algorithm is widely applicable to computer-generated music and game audio.
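The idea of a short velvet-noise filter can be sketched as follows. This is a generic illustration, not the paper's design: the filter length, impulse count, `amount` scaling, and the unit tap at index 0 are assumptions, and the low-shelf stage is omitted.

```python
import numpy as np

def velvet_noise(length, num_impulses, rng):
    """Sparse sequence: one +/-1 impulse at a random spot in each grid cell."""
    h = np.zeros(length)
    grid = length / num_impulses  # average spacing between impulses
    for m in range(num_impulses):
        pos = min(int(m * grid + rng.random() * grid), length - 1)
        h[pos] = rng.choice([-1.0, 1.0])
    return h

def vary_sample(x, rng, length=64, num_impulses=8, amount=0.1):
    """Return a slightly different-sounding copy of x (hypothetical helper)."""
    h = amount * velvet_noise(length, num_impulses, rng)
    h[0] = 1.0  # assumption: keep the direct sound dominant
    return np.convolve(x, h)[: len(x)]
```

Calling `vary_sample` with a fresh random state on every trigger yields a different filter, and therefore a different timbral variation, each time the sample is played. Because the filter is sparse, a dedicated implementation could replace the convolution with a handful of additions and subtractions.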
Download Higher-Order Anti-Derivatives of Band Limited Step Functions for the Design of Radial Filters in Spherical Harmonics Expansions
This paper presents a discrete-time model of the spherical harmonics expansion describing a sound field. The so-called radial functions are realized as digital filters, which characterize the spatial impulse responses of the individual harmonic orders. The filter coefficients are derived from the analytical expressions of the time-domain radial functions, which have a finite extent in time. Due to the varying degrees of discontinuities occurring at their edges, a time-domain sampling of the radial functions gives rise to aliasing. In order to reduce the aliasing distortion, the discontinuities are replaced with the higher-order anti-derivatives of a band-limited step function. The improved spectral accuracy is demonstrated by numerical evaluation. The proposed discrete-time sound field model is applicable in broadband applications such as spatial sound reproduction and active noise control.
Download Parametric Spatial Audio Effects Based on the Multi-Directional Decomposition of Ambisonic Sound Scenes
Decomposing a sound-field into its individual components and respective parameters can represent a convenient first step towards offering the user an intuitive means of controlling spatial audio effects and sound-field modification tools. The majority of such tools available today, however, are instead limited to linear combinations of signals or employ a basic single-source parametric model. Therefore, the purpose of this paper is to present a parametric framework, which seeks to overcome these limitations by first dividing the sound-field into its multi-source and ambient components based on estimated spatial parameters. It is then demonstrated that by manipulating the spatial parameters prior to reproducing the scene, a number of sound-field modification and spatial audio effects may be realised, including directional warping, listener translation, sound source tracking, spatial editing workflows and spatial side-chaining. Many of the effects described have also been implemented as real-time audio plug-ins, in order to demonstrate how a user may interact with such tools in practice.
Download The Role of Modal Excitation in Colorless Reverberation
A perceptual study revealing a novel connection between modal properties of feedback delay networks (FDNs) and colorless reverberation is presented. The coloration of the reverberation tail is quantified by the modal excitation distribution derived from the modal decomposition of the FDN. A homogeneously decaying allpass FDN is designed to be colorless such that the corresponding narrow modal excitation distribution leads to a high perceived modal density. Synthetic modal excitation distributions are generated to match modal excitations of FDNs. Three listening tests were conducted to demonstrate the correlation between the modal excitation distribution and the perceived degree of coloration. A fourth test shows a significant reduction of coloration by the colorless FDN compared to other FDN designs. The novel connection of modal excitation, allpass FDNs, and perceived coloration presents a beneficial design criterion for colorless artificial reverberation.
Download A Generative Model for Raw Audio Using Transformer Architectures
This paper proposes a novel way of doing audio synthesis at the waveform level using Transformer architectures. We propose a deep neural network for generating waveforms, similar to WaveNet. This is fully probabilistic, auto-regressive, and causal, i.e. each sample generated depends on only the previously observed samples. Our approach outperforms a widely used WaveNet architecture by up to 9% on a similar dataset for predicting the next step. Using the attention mechanism, we enable the architecture to learn which audio samples are important for the prediction of the future sample. We show how causal transformer generative models can be used for raw waveform synthesis. We also show that this performance can be improved by another 2% by conditioning samples over a wider context. The flexibility of the current model to synthesize audio from latent representations suggests a large number of potential applications. The novel approach of using generative transformer architectures for raw audio synthesis is, however, still far away from generating any meaningful music similar to WaveNet, without using latent codes/meta-data to aid the generation process.
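The causality constraint can be made concrete with a minimal single-head attention sketch in NumPy. It is a generic illustration of causal masking, not the paper's architecture; the function name and the projection matrices `w_q`, `w_k`, `w_v` are assumptions.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head attention where position t only sees positions <= t.

    x: (T, d_model) sequence of embedded audio samples (hypothetical input).
    Returns the attended output and the (T, T) attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask out future positions (strict upper triangle) before the softmax.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

The returned weights show which past samples the model attends to at each step; every entry above the diagonal is exactly zero, so changing a future sample cannot alter an earlier output.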
Download Conformal Maps for the Discretization of Analog Filters Near the Nyquist Limit
We propose a new analog filter discretization method that is useful for discretizing systems with features near or above the Nyquist limit. A conformal mapping approach is taken, and we introduce the peaking conformal map and shelving conformal map. The proposed method provides a close match to the original analog frequency response below half the sampling rate and is parameterizable, order preserving, and agnostic to the original filter’s order or type. The proposed method should have applications to discretizing filters that have time-varying parameters or need to be implemented across many different sampling rates.
Download On the Equivalence of Integrator- and Differentiator-Based Continuous- and Discrete-Time Systems
The article performs a generic comparison of integrator- and differentiator-based continuous-time systems as well as their discrete-time models, aiming to answer the recurring question in the music DSP community of whether there are any benefits in using differentiators instead of conventionally employed integrators. It is found that both kinds of models are practically equivalent, but there are certain reservations about differentiator-based models.
Download Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm to replace manual parameter tuning, followed by the computation of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods of the Aubio library, using a dataset of monophonic acoustic guitar recordings. Results show that the proposed solution is effective in reducing the human effort required in the optimization process: it replaced more than two days of manual parameter tuning with 13 hours and 34 minutes of automated computation. Moreover, the resulting performance was comparable to that obtained by manual optimization.
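The Pareto-frontier step can be sketched in a few lines, assuming each candidate parameter set has already been scored by two metrics: a detection accuracy to maximize and a latency to minimize. The function name, the tuple layout, and the numbers in the usage note are hypothetical and not taken from the paper.

```python
def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates.

    candidates: list of (accuracy, latency_ms) tuples (hypothetical layout);
    higher accuracy and lower latency are better.
    """
    front = []
    for i, (acc_i, lat_i) in enumerate(candidates):
        dominated = any(
            # j is at least as good on both metrics and strictly better on one
            (acc_j >= acc_i and lat_j <= lat_i) and (acc_j > acc_i or lat_j < lat_i)
            for j, (acc_j, lat_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((acc_i, lat_i))
    return front
```

For the made-up scores `[(0.90, 20.0), (0.85, 10.0), (0.80, 15.0), (0.95, 40.0), (0.90, 25.0)]`, the function keeps `(0.90, 20.0)`, `(0.85, 10.0)` and `(0.95, 40.0)`; the other two are each beaten on both metrics, or matched on one and beaten on the other, by some retained candidate.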