Download Differentiable Feedback Delay Network for Colorless Reverberation Artificial reverberation algorithms often suffer from spectral coloration, usually in the form of metallic ringing, which impairs the perceived quality of sound. This paper proposes a method to reduce the coloration in the feedback delay network (FDN), a popular artificial reverberation algorithm. An optimization framework is employed entailing a differentiable FDN to learn a set of parameters decreasing coloration. The optimization objective is to minimize the spectral loss to obtain a flat magnitude response, with an additional temporal loss term to control the sparseness of the impulse response. The objective evaluation of the method shows a favorable narrower distribution of modal excitation while retaining the impulse response density. The subjective evaluation demonstrates that the proposed method lowers perceptual coloration of late reverberation, and also shows that the suggested optimization improves sound quality for small FDN sizes. The method proposed in this work constitutes an improvement in the design of accurate and high-quality artificial reverberation, simultaneously offering computational savings.
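As a rough illustration of the quantity such an optimization drives down, the following sketch simulates a small FDN and measures how far its magnitude response is from flat. It is a minimal sketch and not the authors' implementation: the Householder feedback matrix, the attenuation factor `gamma`, the function names, and the exact loss form (mean squared deviation of the magnitude from 1) are assumptions made for the example, and the temporal sparsity term is left out.

```python
import cmath
import math

def householder(n):
    """Orthogonal Householder matrix I - (2/n) * ones (assumed feedback matrix)."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)] for i in range(n)]

def fdn_impulse_response(delays, A, b, c, gamma, length):
    """Simulate a single-input single-output FDN driven by a unit impulse."""
    n = len(delays)
    buffers = [[0.0] * m for m in delays]   # one circular buffer per delay line
    pos = [0] * n
    out = []
    for t in range(length):
        x = 1.0 if t == 0 else 0.0
        s = [buffers[i][pos[i]] for i in range(n)]       # delay-line outputs
        out.append(sum(c[i] * s[i] for i in range(n)))
        for i in range(n):
            # gamma < 1 attenuates the feedback so the response decays
            feedback = gamma * sum(A[i][j] * s[j] for j in range(n))
            buffers[i][pos[i]] = b[i] * x + feedback
            pos[i] = (pos[i] + 1) % delays[i]
    return out

def spectral_loss(h, n_bins):
    """Mean squared deviation of |H(e^{jw})| from 1 on n_bins frequencies in [0, pi)."""
    total = 0.0
    for k in range(n_bins):
        w = math.pi * k / n_bins
        H = sum(h[t] * cmath.exp(-1j * w * t) for t in range(len(h)))
        total += (abs(H) - 1.0) ** 2
    return total / n_bins
```

In a differentiable setting, the gains `b`, `c` and the feedback matrix would be the trainable parameters and this loss would be minimized by gradient descent; here the functions only evaluate the loss for fixed parameters.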
Download DDSP-Based Neural Waveform Synthesis of Polyphonic Guitar Performance From String-Wise MIDI Input We explore the use of neural synthesis for acoustic guitar from string-wise MIDI input. We propose four different systems and compare them with both objective metrics and subjective evaluation against natural audio and a sample-based baseline. We iteratively develop these four systems by making various considerations on the architecture and intermediate tasks, such as predicting pitch and loudness control features. We find that formulating the control feature prediction task as a classification task rather than a regression task yields better results. Furthermore, we find that our simplest proposed system, which directly predicts synthesis parameters from MIDI input, performs the best out of the four proposed systems. Audio examples and code are available.
Download Differentiable MIMO Feedback Delay Networks for Multichannel Room Impulse Response Modeling Recently, with the advent of new performing headsets and goggles, the demand for Virtual and Augmented Reality applications has experienced a steep increase. In order to coherently navigate the virtual rooms, the acoustics of the scene must be emulated in the most accurate and efficient way possible. Amongst others, Feedback Delay Networks (FDNs) have proved to be valuable tools for tackling such a task. In this article, we expand and adapt a method recently proposed for the data-driven optimization of single-input-single-output FDNs to the multiple-input-multiple-output (MIMO) case for addressing spatial/space-time processing applications. By testing our methodology on items taken from two different datasets, we show that the parameters of MIMO FDNs can be jointly optimized to match some perceptual characteristics of given multichannel room impulse responses, outperforming approaches available in the literature, and paving the way toward increasingly efficient and accurate real-time virtual room acoustics rendering.
Download A Real-Time Approach for Estimating Pulse Tracking Parameters for Beat-Synchronous Audio Effects Predominant Local Pulse (PLP) estimation, an established method for extracting beat positions and other periodic pulse information from audio signals, has recently been extended with an online variant tailored for real-time applications. In this paper, we introduce a novel approach to generating various real-time control signals from the original online PLP output. While the PLP activation function encodes both predominant pulse information and pulse stability, we propose several normalization procedures to discern local pulse oscillation from stability, utilizing the PLP activation envelope. Through this, we generate pulse-synchronous Low Frequency Oscillators (LFOs) and supplementary confidence-based control signals, enabling dynamic control over audio effect parameters in real-time. Additionally, our approach enables beat position prediction, providing a look-ahead capability, for example, to compensate for system latency. To showcase the effectiveness of our control signals, we introduce an audio plugin prototype designed for integration within a Digital Audio Workstation (DAW), facilitating real-time applications of beat-synchronous effects during live mixing and performances. Moreover, this plugin serves as an educational tool, providing insights into PLP principles and the tempo structure of analyzed music signals.
Download Differentiable Scattering Delay Networks for Artificial Reverberation Scattering delay networks (SDNs) provide a flexible and efficient framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating key parameters such as scattering matrices and absorption filters as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
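The energy decay and reverberation time mentioned in this abstract are commonly obtained from an impulse response by Schroeder backward integration followed by a line fit. The sketch below shows that standard procedure for a broadband response. It is a minimal illustration, not the paper's loss: the function names and the -5 dB to -25 dB fitting range are assumptions, and a frequency-dependent version would first split the response into bands.

```python
import math

def energy_decay_curve_db(h):
    """Schroeder backward integration: EDC[t] = sum of h[k]^2 for k >= t, in dB."""
    edc = []
    acc = 0.0
    for sample in reversed(h):
        acc += sample * sample
        edc.append(acc)
    edc.reverse()
    total = edc[0]  # assumes the response is not all zeros
    return [10.0 * math.log10(e / total) for e in edc if e > 0.0]

def reverberation_time(h, fs, lo_db=-5.0, hi_db=-25.0):
    """Estimate T60 from a least-squares line fit to the EDC between lo_db and hi_db."""
    edc = energy_decay_curve_db(h)
    pts = [(i / fs, v) for i, v in enumerate(edc) if hi_db <= v <= lo_db]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope
```

A loss for matching a target room could then compare the decay curves (or the fitted reverberation times) of the synthetic and measured responses, which is the kind of discrepancy the abstract describes minimizing.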
Download DataRES and PyRES: A Room Dataset and a Python Library for Reverberation Enhancement System Development, Evaluation, and Simulation Reverberation is crucial in the acoustical design of physical spaces, especially halls for live music performances. Reverberation Enhancement Systems (RESs) are active acoustic systems that can control the reverberation properties of physical spaces, allowing them to adapt to specific acoustical needs. The performance of RESs strongly depends on the properties of the physical room and the architecture of the Digital Signal Processor (DSP). However, room-impulse-response (RIR) measurements and the DSP code from previous studies on RESs have never been made open access, leading to non-reproducible results. In this study, we present DataRES and PyRES, a RIR dataset and a Python library, to increase the reproducibility of studies on RESs. The dataset contains RIRs measured in RES research and development rooms and professional music venues. The library offers classes and functionality for the development, evaluation, and simulation of RESs. The implemented DSP architectures are made differentiable, allowing their components to be trained in a machine-learning-like pipeline. The replication of previous studies by the authors shows that PyRES can become a useful tool in future research on RESs.
Download VST Plug-in Module Performing Wavelet Transform in Real-time The paper presents a variant of the segmentwise wavelet transform (blockwise DWT, online DWT or SegDWT) algorithm adapted to real-time audio processing. The implementation of the algorithm as a VST plugin is presented as well. The main problem of segmentwise wavelet coefficient processing is the handling of the segment borders. The common border extension methods result in “false” coefficients, which in turn result in border distortion (block-end effects) after particular types of coefficient processing. In contrast, the SegDWT algorithm employs a segment extension technique to prevent this inconvenience and produce exactly the same coefficients as the wavelet transform of the whole signal would do. In this paper we remove some of the shortcomings of the original SegDWT algorithm; for example the need for the “right” segment extension is canceled. The VST plugin module created is described from the viewpoints of both the user and the programmer; the latter can easily add their own method for processing the coefficients.
Download Independent Manipulation of High-Level Spectral Envelope Shape Features for Sound Morphing by Means of Evolutionary Computation The aim of sound morphing is to obtain a sound that falls perceptually between two (or more) sounds. Ideally, we want to morph perceptually relevant features of sounds and be able to independently manipulate them. In this work we present a method to obtain perceptually intermediate spectral envelopes guided by high-level spectral shape descriptors and a technique that employs evolutionary computation to independently manipulate the timbral features captured by the descriptors. High-level descriptors are measures of the acoustic correlates of salient timbre dimensions derived from perceptual studies, such that the manipulation of the descriptors corresponds to potentially interesting timbral variations.
Download Model-Based Obstacle Sonification for the Navigation of Visually Impaired Persons This paper proposes a sonification model for encoding visual 3D information into sounds, inspired by the impact properties of the objects encountered during blind navigation. The proposed model is compared against two sonification models developed for orientation and mobility, chosen based on their common technical requirements. An extensive validation of the proposed model is reported; five legally blind and five normally sighted participants evaluated the proposed model as compared to the two competitive models on a simplified experimental navigation scenario. The evaluation addressed not only the accuracy of the responses in terms of psychophysical measurements but also the cognitive load and emotional stress of the participants by means of biophysiological signals and evaluation questionnaires. Results show that the proposed impact sound model adequately conveys the relevant information to the participants with low cognitive load, following a short training session.
Download Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics Timbre spaces have been used in music perception to study the perceptual relationships between instruments based on dissimilarity ratings. However, these spaces do not generalize to novel examples and do not provide an invertible mapping, preventing audio synthesis. In parallel, generative models have aimed to provide methods for synthesizing novel timbres. However, these systems do not provide an understanding of their inner workings and are usually not related to any perceptually relevant information. Here, we show that Variational Auto-Encoders (VAE) can alleviate all of these limitations by constructing generative timbre spaces. To do so, we adapt VAEs to learn an audio latent space, while using perceptual ratings from timbre studies to regularize the organization of this space. The resulting space allows us to analyze novel instruments, while being able to synthesize audio from any point of this space. We introduce a specific regularization that allows any given similarity distances to be enforced on these spaces. We show that the resulting space provides nearly the same distance relationships as timbre spaces. We evaluate several spectral transforms and show that the Non-Stationary Gabor Transform (NSGT) provides the highest correlation to timbre spaces and the best quality of synthesis. Furthermore, we show that these spaces can generalize to novel instruments and can generate any path between instruments to understand their timbre relationships. As these spaces are continuous, we study how audio descriptors behave along the latent dimensions. We show that even though descriptors have an overall non-linear topology, they follow a locally smooth evolution. Based on this, we introduce a method for descriptor-based synthesis and show that we can control the descriptors of an instrument while keeping its timbre structure.
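One simple way to picture a regularizer that enforces given similarity distances on a latent space is a penalty on the mismatch between pairwise latent distances and the perceptual dissimilarity ratings. The sketch below is only an illustration under that assumption: the function name, the Euclidean metric, and the squared-error form are hypothetical choices, and the regularization actually used in the paper may differ.

```python
import math

def perceptual_regularizer(latents, dissimilarity):
    """Mean squared mismatch between latent distances and target dissimilarities.

    latents:       list of latent vectors, one per instrument (hypothetical input)
    dissimilarity: symmetric matrix of perceptual ratings, same ordering
    """
    n = len(latents)
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(latents[i], latents[j])   # Euclidean latent distance
            total += (d - dissimilarity[i][j]) ** 2
            pairs += 1
    return total / pairs
```

In training, a term like this would be added, with some weight, to the usual VAE objective, so that the encoder is pushed to place instruments at distances that reflect the ratings while still reconstructing the audio.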