Two Datasets of Room Impulse Responses for Navigation in Six Degrees-of-Freedom: a Symphonic Concert Hall and a Former Planetarium
This paper presents two datasets of room impulse responses (RIRs) for navigable virtual acoustics. The first is a set of 240 mono and Ambisonic RIRs recorded at the Maison Symphonique, a symphonic concert hall in Montreal renowned for its excellent acoustics. The second is a set of 67 third-order Ambisonic RIRs recorded in the former planetarium of Montreal (now known as the Centech), a space whose acoustics include a focal point where extreme reverberation times occur. The article first describes the two datasets and the methods used to capture them. A use case for these RIRs is then presented: audio rendering of scene navigation using interpolation among the RIRs.
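As a rough illustration of the navigation use case, the following sketch crossfades linearly between two mono RIRs measured at neighbouring listener positions and convolves a dry signal with the result. The file names are hypothetical, and the rendering method described in the paper may be more elaborate than this simple linear interpolation.

    import numpy as np
    import soundfile as sf
    from scipy.signal import fftconvolve

    rir_a, fs = sf.read("rir_a.wav")    # RIR measured at position A (hypothetical file)
    rir_b, _ = sf.read("rir_b.wav")     # RIR measured at position B (hypothetical file)
    dry, _ = sf.read("dry_source.wav")  # anechoic source signal

    alpha = 0.3  # normalized listener position between A (0.0) and B (1.0)
    n = max(len(rir_a), len(rir_b))
    rir_a = np.pad(rir_a, (0, n - len(rir_a)))
    rir_b = np.pad(rir_b, (0, n - len(rir_b)))

    # Linear interpolation between the two measured responses
    rir_interp = (1.0 - alpha) * rir_a + alpha * rir_b

    # Auralize the intermediate position by convolving the dry signal with it
    wet = fftconvolve(dry, rir_interp)
    sf.write("navigated.wav", wet / np.max(np.abs(wet)), fs)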
Interacting With Digital Audio Effects Through a Haptic Knob With Programmable Resistance
Live music performance and music production often involve the manipulation of several parameters during sound generation, processing, and mixing. In hardware layouts, those parameters are usually controlled with knobs, sliders, and buttons. When these layouts are virtualized, physical (e.g. MIDI) controllers can make interaction easier and reduce the cognitive load associated with sound manipulation. Adding haptic feedback can further improve such interaction by making it easier to detect the nature (continuous or discrete) and value of a parameter. To this end, we built an endless-knob controller prototype with programmable resistance to rotation, capable of rendering various haptic effects. Ten subjects assessed the effectiveness of the haptic feedback in a target-matching task under visual-only and visual-haptic conditions; errors were significantly lower in the presence of haptic feedback. Finally, the knob was configured as a multi-parametric controller for real-time audio effect software written in Python, simulating the voltage-controlled filter of the EMS VCS3. The integration of the sound algorithm and the haptic knob is discussed, together with various haptic feedback effects in response to control actions.
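As a loose illustration of a single knob-driven parameter (this is not the authors' VCS3 emulation), the sketch below maps a normalized knob position to the cutoff of a generic resonant low-pass filter, the kind of mapping such a controller could drive in real time.

    import numpy as np
    from scipy.signal import lfilter

    def knob_to_cutoff(knob, fmin=80.0, fmax=8000.0):
        # Map a normalized knob position [0, 1] to a cutoff frequency on a log scale
        return fmin * (fmax / fmin) ** np.clip(knob, 0.0, 1.0)

    def resonant_lowpass(x, fs, cutoff, q=4.0):
        # Second-order resonant low-pass (RBJ biquad coefficients)
        w0 = 2.0 * np.pi * cutoff / fs
        alpha = np.sin(w0) / (2.0 * q)
        b = (1.0 - np.cos(w0)) / 2.0 * np.array([1.0, 2.0, 1.0])
        a = np.array([1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha])
        return lfilter(b, a, x)

    fs = 48000
    noise = np.random.randn(fs)  # one second of test input
    y = resonant_lowpass(noise, fs, knob_to_cutoff(0.42))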
SiTraNo: A MATLAB App for Sines-Transients-Noise Decomposition of Audio Signals
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely used tool in audio processing. Multiple solutions have been proposed in recent years, using time–frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of the sound. In this paper, we present SiTraNo: an easy-to-use MATLAB application with a graphical user interface for audio decomposition that enables visualization of and access to the sinusoidal, transient, and noise classes individually. The application allows the user to choose between different well-known separation methods to analyze an input sound file, to instantaneously control and remix its spectral components, and to visually check the quality of the separation before producing the desired output file. The visualization of common artifacts, such as birdies and dropouts, is demonstrated. The application promotes experimentation with the sound decomposition process by letting the user observe how varying each spectral component affects the original sound and by comparing different methods against each other, evaluating the separation quality both audibly and visually. SiTraNo and its source code are available on a companion website and repository.
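A minimal sketch of one family of methods in this space, assuming Fitzgerald-style median filtering of the STFT magnitude along time (horizontal structures) and frequency (vertical structures); SiTraNo's actual implementations and parameters differ, and the input file name is hypothetical.

    import numpy as np
    import soundfile as sf
    from scipy.signal import stft, istft
    from scipy.ndimage import median_filter

    x, fs = sf.read("input.wav")  # mono input file (hypothetical)
    f, t, X = stft(x, fs, nperseg=2048)
    mag = np.abs(X)

    tonal = median_filter(mag, size=(1, 17))      # smooth across time -> horizontal structures
    transient = median_filter(mag, size=(17, 1))  # smooth across frequency -> vertical structures

    # Soft masks assign each time-frequency bin proportionally to the two structures
    eps = 1e-12
    mask_s = tonal / (tonal + transient + eps)
    mask_t = transient / (tonal + transient + eps)

    _, sines = istft(X * mask_s, fs, nperseg=2048)
    _, transients = istft(X * mask_t, fs, nperseg=2048)
    n = min(len(x), len(sines))
    noise = x[:n] - sines[:n] - transients[:n]    # residual treated as the noise class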
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
Voice conversion has gained increasing popularity within the field of audio manipulation and speech synthesis. The main objective is usually to transfer the identity of the input to that of a target speaker without changing the linguistic content. While current work provides high-fidelity solutions, it rarely focuses on model simplicity, high sampling rates, or streamability. By incorporating speech representation learning into a generative timbre transfer model originally created for musical purposes, we investigate voice conversion performed directly in the time domain at high sampling rates. More specifically, we guide the latent space of a baseline model towards linguistically relevant representations and condition it on external speaker information. Through objective and subjective assessments, we demonstrate that the proposed solution attains levels of naturalness, quality, and intelligibility comparable to those of a state-of-the-art solution for seen speakers, while significantly decreasing inference time. However, despite the presence of target speaker characteristics in the converted output, the actual similarity to unseen speakers remains a challenge.
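A generic sketch of the conditioning idea only; RAVE's actual architecture, latent regularization, and training objectives are not reproduced here, and the layer sizes are arbitrary. The decoder receives content latents concatenated with an external speaker embedding, so swapping the embedding at inference changes the rendered voice.

    import torch
    import torch.nn as nn

    latent_dim, speaker_dim, hidden = 16, 64, 128

    decoder = nn.Sequential(  # stand-in decoder, not RAVE's
        nn.Conv1d(latent_dim + speaker_dim, hidden, 3, padding=1),
        nn.GELU(),
        nn.Conv1d(hidden, 1, 3, padding=1),
    )

    z = torch.randn(1, latent_dim, 200)                        # content latents over time
    spk = torch.randn(1, speaker_dim, 1).expand(-1, -1, 200)   # target speaker embedding, broadcast in time
    audio_frames = decoder(torch.cat([z, spk], dim=1))         # conditioned synthesis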
Graph-Based Audio Looping and Granulation
In this paper we describe similarity graphs computed from time–frequency analysis as a guide for audio playback, with the aim of extending the content of fixed recordings in creative applications. We explain how the graph is created from distances between spectral frames, and describe several features computed from it, including onset detection, beat detection, and cluster analysis. Several playback algorithms can be devised by conditionally pruning the graph with these features. We describe examples for looping, granulation, and automatic montage.
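A rough sketch of the general idea (not the authors' exact features or pruning rules): build a frame-similarity graph from STFT magnitudes and occasionally jump along strong edges during playback to extend a recording. The input file name is hypothetical.

    import numpy as np
    import soundfile as sf
    from scipy.signal import stft

    x, fs = sf.read("loop_source.wav")  # mono recording (hypothetical file)
    hop = 512
    _, _, X = stft(x, fs, nperseg=2048, noverlap=2048 - hop)
    frames = np.abs(X).T                # one spectral vector per time frame

    # Cosine similarity between all pairs of frames
    norm = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-12)
    sim = norm @ norm.T
    np.fill_diagonal(sim, 0.0)
    edges = {i: np.where(sim[i] > 0.95)[0] for i in range(len(frames))}  # candidate jump points

    # Random-walk playback: mostly advance, occasionally jump along an edge
    rng = np.random.default_rng(0)
    i, order = 0, []
    for _ in range(4 * len(frames)):    # extend to roughly 4x the original length
        order.append(i)
        if len(edges[i]) and rng.random() < 0.1:
            i = int(rng.choice(edges[i]))   # jump to a similar frame
        else:
            i = (i + 1) % len(frames)       # normal playback
    out = np.concatenate([x[j * hop : j * hop + hop] for j in order])
    sf.write("extended.wav", out, fs)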
Identification of Nonlinear Circuits as Port-Hamiltonian Systems
This paper addresses the identification of nonlinear circuits for power-balanced virtual analog modeling and simulation. The proposed method combines a port-Hamiltonian system formulation with kernel-based methods to retrieve model laws from measurements. This combination allows the estimated model to retain physical properties that are crucial for the accuracy of simulations, while representing a variety of nonlinear behaviors. As an illustration, the method is used to identify a nonlinear passive peaking EQ.
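For orientation, the input-state-output port-Hamiltonian form typically targeted by such methods is dx/dt = (J - R) grad H(x) + G u with output y = G^T grad H(x), where J is skew-symmetric, R is positive semi-definite, and H is the stored energy. The toy simulation below illustrates that structure with made-up matrices and energy function; it is not the identified peaking EQ from the paper, and the forward Euler step is used only for brevity (power-balanced simulation normally relies on energy-consistent discretizations).

    import numpy as np

    J = np.array([[0.0, -1.0], [1.0, 0.0]])  # skew-symmetric interconnection
    R = np.diag([0.1, 0.0])                  # positive semi-definite dissipation
    G = np.array([[1.0], [0.0]])             # input (port) matrix

    def grad_H(x):
        # Gradient of a toy nonlinear stored-energy function H(x)
        return np.array([x[0] + 0.5 * x[0] ** 3, x[1]])

    fs, dur = 48000, 0.01
    x = np.zeros(2)
    y = np.zeros(int(fs * dur))
    for n in range(len(y)):
        u = np.array([np.sin(2 * np.pi * 1000 * n / fs)])  # input at the port
        dx = (J - R) @ grad_H(x) + G @ u
        x = x + dx / fs                                     # forward Euler step (for brevity only)
        y[n] = float(G.T @ grad_H(x))                       # collocated port output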
The Role of Modal Excitation in Colorless Reverberation
A perceptual study revealing a novel connection between modal properties of feedback delay networks (FDNs) and colorless reverberation is presented. The coloration of the reverberation tail is quantified by the modal excitation distribution derived from the modal decomposition of the FDN. A homogeneously decaying allpass FDN is designed to be colorless such that the corresponding narrow modal excitation distribution leads to a high perceived modal density. Synthetic modal excitation distributions are generated to match modal excitations of FDNs. Three listening tests were conducted to demonstrate the correlation between the modal excitation distribution and the perceived degree of coloration. A fourth test shows a significant reduction of coloration by the colorless FDN compared to other FDN designs. The novel connection of modal excitation, allpass FDNs, and perceived coloration presents a beneficial design criterion for colorless artificial reverberation.
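For context, a bare-bones FDN of the kind analyzed in the study is sketched below: four delay lines fed back through an orthogonal (Hadamard) mixing matrix, with per-line gains set for homogeneous decay. The allpass and colorless designs studied in the paper go beyond this minimal structure, and the delay lengths here are arbitrary.

    import numpy as np

    fs = 48000
    delays = np.array([1033, 1447, 1889, 2143])   # delay-line lengths in samples (arbitrary)
    A = 0.5 * np.array([[1, 1, 1, 1],
                        [1, -1, 1, -1],
                        [1, 1, -1, -1],
                        [1, -1, -1, 1]], float)   # orthogonal feedback matrix
    t60 = 2.0                                     # target reverberation time in seconds
    g = 10.0 ** (-3.0 * delays / (fs * t60))      # per-line gains for homogeneous decay

    x = np.zeros(fs); x[0] = 1.0                  # impulse input -> impulse response
    buffers = [np.zeros(d) for d in delays]
    ptr = np.zeros(4, dtype=int)
    y = np.zeros_like(x)

    for n in range(len(x)):
        outs = np.array([buffers[i][ptr[i]] for i in range(4)])
        y[n] = outs.sum()                         # simple output tap
        feedback = A @ (g * outs) + x[n]          # mix, attenuate, inject input
        for i in range(4):
            buffers[i][ptr[i]] = feedback[i]
            ptr[i] = (ptr[i] + 1) % delays[i]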
Fast Temporal Convolutions for Real-Time Audio Signal Processing
This paper explores how the convolutional layers of neural networks can be optimized for modeling nonlinear audio systems and effects. Enhanced methods for real-time dilated convolutions are presented to achieve faster signal processing than in previous work. The improved implementation of the convolutional layers yields a significant decrease in computational requirements, which was validated on different configurations of single dilated-convolution layers and WaveNet-style feedforward neural network models. In most cases, the resulting signal processing times were equivalent to those of recurrent neural networks with Long Short-Term Memory units and Gated Recurrent Units, which are considered state of the art in black-box virtual analog modeling.
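To illustrate why dilated convolutions are cheap to run causally (this is not the paper's optimized implementation), the sketch below evaluates a single dilated convolution sample by sample from a short history buffer, so each output sample costs only kernel_size multiply-adds per channel.

    import numpy as np

    class StreamingDilatedConv:
        def __init__(self, weights, dilation):
            self.w = np.asarray(weights, float)   # kernel taps, oldest first
            self.d = dilation
            # history long enough to reach the oldest dilated tap
            self.hist = np.zeros(dilation * (len(self.w) - 1) + 1)

        def process(self, x_sample):
            # shift the history left and append the newest input sample
            self.hist[:-1] = self.hist[1:]
            self.hist[-1] = x_sample
            # taps spaced "dilation" samples apart, newest last
            return float(np.dot(self.w, self.hist[::self.d]))

    conv = StreamingDilatedConv(weights=[0.2, -0.5, 0.8], dilation=4)
    out = [conv.process(s) for s in np.random.randn(1024)]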
Differentiable White-Box Virtual Analog Modeling
Component-wise circuit modeling, also known as “white-box” modeling, is a well-established and much-discussed technique in virtual analog modeling. Its accuracy is generally limited by a lack of access to the exact component values present in a real example of the circuit. In this paper we show how this problem can be addressed by implementing the white-box model in a differentiable form and allowing approximate component values to be learned from raw input–output audio measured from a real device.
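A toy example of the idea under simplifying assumptions (a one-pole RC low-pass, not the circuits considered in the paper): the filter is written with differentiable operations so that approximate component values can be fitted to measured input/output audio by gradient descent.

    import torch

    def rc_lowpass(x, R, C, fs=48000.0):
        a = 1.0 / (1.0 + R * C * fs)     # discretized one-pole coefficient
        y, state = [], torch.zeros(())
        for n in range(x.shape[0]):
            state = state + a * (x[n] - state)
            y.append(state)
        return torch.stack(y)

    # "Measured" data stands in for audio captured from a real device
    fs = 48000.0
    x = torch.randn(512)
    with torch.no_grad():
        y_target = rc_lowpass(x, torch.tensor(10e3), torch.tensor(15e-9), fs)

    # Learn approximate R and C from the raw input/output pair
    # (in this toy circuit only the product R*C is identifiable)
    logR = torch.tensor(9.0, requires_grad=True)    # optimize in the log domain
    logC = torch.tensor(-17.0, requires_grad=True)
    opt = torch.optim.Adam([logR, logC], lr=0.05)
    for step in range(100):
        opt.zero_grad()
        y_pred = rc_lowpass(x, logR.exp(), logC.exp(), fs)
        loss = torch.mean((y_pred - y_target) ** 2)
        loss.backward()
        opt.step()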
One Billion Audio Sounds from GPU-Enabled Modular Synthesis
We release synth1B1, a multi-modal audio corpus consisting of 1 billion 4-second synthesized sounds, paired with the synthesis parameters used to generate them. The dataset is 100x larger than any audio dataset in the literature. We also introduce torchsynth, an open-source modular synthesizer that generates the synth1B1 samples on the fly, 16200x faster than real time (714 MHz) on a single GPU. In addition, we release two new audio datasets: FM synth timbre and subtractive synth pitch. Using these datasets, we demonstrate new rank-based evaluation criteria for existing audio representations. Finally, we propose a novel approach to synthesizer hyperparameter optimization.
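One illustrative notion of a rank-based criterion, not necessarily the specific criteria proposed in the paper: test whether distances between sounds in an embedding space preserve the ranking of distances in synthesis-parameter space, shown here with random stand-in data.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    params = rng.random((64, 16))        # synthesis parameters for 64 sounds
    embeddings = rng.random((64, 128))   # stand-in for an audio representation of the same sounds

    # Spearman correlation between the two sets of pairwise distances
    rho, _ = spearmanr(pdist(params), pdist(embeddings))
    print(f"rank agreement between parameter and embedding distances: {rho:.3f}")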