Two Datasets of Room Impulse Responses for Navigation in Six Degrees-of-Freedom: A Symphonic Concert Hall and a Former Planetarium
This paper presents two datasets of room impulse responses (RIRs) for navigable virtual acoustics. The first is a set of 240 mono and Ambisonic RIRs recorded at the Maison Symphonique, a symphonic concert hall in Montreal renowned for its excellent acoustics. The second is a set of 67 third-order Ambisonic RIRs recorded in the former planetarium of Montreal (now known as the Centech), a space whose acoustics include a focal point at which extreme reverberation times occur. The article first describes the two datasets and the methods used to capture them. A use case for these RIRs is then presented: audio rendering of scene navigation using interpolation among RIRs.
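The abstract does not specify the interpolation scheme used for the rendering; purely as a rough illustration of the general idea, the sketch below (hypothetical helper names, assuming matching sample rates) crossfades between two measured RIRs according to listener position and convolves the result with a dry signal.

```python
# Hypothetical sketch: position-based crossfade between two measured RIRs,
# followed by convolution with a dry signal. Not the authors' rendering method.
import numpy as np
from scipy.signal import fftconvolve

def interpolate_rirs(rir_a, rir_b, weight):
    """Linear crossfade between two RIRs; weight in [0, 1] selects rir_b."""
    n = max(len(rir_a), len(rir_b))
    a = np.pad(rir_a, (0, n - len(rir_a)))
    b = np.pad(rir_b, (0, n - len(rir_b)))
    return (1.0 - weight) * a + weight * b

def render_position(dry, rir_a, rir_b, listener_x, pos_a, pos_b):
    """Render dry audio at a listener position between two measurement points."""
    w = np.clip((listener_x - pos_a) / (pos_b - pos_a), 0.0, 1.0)
    rir = interpolate_rirs(rir_a, rir_b, w)
    return fftconvolve(dry, rir)[: len(dry)]
```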
Interacting With Digital Audio Effects Through a Haptic Knob With Programmable Resistance
Live music performances and music production often involve the manipulation of several parameters during sound generation, processing, and mixing. In hardware layouts, those parameters are usually controlled using knobs, sliders, and buttons. When these layouts are virtualized, the use of physical (e.g., MIDI) controllers can make interaction easier and reduce the cognitive load associated with sound manipulation. The addition of haptic feedback can further improve such interaction by facilitating the detection of the nature (continuous/discrete) and value of a parameter. To this end, we have realized an endless-knob controller prototype with programmable resistance to rotation, able to render various haptic effects. Ten subjects assessed the effectiveness of the provided haptic feedback in a target-matching task where either visual-only or visual-haptic feedback was provided; the experiment showed significantly lower errors in the presence of haptic feedback. Finally, the knob was configured as a multi-parametric controller for real-time audio effect software written in Python, simulating the voltage-controlled filter aboard the EMS VCS3. The integration of the sound algorithm and the haptic knob is discussed, together with various haptic feedback effects in response to control actions.
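The abstract does not give the filter implementation; as a generic stand-in for a knob-controlled voltage-controlled filter (not the authors' VCS3 model), the Python snippet below applies a standard two-pole resonant low-pass whose cutoff and resonance could be mapped to the knob's parameters.

```python
# Generic resonant low-pass biquad (RBJ cookbook), used here only as a
# stand-in for the VCS3-style voltage-controlled filter mentioned above.
import numpy as np
from scipy.signal import lfilter

def resonant_lowpass(x, fs, cutoff_hz, q):
    """Apply a two-pole resonant low-pass filter to signal x."""
    w0 = 2 * np.pi * cutoff_hz / fs
    alpha = np.sin(w0) / (2 * q)
    cosw0 = np.cos(w0)
    b = np.array([(1 - cosw0) / 2, 1 - cosw0, (1 - cosw0) / 2])
    a = np.array([1 + alpha, -2 * cosw0, 1 - alpha])
    return lfilter(b / a[0], a / a[0], x)

# Example: knob rotation mapped to cutoff, a second control page to resonance.
fs = 48000
noise = np.random.randn(fs)
filtered = resonant_lowpass(noise, fs, cutoff_hz=800.0, q=4.0)
```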
SiTraNo: A MATLAB App for Sines-Transients-Noise Decomposition of Audio Signals
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely used tool in audio processing. Multiple solutions have been proposed in recent years, using time–frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of the sound. In this paper, we present SiTraNo: an easy-to-use MATLAB application with a graphical user interface for audio decomposition that enables visualization of and access to the sinusoidal, transient, and noise classes individually. The application allows the user to choose between different well-known separation methods to analyze an input sound file, to instantaneously control and remix its spectral components, and to visually check the quality of the separation before producing the desired output file. The visualization of common artifacts, such as birdies and dropouts, is demonstrated. The application promotes experimentation with the sound decomposition process by observing the effect of variations in each spectral component on the original sound and by comparing different methods against each other, evaluating the separation quality both audibly and visually. SiTraNo and its source code are available on a companion website and repository.
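SiTraNo itself is a MATLAB application; purely as an illustration of the kind of decomposition it exposes, the Python sketch below applies one classic approach: median filtering of the spectrogram along time and frequency to isolate horizontal (sinusoidal) and vertical (transient) structures, with the remainder treated as noise. This is not SiTraNo's code.

```python
# Minimal sines/transients/noise sketch via spectrogram median filtering.
import numpy as np
from scipy.signal import stft, istft, medfilt2d

def stn_decompose(x, fs, nfft=2048):
    """Split x into sinusoidal, transient, and noise components."""
    _, _, X = stft(x, fs, nperseg=nfft)
    mag = np.abs(X)
    horiz = medfilt2d(mag, kernel_size=(1, 17))   # smooth along time: sines
    vert = medfilt2d(mag, kernel_size=(17, 1))    # smooth along frequency: transients
    total = horiz + vert + 1e-12
    _, sines = istft(X * horiz / total, fs, nperseg=nfft)
    _, trans = istft(X * vert / total, fs, nperseg=nfft)
    n = min(len(x), len(sines))
    noise = x[:n] - sines[:n] - trans[:n]         # residual treated as noise
    return sines[:n], trans[:n], noise
```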
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
Voice conversion has gained increasing popularity within the field of audio manipulation and speech synthesis. Often, the main objective is to transfer the identity of the input to that of a target speaker without changing its linguistic content. While current work provides high-fidelity solutions, it rarely focuses on model simplicity, high-sampling-rate environments, or streamability. By incorporating speech representation learning into a generative timbre transfer model originally created for musical purposes, we investigate voice conversion generated directly in the time domain at high sampling rates. More specifically, we guide the latent space of a baseline model towards linguistically relevant representations and condition it on external speaker information. Through objective and subjective assessments, we demonstrate that the proposed solution can attain levels of naturalness, quality, and intelligibility comparable to those of a state-of-the-art solution for seen speakers, while significantly decreasing inference time. However, despite the presence of target speaker characteristics in the converted output, the actual similarity to unseen speakers remains a challenge.
Graph-Based Audio Looping and Granulation
In this paper we describe similarity graphs computed from time–frequency analysis as a guide for audio playback, with the aim of extending the content of fixed recordings in creative applications. We explain the creation of the graph from the distance between spectral frames, as well as several features computed from the graph, such as methods for onset detection, beat detection, and cluster analysis. Several playback algorithms can be devised based on conditional pruning of the graph using these methods. We describe examples for looping, granulation, and automatic montage.
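The paper's graph construction and pruning are more elaborate; the hedged sketch below only illustrates the basic idea of connecting spectrally similar STFT frames and occasionally jumping along an edge during playback.

```python
# Sketch: nodes are STFT frames, edges connect spectrally similar frames,
# and looping playback occasionally jumps along an edge. Details differ
# from the paper; this is an illustration of the concept only.
import numpy as np
from scipy.signal import stft

def similarity_graph(x, fs, nfft=2048, threshold=0.95):
    _, _, X = stft(x, fs, nperseg=nfft)
    mag = np.abs(X).T                          # frames x bins
    norm = mag / (np.linalg.norm(mag, axis=1, keepdims=True) + 1e-12)
    sim = norm @ norm.T                        # cosine similarity between frames
    np.fill_diagonal(sim, 0.0)
    return sim > threshold                     # boolean adjacency matrix

def loop_playback(num_steps, adjacency, jump_prob=0.1, rng=np.random.default_rng()):
    """Generate a sequence of frame indices, jumping to similar frames at random."""
    path, i = [], 0
    for _ in range(num_steps):
        path.append(i)
        neighbors = np.flatnonzero(adjacency[i])
        if len(neighbors) and rng.random() < jump_prob:
            i = int(rng.choice(neighbors))     # jump to a similar frame
        else:
            i = (i + 1) % adjacency.shape[0]   # otherwise advance normally
    return path
```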
Identification of Nonlinear Circuits as Port-Hamiltonian Systems
This paper addresses the identification of nonlinear circuits for power-balanced virtual analog modeling and simulation. The proposed method combines a port-Hamiltonian system formulation with kernel-based methods to retrieve model laws from measurements. This combination allows the estimated model to retain physical properties that are crucial for the accuracy of simulations, while representing a variety of nonlinear behaviors. As an illustration, the method is used to identify a nonlinear passive peaking EQ.
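For context, the standard input-state-output port-Hamiltonian form (general background, not reproduced from this paper) is:

```latex
\dot{x} = \bigl(J(x) - R(x)\bigr)\,\nabla H(x) + G(x)\,u,
\qquad
y = G(x)^{\top}\,\nabla H(x),
```

where H(x) is the stored energy (Hamiltonian), J = -J^T encodes the power-conserving interconnection, R = R^T ⪰ 0 encodes dissipation, and (u, y) are the port variables. The power balance dH/dt = y^T u - ∇H(x)^T R(x) ∇H(x) ≤ y^T u is built into the structure, which is what makes models identified in this form power-balanced.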
The Role of Modal Excitation in Colorless Reverberation
A perceptual study revealing a novel connection between modal properties of feedback delay networks (FDNs) and colorless reverberation is presented. The coloration of the reverberation tail is quantified by the modal excitation distribution derived from the modal decomposition of the FDN. A homogeneously decaying allpass FDN is designed to be colorless such that the corresponding narrow modal excitation distribution leads to a high perceived modal density. Synthetic modal excitation distributions are generated to match modal excitations of FDNs. Three listening tests were conducted to demonstrate the correlation between the modal excitation distribution and the perceived degree of coloration. A fourth test shows a significant reduction of coloration by the colorless FDN compared to other FDN designs. The novel connection of modal excitation, allpass FDNs, and perceived coloration presents a beneficial design criterion for colorless artificial reverberation.
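As background, the FDN transfer function and its modal decomposition, from which the modal excitation distribution is derived, can be written in standard FDN notation (not quoted from this paper) as:

```latex
H(z) = \mathbf{c}^{\top}\bigl[\mathbf{D}(z)^{-1} - \mathbf{A}\bigr]^{-1}\mathbf{b} + d,
\qquad
\mathbf{D}(z) = \operatorname{diag}\!\bigl(z^{-m_1},\dots,z^{-m_N}\bigr),
\qquad
H(z) = d + \sum_{i} \frac{\rho_i}{1 - \lambda_i z^{-1}},
```

where A is the feedback matrix, b and c the input and output gains, m_n the delay-line lengths, λ_i the system poles, and ρ_i the residues. The distribution of the residue magnitudes |ρ_i| is the modal excitation distribution referred to above: a narrow distribution means no single mode dominates the response, which the paper links to low perceived coloration.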
Fast Temporal Convolutions for Real-Time Audio Signal Processing
This paper explores the possibilities of optimizing neural network convolutional layers for modeling nonlinear audio systems and effects. Enhanced methods for real-time dilated convolutions are presented to achieve faster signal processing times than in previous work. Due to the improved implementation of convolutional layers, a significant decrease in computational requirements was observed and validated on different configurations of single dilated convolutional layers and WaveNet-style feedforward neural network models. In most cases, signal processing times were equivalent to those of recurrent neural networks with Long Short-Term Memory units and Gated Recurrent Units, which are considered state-of-the-art in black-box virtual analog modeling.
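As a point of reference for the kind of layers being optimized, the PyTorch sketch below builds a generic WaveNet-style stack of dilated causal convolutions; it is an illustration only, not the paper's optimized implementation.

```python
# Generic WaveNet-style stack of dilated causal 1-D convolutions (PyTorch).
import torch
import torch.nn as nn

class DilatedCausalConv(nn.Module):
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation        # left-pad only: causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                              # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))
        return torch.tanh(self.conv(x))

# Stack with exponentially growing dilation, as in WaveNet-style models.
layers = nn.Sequential(*[DilatedCausalConv(16, 3, 2 ** d) for d in range(8)])
x = torch.randn(1, 16, 4096)
y = layers(x)
```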
Differentiable White-Box Virtual Analog Modeling
Component-wise circuit modeling, also known as “white-box” modeling, is a well-established and much-discussed technique in virtual analog modeling. This approach is generally limited in accuracy by a lack of access to the exact component values present in a real example of the circuit. In this paper we show how this problem can be addressed by implementing the white-box model in a differentiable form and allowing approximate component values to be learned from raw input–output audio measured from a real device.
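As a minimal sketch of the general idea (a simple RC low-pass standing in for the paper's circuit, and simulated rather than measured data), writing the model with differentiable operations lets a component value be fitted by gradient descent:

```python
# Hedged sketch: a differentiable one-pole RC low-pass whose resistance is
# learned from input-output audio. Not the circuit or method of the paper.
import torch

def rc_lowpass(x, R, C, fs=48000):
    """One-pole RC low-pass, backward-Euler discretization."""
    dt = 1.0 / fs
    alpha = dt / (R * C + dt)
    y, state = [], torch.zeros(())
    for sample in x:
        state = state + alpha * (sample - state)
        y.append(state)
    return torch.stack(y)

# "Measured" data would come from a real device; here it is simulated.
fs = 48000
x = torch.randn(1024)
with torch.no_grad():
    target = rc_lowpass(x, R=10e3, C=47e-9, fs=fs)

log_R = torch.tensor(8.0, requires_grad=True)          # learn R on a log scale
opt = torch.optim.Adam([log_R], lr=0.05)
for _ in range(150):
    opt.zero_grad()
    loss = torch.mean((rc_lowpass(x, torch.exp(log_R), 47e-9, fs) - target) ** 2)
    loss.backward()
    opt.step()
```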
One Billion Audio Sounds From GPU-Enabled Modular Synthesis
We release synth1B1, a multi-modal audio corpus consisting of 1 billion 4-second synthesized sounds, paired with the synthesis parameters used to generate them. The dataset is 100x larger than any audio dataset in the literature. We also introduce torchsynth, an open-source modular synthesizer that generates the synth1B1 samples on-the-fly at 16200x faster than real-time (714 MHz) on a single GPU. We additionally release two new audio datasets: FM synth timbre and subtractive synth pitch. Using these datasets, we demonstrate new rank-based evaluation criteria for existing audio representations. Finally, we propose a novel approach to synthesizer hyperparameter optimization.
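The abstract does not detail the rank-based criteria; one hypothetical way to frame such an evaluation (not necessarily the paper's protocol) is to check whether distances in an embedding space rank sounds similarly to distances in synthesis-parameter space:

```python
# Hypothetical rank-based comparison between an audio representation and the
# ground-truth synthesis parameters of the sounds it embeds.
import numpy as np
from scipy.spatial.distance import cdist

def rank_correlation(a, b):
    """Spearman-style rank correlation (ties ignored; fine for continuous data)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

def rank_agreement(embeddings, params):
    """Mean rank correlation between embedding and parameter distances."""
    d_emb = cdist(embeddings, embeddings)
    d_par = cdist(params, params)
    scores = []
    for i in range(len(embeddings)):
        mask = np.arange(len(embeddings)) != i          # drop the self-distance
        scores.append(rank_correlation(d_emb[i, mask], d_par[i, mask]))
    return float(np.mean(scores))

# Toy usage with random data standing in for real embeddings and parameters.
emb = np.random.randn(64, 128)
par = np.random.rand(64, 16)
print(rank_agreement(emb, par))
```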