Fade-in Control for Feedback Delay Networks
In virtual acoustics, it is common to simulate the early part of a Room Impulse Response using approaches from geometrical acoustics and the late part using Feedback Delay Networks (FDNs). In order to transition from the early to the late part, it is useful to slowly fade in the FDN response. We propose two methods to control the fade-in, one based on double decays and the other based on modal beating. We use modal analysis to explain the two concepts for incorporating this fade-in behaviour entirely within the IIR structure of a multiple-input multiple-output FDN. We present design equations that allow the fade-in time to be placed at an arbitrary point within its derived limit.
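As a minimal illustration of the double-decay idea, the NumPy sketch below builds a fade-in envelope as the difference of a slow and a fast exponential decay; the T60 values are assumed for illustration, and the fade-in time is simply where the envelope peaks.

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs  # 1 s time axis

# Double-decay envelope: the difference of a slow and a fast exponential
# decay starts at zero, fades in, peaks, then decays like the slow term.
t60_slow, t60_fast = 2.0, 0.2            # assumed decay times in seconds
tau = lambda t60: t60 / np.log(1000.0)   # T60 -> exponential time constant
env = np.exp(-t / tau(t60_slow)) - np.exp(-t / tau(t60_fast))

# The fade-in time is the location of the envelope maximum.
n_peak = np.argmax(env)
print(f"fade-in time ~ {n_peak / fs * 1000:.1f} ms")
```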
Delay Network Architectures for Room and Coupled Space Modeling
Feedback delay network reverberators have decay filters associated with each delay line to model the frequency-dependent reverberation time (T60) of a space. The decay filters are typically designed such that all delay lines independently produce the same T60 frequency response. However, in real rooms, there are multiple, concurrent T60 responses that depend on the geometry and physical properties of the materials present in the rooms. In this paper, we propose the Grouped Feedback Delay Network (GFDN), where groups of delay lines share different target T60s. We use the GFDN to simulate coupled rooms, where one room is significantly larger than the other. We also simulate rooms with different materials, with unique decay filters associated with each delay line group, designed to represent the T60 characteristics of a particular material. The T60 filters are designed to emulate the materials’ absorption characteristics with minimal computation. We discuss the design of the mixing matrix to control inter- and intra-group mixing, and show how the amount of mixing affects the behavior of the room modes. Finally, we discuss the inclusion of air absorption filters on each delay line and physically motivated room resizing techniques with the GFDN.
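A minimal sketch of the grouping idea, with illustrative delay lengths, T60 targets, and coupling amount: each group derives broadband decay gains from the standard FDN decay formula, and a block mixing matrix blends isolated intra-group mixing with global inter-group mixing. The simple blend below is not exactly orthogonal for intermediate coupling; the paper designs this matrix more carefully.

```python
import numpy as np
from scipy.linalg import block_diag

fs = 48000
# Two groups of delay lines (illustrative lengths in samples): a large room
# and a smaller coupled room with different target T60s.
delays = {"large": np.array([1931, 2213, 2539]), "small": np.array([733, 859, 997])}
t60 = {"large": 2.5, "small": 0.6}

# Standard FDN decay gain: each sample of delay attenuates by 10^(-3/(fs*T60)).
gains = {k: 10.0 ** (-3.0 * d / (fs * t60[k])) for k, d in delays.items()}

def ortho(n, seed):
    """Random orthogonal matrix via QR decomposition."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

# Block mixing matrix: orthogonal intra-group blocks blended with a global
# orthogonal matrix by a coupling amount alpha (0 = fully isolated rooms).
alpha = 0.2  # assumed inter-group coupling amount
intra = block_diag(ortho(3, 0), ortho(3, 1))
inter = ortho(6, 2)
mixing = (1 - alpha) * intra + alpha * inter
```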
Energy-Preserving Time-Varying Schroeder Allpass Filters
In artificial reverb algorithms, gains are commonly varied over time to break up temporal patterns, improving quality. We propose a family of novel Schroeder-style allpass filters that are energy-preserving under arbitrary, continuous changes of their gains over time. All of them are canonic in delays, and some are also canonic in multiplies. This yields several structures that are novel even in the time-invariant case. Special cases for cascading and nesting these structures with a reduced number of multipliers are shown as well. The proposed structures should be useful in artificial reverb applications and other time-varying audio effects based on allpass filters, especially where allpass filters are embedded in feedback loops and stability may be an issue.
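For reference, here is the classic time-invariant Schroeder allpass that the proposed structures generalize. This is not one of the paper's energy-preserving variants; those rearrange the structure so the gain may vary freely without injecting or dissipating energy.

```python
import numpy as np

def schroeder_allpass(x, delay, g):
    """Classic Schroeder allpass: y[n] = -g*x[n] + x[n-D] + g*y[n-D].
    Allpass (energy-preserving) only for constant g; varying g over time
    breaks this property, which motivates the paper's structures."""
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_d + g * y_d
    return y
```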
Perceptual Evaluation of Mitigation Approaches of Impairments Due to Spatial Undersampling in Binaural Rendering of Spherical Microphone Array Data: Dry Acoustic Environments
Employing a finite number of discrete microphones, instead of the continuous distribution assumed in theory, reduces the physical accuracy of sound field representations captured by a spherical microphone array. A number of approaches have been proposed in the literature to mitigate the perceptual impairment that arises when the captured sound fields are reproduced binaurally. We recently presented a perceptual evaluation of a representative set of approaches in conjunction with reverberant acoustic environments. This paper presents a similar study but with acoustically dry environments with reverberation times of less than 0.25 s. We examined the Magnitude Least-Squares algorithm, the Bandwidth Extraction Algorithm for Microphone Arrays, Spherical Head Filters, spherical harmonics Tapering, and Spatial Subsampling, all up to a spherical harmonics order of 7. Although dry environments violate some of the assumptions underlying some of the approaches, we can confirm the results of our previous study: most approaches achieve an improvement, and the magnitude of the improvement is comparable across approaches and acoustic environments.
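As a sketch of one of the evaluated approaches, spherical harmonics tapering applies per-order weights that roll off the highest orders before binaural rendering. The half-sided Hann window and the number of tapered orders below are assumptions for illustration, not the exact weights evaluated in the paper.

```python
import numpy as np

def tapering_weights(order, taper_orders=2):
    """Per-order weights for spherical harmonics tapering: low orders pass
    unchanged, and the highest `taper_orders` orders are rolled off with
    the falling half of a Hann window (an assumed window choice)."""
    w = np.ones(order + 1)
    falling = np.hanning(2 * (taper_orders + 1) + 1)[taper_orders + 2:-1]
    w[-taper_orders:] = falling
    return w

w = tapering_weights(order=7)
# One weight per SH coefficient: order n contributes 2n + 1 coefficients.
per_coeff = np.concatenate([np.full(2 * n + 1, w[n]) for n in range(len(w))])
```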
Interaural Cues Cartography: Localization Cues Repartition for Three Spatialization Methods
The Synthetic Transaural Audio Rendering (STAR) method, first introduced at DAFx-06 and then enhanced at DAFx-19, is a perceptive approach to sound spatialization that aims at reproducing the acoustic cues at the ears of the listener using loudspeakers. To validate the method, several comparisons with state-of-the-art spatialization methods (VBAP and HOA) were conducted. Previously, quality comparisons with human subjects were made, providing meaningful subjective results in real conditions. In this article, an objective comparison is proposed, using acoustic cue error maps. The cartography enables us to study the spatialization effect in a 2D space, for a listening position within an audience and thus not necessarily located at the center. Two approaches are conducted: the first simulates the binaural signals for a virtual KEMAR manikin, in ideal conditions and with a fine resolution; the second records these binaural signals using a real KEMAR manikin, providing real data with reverberation, though with a coarser resolution. In both cases, the acoustic cues were derived from the binaural signals (either simulated or measured) and compared to the reference value taken at the center of the octophonic loudspeaker configuration. The obtained error maps display encouraging results, with our STAR method producing the smallest error in both simulated and experimental conditions.
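A simplified sketch of how such cues could be derived from a binaural signal pair: a broadband ILD from an RMS level ratio, and an ITD from the peak of the interaural cross-correlation. The paper's per-position cue extraction is more elaborate; everything here is illustrative.

```python
import numpy as np

def interaural_cues(left, right, fs):
    """Broadband interaural cues from a binaural pair: ILD as an RMS level
    ratio in dB, ITD as the lag maximizing the interaural cross-correlation,
    searched within +/- 1 ms (roughly the physiological range)."""
    ild = 20 * np.log10(np.sqrt(np.mean(left ** 2)) / np.sqrt(np.mean(right ** 2)))
    max_lag = int(1e-3 * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    xcorr = [np.dot(left[max(0, -l):len(left) - max(0, l)],
                    right[max(0, l):len(right) - max(0, -l)]) for l in lags]
    itd = lags[int(np.argmax(xcorr))] / fs
    return ild, itd
```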
Neural Parametric Equalizer Matching Using Differentiable Biquads
This paper proposes a neural network for carrying out parametric equalizer (EQ) matching. The novelty of this neural network solution is that it can be optimized directly in the frequency domain by means of differentiable biquads, rather than relying solely on a loss on parameter values, which does not correlate directly with the system output. We compare the performance of the proposed neural network approach with that of a baseline algorithm based on a convex relaxation of the problem. It is observed that the neural network can provide better matching than the baseline approach because it directly attempts to solve the non-convex problem. Moreover, we show that the same network trained with only a parameter loss is insufficient for the task, despite the fact that it matches underlying EQ parameters better than one trained with a combination of spectral and parameter losses.
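The core trick can be sketched in a few lines of PyTorch: evaluating a biquad's frequency response with differentiable tensor operations lets a spectral (log-magnitude) loss backpropagate into the coefficients, and from there into any parameter-to-coefficient mapping. The coefficient values and target below are placeholders.

```python
import math
import torch

def biquad_response(b, a, n_bins=512):
    """Differentiable frequency response of one biquad section
    H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2),
    evaluated on a linear grid from 0 to pi."""
    w = torch.linspace(0.0, math.pi, n_bins)
    z = torch.exp(-1j * w)                     # z^-1 on the unit circle
    num = b[0] + b[1] * z + b[2] * z ** 2
    den = 1.0 + a[0] * z + a[1] * z ** 2
    return num / den

b = torch.tensor([1.0, 0.2, 0.1], requires_grad=True)  # placeholder numerator
a = torch.tensor([-0.5, 0.25], requires_grad=True)     # placeholder denominator
target = torch.ones(512)                               # placeholder target magnitude
loss = torch.mean((20 * torch.log10(biquad_response(b, a).abs() + 1e-8)
                   - 20 * torch.log10(target + 1e-8)) ** 2)
loss.backward()                                        # gradients w.r.t. b and a
```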
Relative Music Loudness Estimation Using Temporal Convolutional Networks and a CNN Feature Extraction Front-End
Relative music loudness estimation is a MIR task that consists of dividing audio into segments of three classes: Foreground Music, Background Music, and No Music. Given the temporal correlation of music, in this work we approach the task using a type of network with the ability to model temporal context: the Temporal Convolutional Network (TCN). We propose two architectures: a TCN, and a novel architecture resulting from the combination of a TCN with a Convolutional Neural Network (CNN) front-end. We name this new architecture CNN-TCN. We expect the CNN front-end to work as a feature extraction strategy to achieve a more efficient usage of the network’s parameters. We use the OpenBMAT dataset to train and test 40 TCN and 80 CNN-TCN models with two grid searches over a set of hyper-parameters. We compare our models with the two best algorithms submitted to the tasks of music detection and relative music loudness estimation in MIREX 2019. All our models outperform the MIREX algorithms, even when using fewer parameters. The CNN-TCN emerges as the best architecture, as all its models outperform all TCN models. We show that adding a CNN front-end to a TCN can actually reduce the number of parameters of the network while improving performance. The CNN front-end effectively works as a feature extractor, producing consistent patterns that identify different combinations of music and non-music sounds, and also helps produce a smoother output than the TCN models.
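A toy PyTorch version of the CNN-TCN idea, with illustrative layer sizes that are not the paper's: a small convolutional front-end over mel-band features feeds a residual stack of dilated 1-D convolutions (the TCN) ending in a three-class frame classifier.

```python
import torch
import torch.nn as nn

class CNNTCN(nn.Module):
    """Toy CNN-TCN: CNN front-end over mel bands, then dilated conv stack,
    then per-frame logits for foreground / background / no music."""
    def __init__(self, n_mels=64, channels=32, tcn_layers=4):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv1d(n_mels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.tcn = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3,
                      dilation=2 ** i, padding=2 ** i)   # length-preserving
            for i in range(tcn_layers)
        ])
        self.head = nn.Conv1d(channels, 3, kernel_size=1)

    def forward(self, x):            # x: (batch, n_mels, frames)
        h = self.frontend(x)
        for conv in self.tcn:        # residual dilated convolutions
            h = h + torch.relu(conv(h))
        return self.head(h)          # (batch, 3, frames) class logits

logits = CNNTCN()(torch.randn(1, 64, 500))
```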
Neural Modelling of Time-Varying Effects
This paper proposes a grey-box neural network based approach to modelling LFO-modulated time-varying effects. The neural network model receives both the unprocessed audio and the LFO signal as input. This allows complete control over the model’s LFO frequency and shape. The neural networks are trained using guitar audio, which has to be processed by the target effect and annotated with the predicted LFO signal before training. A measurement signal based on regularly spaced chirps was used to accurately predict the LFO signal. The model architecture has previously been shown to be capable of running in real-time on a modern desktop computer whilst using relatively little processing power. We validate our approach by creating models of both a phaser and a flanger effects pedal; in principle, it can be applied to any LFO-modulated time-varying effect. In the best case, an error-to-signal ratio of 1.3% is achieved when modelling a flanger pedal, and previous work has shown that this corresponds to the model being nearly indistinguishable from the target device.
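The conditioning scheme can be sketched as follows: the dry audio and the LFO signal enter the network together as two input channels, which is what keeps the LFO rate and shape controllable at inference time. The single GRU layer below is a stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LFOConditionedModel(nn.Module):
    """Minimal stand-in for an LFO-conditioned effect model: dry audio and
    the (annotated or user-controlled) LFO signal are stacked as two input
    channels, so the LFO can be changed freely after training."""
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, audio, lfo):                  # both: (batch, samples)
        x = torch.stack([audio, lfo], dim=-1)       # (batch, samples, 2)
        h, _ = self.rnn(x)
        return self.out(h).squeeze(-1)              # wet signal estimate

lfo = torch.sin(torch.linspace(0, 6.28, 4800)).unsqueeze(0)
wet = LFOConditionedModel()(torch.randn(1, 4800), lfo)
```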
Onset-Informed Source Separation Using Non-Negative Matrix Factorization With Binary Masks
This paper describes a new onset-informed source separation method based on non-negative matrix factorization (NMF) with binary masks. Many previous approaches to separating a target instrument sound from polyphonic music have used side-information about the target that is time-consuming to prepare. The proposed method leverages the onsets of the target instrument sound to facilitate separation. Onsets are useful information that users can easily generate by tapping along while listening to the target in the music. To utilize onsets in NMF-based sound source separation, we introduce binary masks that represent the on/off states of the target sound. The binary masks are formulated as Markov chains based on the continuity of musical instrument sound. Owing to the binary masks, an onset can be handled as the time frame in which the mask changes from the off to the on state. The proposed model is inferred by Gibbs sampling, in which the target sound source can be sampled efficiently by using its onsets. We conducted experiments to separate the target melody instrument from recorded polyphonic music. Separation results showed an improvement of about 2 to 10 dB in the target-source-to-residual-noise ratio compared to the polyphonic sound. Even when some onsets were missed or temporally deviated, the method remained effective for target sound source separation.
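A deterministic, heavily simplified analogue of the method: gate the target activations of a KL-divergence NMF with a binary mask that opens at the tapped onset frames. The paper instead infers Markov-chain binary masks with Gibbs sampling; the fixed note length and the pretrained target basis below are assumptions.

```python
import numpy as np

def onset_gated_nmf(V, W_target, onsets, n_other=8, n_iter=100, eps=1e-9):
    """Onset-gated KL-NMF sketch. V: magnitude spectrogram (F, T);
    W_target: pretrained basis of the target instrument (F, K);
    onsets: tapped onset frame indices."""
    F, T = V.shape
    mask = np.zeros(T)
    for t in onsets:                    # crude gate: open for a fixed span
        mask[t:t + 40] = 1.0            # 40-frame note length is an assumption
    rng = np.random.default_rng(0)
    W_o = rng.random((F, n_other))
    H_t = rng.random((W_target.shape[1], T))
    H_o = rng.random((n_other, T))
    for _ in range(n_iter):             # multiplicative updates for KL-NMF
        R = W_target @ (H_t * mask) + W_o @ H_o + eps
        H_t *= (W_target.T @ (V / R)) / (W_target.sum(0)[:, None] + eps)
        H_o *= (W_o.T @ (V / R)) / (W_o.sum(0)[:, None] + eps)
        R = W_target @ (H_t * mask) + W_o @ H_o + eps
        W_o *= ((V / R) @ H_o.T) / (H_o.sum(1)[None, :] + eps)
    return W_target @ (H_t * mask)      # magnitude estimate of the target
```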
Differentiable IIR Filters for Machine Learning Applications
In this paper, we present an approach to using traditional digital IIR filter structures inside deep-learning networks trained using backpropagation. We establish the link between such structures and recurrent neural networks. Three different differentiable IIR filter topologies are presented and compared against each other and against an established baseline. Additionally, a simple Wiener-Hammerstein model using differentiable IIRs as its filtering component is presented and trained on a guitar signal played through a Boss DS-1 guitar pedal.
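A sketch of such a Wiener-Hammerstein model, here built on torchaudio's differentiable lfilter as the IIR component: trainable biquad, static tanh nonlinearity, trainable biquad. The biquad topology, the initialization, and the tanh stage are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torchaudio

class WienerHammerstein(torch.nn.Module):
    """Wiener-Hammerstein sketch: IIR -> tanh -> IIR. Gradients flow through
    torchaudio.functional.lfilter, so the coefficients can be fit to
    recordings of a target device such as a distortion pedal. Note that
    nothing here constrains the filters to stay stable during training."""
    def __init__(self):
        super().__init__()
        self.b1 = torch.nn.Parameter(torch.tensor([1.0, 0.0, 0.0]))
        self.a1 = torch.nn.Parameter(torch.tensor([0.0, 0.0]))  # a0 fixed to 1
        self.b2 = torch.nn.Parameter(torch.tensor([1.0, 0.0, 0.0]))
        self.a2 = torch.nn.Parameter(torch.tensor([0.0, 0.0]))
        self.gain = torch.nn.Parameter(torch.tensor(5.0))

    def filt(self, x, b, a):
        a_full = torch.cat([torch.ones(1), a])   # (1, a1, a2)
        return torchaudio.functional.lfilter(x, a_full, b, clamp=False)

    def forward(self, x):                        # x: (batch, samples)
        x = self.filt(x, self.b1, self.a1)
        x = torch.tanh(self.gain * x)            # static nonlinearity
        return self.filt(x, self.b2, self.a2)

y = WienerHammerstein()(torch.randn(1, 4096))
```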