Probabilistic Reverberation Model Based on Echo Density and Kurtosis
This article proposes a probabilistic model for synthesizing room impulse responses (RIRs) for use in convolution artificial reverberators. The proposed method is based on the concept of echo density. Echo density is a measure of the number of echoes per second in an impulse response and is a demonstrated perceptual metric of artificial reverberation quality. As echo density is related to the statistical measure of kurtosis, this article demonstrates that the statistics of an RIR can be modeled using a probabilistic mixture model. A mixture model designed specifically for modeling RIRs is proposed. The proposed method is useful for statistically replicating RIRs of a measured environment, thereby synthesizing new independent observations of an acoustic space. A perceptual pilot study is carried out to evaluate the fidelity of the replication process in monophonic and stereo artificial reverberators.
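The kurtosis–echo-density link described in this abstract can be illustrated with a short sketch (not the paper's model): sliding-window kurtosis is computed over two toy signals, Gaussian noise standing in for dense late reverberation (kurtosis near 3) and a sparse impulse train standing in for low-echo-density early reflections (kurtosis far above 3). Window length, hop size, and the test signals are illustrative choices only.

```python
import numpy as np

def window_kurtosis(h, win=1024, hop=512):
    """Kurtosis of sliding windows over an impulse response.
    Gaussian (dense) segments give ~3; sparse echo trains give much more."""
    out = []
    for start in range(0, len(h) - win + 1, hop):
        seg = h[start:start + win]
        m = seg.mean()
        s2 = ((seg - m) ** 2).mean()
        if s2 == 0.0:                      # skip all-zero windows
            continue
        out.append(((seg - m) ** 4).mean() / s2 ** 2)
    return np.array(out)

rng = np.random.default_rng(0)
dense = rng.standard_normal(8192)          # Gaussian "late reverb" stand-in
sparse = np.zeros(8192)                    # sparse "early reflection" stand-in
idx = rng.choice(8192, size=40, replace=False)
sparse[idx] = rng.standard_normal(40)

k_dense = window_kurtosis(dense).mean()    # close to 3 (Gaussian)
k_sparse = window_kurtosis(sparse).mean()  # far above 3 (sparse echoes)
```

As echo density grows toward a Gaussian-like late field, windowed kurtosis falls toward 3, which is what makes it usable as a mixture-model statistic.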
Real-Time Modal Synthesis of Nonlinearly Interconnected Networks
Modal methods are a long-established approach to physical modeling sound synthesis. Projecting the equation of motion of a linear, time-invariant system onto a basis of eigenfunctions yields a set of independent forced, lossy oscillators, which may be simulated efficiently and accurately by means of standard time-stepping methods. Extensions of modal techniques to nonlinear problems are possible, though often requiring the solution of densely coupled nonlinear time-dependent equations. Here, an application of recent results in numerical simulation design is employed, in which the nonlinear energy is first quadratised via a convenient auxiliary variable. The resulting equations may be updated in time explicitly, thus avoiding the need for expensive iterative solvers, dense linear system solutions, or matrix inversions. The case of a network of interconnected distributed elements is detailed, along with a real-time implementation as an audio plugin.
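The linear building block the abstract starts from can be sketched directly: each mode is an independent forced, lossy oscillator updated explicitly in time. The sketch below (frequencies, decay rates, and the semi-implicit-Euler update are illustrative choices, not the paper's quadratised scheme, which additionally handles the nonlinear coupling) renders a small modal bank struck by a unit impulse.

```python
import numpy as np

def render_modes(freqs, decays, fs=48000, dur=1.0):
    """Sum of independently updated lossy modes struck by a unit impulse.
    Each mode obeys x'' + 2*sigma*x' + w0^2*x = f, stepped explicitly
    (semi-implicit Euler with implicit loss for stability)."""
    n = int(fs * dur)
    dt = 1.0 / fs
    w0 = 2 * np.pi * np.asarray(freqs, dtype=float)
    sig = np.asarray(decays, dtype=float)
    x = np.zeros_like(w0)
    v = np.zeros_like(w0)
    y = np.empty(n)
    for i in range(n):
        force = 1.0 if i == 0 else 0.0               # impulsive excitation
        v = (v + dt * (force - w0 ** 2 * x)) / (1.0 + 2.0 * sig * dt)
        x = x + dt * v
        y[i] = x.sum()                               # mix the modal outputs
    return y

y = render_modes([220.0, 440.0, 660.0], [6.0, 7.0, 8.0])
```

Because the modes are uncoupled, the update is fully explicit and vectorizes over the whole bank, which is what makes the modal approach attractive for real-time use.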
CONMOD: Controllable Neural Frame-Based Modulation Effects
Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single black-box model which emulates various LFO-driven effects in a frame-wise manner, offering control over LFO frequency and feedback parameters. Additionally, the model is capable of learning the continuous embedding space of two distinct phaser effects, enabling us to steer between effects and achieve creative outputs. Our model outperforms previous work while possessing both controllability and universality, presenting opportunities to enhance creativity in modern LFO-driven audio effects. An additional demo of our model is available on the accompanying website.
Band-Limited Impulse Invariance Method Using Lagrange Kernels
The band-limited impulse invariance method is a recently proposed approach for the discrete-time modeling of an LTI continuous-time system. Both the magnitude and phase responses are accurately modeled by means of discrete-time filters. It is an extension of the conventional impulse invariance method, which is based on the time-domain sampling of the continuous-time response. The resulting IIR filter typically exhibits spectral aliasing artifacts. In the band-limited impulse invariance method, an FIR filter is combined in parallel with the IIR filter, in such a way that the frequency response of the FIR part reduces the aliasing contributions. This method was shown to improve the frequency-domain accuracy while maintaining the compact temporal structure of the discrete-time model. In this paper, a new version of the band-limited impulse invariance method is introduced, where the FIR coefficients are derived in closed form by examining the discontinuities that occur in the continuous-time domain. Analytical anti-aliasing filtering is performed by replacing the discontinuities with band-limited transients. The band-limited discontinuities are designed using the antiderivatives of the Lagrange interpolation kernel. The proposed method is demonstrated with a wave scattering example, in which the acoustic impulse responses on a rigid spherical scatterer are simulated.
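The aliasing artifact of the conventional impulse invariance method, which this paper's band-limited variant addresses, is easy to demonstrate on a one-pole prototype. In this sketch (sampling rate and cutoff are illustrative), sampling h(t) = a·e^(-at) of the analog filter H(s) = a/(s + a) inflates the DC gain above the analog value of 1, because spectral images fold back into the baseband.

```python
import numpy as np

# Conventional impulse invariance on a one-pole analog filter
# H(s) = a / (s + a): sample h(t) = a * exp(-a t) as h_d[n] = T * h(n T).
fs = 1000.0
T = 1.0 / fs
a = 2 * np.pi * 100.0                   # 100 Hz cutoff (illustrative)
n = np.arange(200)                      # long enough for the tail to vanish
h_d = a * T * np.exp(-a * n * T)

dc_discrete = h_d.sum()                 # geometric series: a*T / (1 - exp(-a*T))
dc_analog = 1.0                         # H(0) = 1 for the analog prototype
aliasing_error = dc_discrete - dc_analog   # positive: images fold into DC
```

The parallel FIR correction proposed in the paper targets exactly this kind of folded-image error while leaving the compact IIR structure intact.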
Antiderivative Antialiasing with Frequency Compensation for Stateful Systems
Employing nonlinear functions in audio DSP algorithms requires attention, as they generally introduce aliasing. Among others, antiderivative antialiasing proved to be an effective method for static nonlinearities and gave rise to a number of variants, including our AA-IIR method. In this paper we introduce an improvement to AA-IIR that makes it suitable for use in stateful systems. Indeed, employing standard antiderivative antialiasing techniques in such systems alters their frequency response and may cause stability issues. Our method consists of cascading a digital filter after the AA-IIR block in order to fully compensate for unwanted delay and frequency-dependent effects. We study the conditions for such a digital filter to be stable itself and evaluate the method by applying it to the diode clipper circuit.
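For readers unfamiliar with the baseline technique the paper builds on, here is a minimal first-order antiderivative antialiasing sketch for a static tanh nonlinearity (this is the standard ADAA idea, not the paper's AA-IIR variant or its compensation filter). The nonlinearity's antiderivative F(x) = log(cosh(x)) is differenced across samples, which is equivalent to averaging tanh over each input interval and so attenuates the folded harmonics.

```python
import numpy as np

def tanh_adaa1(x):
    """First-order antiderivative antialiasing of y = tanh(x).
    F(x) = log(cosh(x)) is the antiderivative; fall back to tanh at the
    interval midpoint when successive samples nearly coincide."""
    x = np.asarray(x, dtype=float)
    F = np.logaddexp(x, -x) - np.log(2.0)      # log(cosh(x)), overflow-safe
    y = np.empty_like(x)
    y[0] = np.tanh(x[0])
    dx = np.diff(x)
    small = np.abs(dx) < 1e-6
    mid = 0.5 * (x[1:] + x[:-1])
    safe_dx = np.where(small, 1.0, dx)         # avoid division by ~0
    y[1:] = np.where(small, np.tanh(mid), (F[1:] - F[:-1]) / safe_dx)
    return y

ramp = np.linspace(-2.0, 2.0, 1001)            # slowly varying test input
y = tanh_adaa1(ramp)
```

The half-sample delay and frequency-dependent droop this averaging introduces are precisely the "unwanted delay and frequency-dependent effects" that become problematic inside a feedback loop, motivating the compensation filter proposed here.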
Deforming the Oscillator: Iterative Phases Over Parametrizable Closed Paths
Iterative phase formulations allow for the generalization of many oscillatory sound synthesis methods from circles to general parametrizable loops, with or without explicit geometric contexts. This paper describes this approach, leading to the ability to perform modulation, feedback, and chaotic oscillations over deformed circles that can include ill-behaved geometries, while allowing modulations or feedback to be deformed as well.
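The core idea can be sketched as a phase accumulator whose output is read along a parametrized closed path rather than a circle. In this toy version (the radius modulation, its projection, and the phase-modulation term are illustrative choices, not the paper's formulation), setting the deformation and modulation to zero recovers an ordinary sine oscillator.

```python
import numpy as np

def loop_oscillator(f0, fs, n, deform=0.0, fm_index=0.0, fm_ratio=2.0):
    """Iterative-phase oscillator read along a deformed circle.
    The path gamma(phi) is a circle whose radius is modulated by
    cos(3*phi); deform=0 and fm_index=0 reduce to a plain sine."""
    phi = 0.0
    out = np.empty(n)
    inc = 2 * np.pi * f0 / fs
    for i in range(n):
        r = 1.0 + deform * np.cos(3.0 * phi)   # deformed radius of the loop
        out[i] = r * np.sin(phi)               # y-projection of gamma(phi)
        # phase increment itself can be modulated (modulation "on the loop")
        phi = (phi + inc * (1.0 + fm_index * np.sin(fm_ratio * phi))) % (2 * np.pi)
    return out

pure = loop_oscillator(220.0, 48000, 1000)                     # plain sine
warped = loop_oscillator(220.0, 48000, 1000, deform=0.3,
                         fm_index=0.5)                         # deformed loop
```

Because only the phase is iterated, the same machinery supports feedback and chaotic regimes: the path and the phase update can each be deformed independently.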
Higher-Order Scattering Delay Networks for Artificial Reverberation
Computer simulations of room acoustics suffer from an efficiency-vs-accuracy trade-off, with accurate wave-based models being computationally expensive, and delay-network-based models lacking physical accuracy. The Scattering Delay Network (SDN) is a highly efficient recursive structure that renders first-order reflections exactly while approximating higher-order ones. With the purpose of improving the accuracy of SDNs, in this paper, several variations on SDNs are investigated, including appropriate node placement for exact modeling of higher-order reflections, redesigned scattering matrices for physically motivated scattering, and pruned network connections for reduced computational complexity. The results of these variations are compared to state-of-the-art geometric acoustic models for different shoebox room simulations. Objective measures (Normalized Echo Densities (NEDs) and Energy Decay Curves (EDCs)) showed a close match between the proposed methods and the references. A formal listening test was carried out to evaluate differences in the perceived naturalness of the synthesized room impulse responses. Results show that increasing the SDNs' order and adding directional scattering in a fully connected network improves perceived naturalness, and higher-order pruned networks give similar performance at a much lower computational cost.
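Of the two objective measures named above, the energy decay curve is the simpler to state concretely: it is the Schroeder backward integral of the squared impulse response. The sketch below (a synthetic exponentially decaying noise RIR with an assumed RT60 of 0.5 s, not one of the paper's simulations) computes it in dB relative to the total energy.

```python
import numpy as np

def edc_db(h):
    """Energy decay curve via Schroeder backward integration, in dB
    relative to the total energy of the impulse response."""
    e = np.cumsum((h ** 2)[::-1])[::-1]     # remaining energy from each sample
    return 10.0 * np.log10(e / e[0])

fs = 48000
rt60 = 0.5                                  # target reverberation time, s
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
h = rng.standard_normal(fs) * 10.0 ** (-3.0 * t / rt60)   # exponential noise RIR
edc = edc_db(h)                             # ~ -60 dB at t = rt60
```

Comparing such curves between an SDN output and a geometric-acoustics reference is what "close match in EDCs" operationalizes.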
Multichannel Interleaved Velvet Noise
The cross-correlation of multichannel reverberation generated using interleaved velvet noise is studied. The interleaved velvet-noise reverberator was proposed recently for synthesizing the late reverb of an acoustic space. In addition to providing a computationally efficient structure and a perceptually smooth response, the interleaving method allows combining its independent branch outputs in different permutations, which are all equally smooth and flutter-free. For instance, a four-branch output can be combined in 4!, or 24, ways. Additionally, each branch output set is mixed orthogonally, which increases the number of permutations from M! to (M²)!, since sign inversions are taken along. Using specific matrices for this operation, which change the sign of velvet-noise sequences, decreases the correlation of some of the combinations. This paper shows that many selections of permutations offer a set of well-decorrelated output channels, which produce a diffuse and colorless sound field, as validated in terms of spatial variation. The results of this work can be applied in the design of computationally efficient multichannel reverberators.
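A toy version of the combinatorial argument can be sketched directly (the `velvet` generator below is a simplified textbook velvet-noise sequence, not the paper's interleaved structure): four independent branches admit 4! = 24 orderings, and additionally mixing them with sign inversions yields output channels that are nearly uncorrelated with one another.

```python
import numpy as np
from itertools import permutations

def velvet(n, density, fs, rng):
    """Basic velvet noise: one random +/-1 impulse per grid period."""
    grid = int(fs / density)
    v = np.zeros(n)
    for start in range(0, n - grid + 1, grid):
        v[start + rng.integers(grid)] = rng.choice([-1.0, 1.0])
    return v

rng = np.random.default_rng(1)
fs, n, M = 44100, 44100, 4
branches = [velvet(n, 2000, fs, rng) for _ in range(M)]

n_perms = len(list(permutations(range(M))))    # 4! = 24 branch orderings

# Two mixes of the same four branches that differ only by sign inversions:
out_a = branches[0] + branches[1] + branches[2] + branches[3]
out_b = branches[0] - branches[1] + branches[2] - branches[3]
corr = float(out_a @ out_b / (np.linalg.norm(out_a) * np.linalg.norm(out_b)))
```

The equal-energy sign pattern cancels the branches' common energy exactly, so `corr` stays near zero; well-decorrelated channel pairs like this are what produce the diffuse, colorless multichannel field the abstract describes.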
Improved Automatic Instrumentation Role Classification and Loop Activation Transcription
Many electronic music (EM) genres are composed through the activation of short audio recordings of instruments designed for seamless repetition, or loops. In this work, loops of key structural groups such as bass, percussive, or melodic elements are labelled by the role they occupy in a piece of music through the task of automatic instrumentation role classification (AIRC). Such labels assist EM producers in the identification of compatible loops in large unstructured audio databases. While human annotation is often laborious, automatic classification allows for fast and scalable generation of these labels. We experiment with several deep-learning architectures and propose a data augmentation method for improving multi-label representation to balance classes within the Freesound Loop Dataset. To improve the classification accuracy of the architectures, we also evaluate different pooling operations. Results indicate that, in combination with the data augmentation and pooling strategies, the proposed system achieves state-of-the-art performance for AIRC. Additionally, we demonstrate how the proposed AIRC method is useful for analysing the structure of EM compositions through loop activation transcription.
Joint Estimation of Fader and Equalizer Gains of DJ Mixers Using Convex Optimization
Disc jockeys (DJs) use audio effects to make a smooth transition from one song to another. There have been attempts to computationally analyze the creative process of seamless mixing. However, only a few studies estimated fader or equalizer (EQ) gains controlled by DJs. In this study, we propose a method that jointly estimates time-varying fader and EQ gains so as to reproduce the mix from individual source tracks. The method approximates the equalizer filters with a linear combination of a fixed equalizer filter and a constant gain to convert the joint estimation into a convex optimization problem. For the experiment, we collected a new DJ mix dataset that consists of 5,040 real-world DJ mixes with 50,742 transitions, and evaluated the proposed method with a mix reconstruction error. The result shows that the proposed method estimates the time-varying fader and equalizer gains more accurately than existing methods and simple baselines.
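The convexity argument can be illustrated with a stripped-down version of the problem (faders only, no EQ, synthetic sources; this is a sketch of the general idea, not the paper's formulation): if the gains are held constant within each frame, recovering them from the mix and the source tracks is an ordinary least-squares problem per frame.

```python
import numpy as np

# Toy crossfade: recover per-frame fader gains from the mix and the source
# tracks by frame-wise least squares. Holding the gains constant within a
# frame is what keeps the estimation convex. All signals are synthetic.
rng = np.random.default_rng(0)
fs, frame = 44100, 4410
t = np.arange(fs) / fs
src1 = np.sin(2 * np.pi * 220.0 * t)       # "outgoing" track
src2 = 0.3 * rng.standard_normal(fs)       # "incoming" track
g1_true = np.linspace(1.0, 0.0, fs)        # track 1 fades out
g2_true = np.linspace(0.0, 1.0, fs)        # track 2 fades in
mix = g1_true * src1 + g2_true * src2      # the recorded DJ mix

g1_est, g2_est = [], []
for k in range(0, fs, frame):
    A = np.stack([src1[k:k + frame], src2[k:k + frame]], axis=1)
    g, *_ = np.linalg.lstsq(A, mix[k:k + frame], rcond=None)
    g1_est.append(float(g[0]))
    g2_est.append(float(g[1]))
```

The estimated gain trajectories track the crossfade; the paper's contribution is extending this to joint fader-and-EQ estimation by linearizing the equalizer as a fixed filter plus a constant gain, which preserves convexity.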