Perceptual Decorrelator Based on Resonators
Decorrelation filters transform mono audio into multiple decorrelated copies. This paper introduces a novel decorrelation filter design based on a resonator bank, which produces a sum of over a thousand exponentially decaying sinusoids. A headphone listening test was used to identify the minimum inter-channel time delays that perceptually match ERB-filtered coherent noise to corresponding incoherent noise. The decay rate of each resonator is set based on a group delay profile determined by the listening test results at its corresponding frequency. Furthermore, the delays from the test are used to refine frequency-dependent windowing in coherence estimation, which we argue represents the perceptually most accurate way of assessing interaural coherence. This coherence measure then guides an optimization process that adjusts the initial phases of the sinusoids to minimize the coherence between two instances of the resonator-based decorrelator. The delay results establish the necessary group delay per ERB for effective decorrelation, revealing higher-than-expected values, particularly at higher frequencies. For comparison, the optimization is also performed using two previously proposed group-delay profiles: one based on the period of the ERB band center frequency and another based on the maximum group-delay limit before introducing smearing. The results indicate that the perceptually informed profile achieves equal decorrelation to the latter profile while smearing less at high frequencies. Overall, optimizing the phase response of the proposed decorrelator yields significantly lower coherence compared to using a random phase.
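As a rough illustration of the filter structure described above, the following sketch builds an impulse response as a sum of exponentially decaying sinusoids and compares two instances that differ only in their initial phases. It is not the authors' design: the frequency grid, the single shared decay time, the random phases (the unoptimized baseline), and the zero-lag correlation used in place of the perceptual coherence measure are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random


def resonator_bank_ir(freqs_hz, decay_times_s, phases, fs=48000, n_samples=2048):
    """Impulse response as a sum of exponentially decaying sinusoids, peak-normalised."""
    ir = [0.0] * n_samples
    for f, tau, phi in zip(freqs_hz, decay_times_s, phases):
        w = 2.0 * math.pi * f / fs           # angular frequency in radians per sample
        decay = math.exp(-1.0 / (tau * fs))  # per-sample amplitude decay factor
        amp = 1.0
        for n in range(n_samples):
            ir[n] += amp * math.cos(w * n + phi)
            amp *= decay
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def normalized_correlation(a, b):
    """Zero-lag normalised cross-correlation (a crude broadband stand-in for coherence)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


def random_phases(count, seed):
    """Random initial phases, i.e. the unoptimised baseline."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(count)]


def demo(n_res=64):
    # Assumed settings: log-spaced frequencies from 100 Hz to 10 kHz, 5 ms decay time.
    freqs = [100.0 * (100.0 ** (k / (n_res - 1))) for k in range(n_res)]
    taus = [0.005] * n_res
    left = resonator_bank_ir(freqs, taus, random_phases(n_res, seed=1))
    right = resonator_bank_ir(freqs, taus, random_phases(n_res, seed=2))
    return normalized_correlation(left, right)


if __name__ == "__main__":
    print("zero-lag correlation between two instances:", demo())
```

In the paper's setting the decay time would vary per resonator according to the group delay profile, and the phases would be optimized against the frequency-dependent coherence measure rather than drawn at random.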
Stable Limit Cycles as Tunable Signal Sources
This paper presents a method for synthesizing audio signals from
nonlinear dynamical systems exhibiting stable limit cycles, with
control over frequency and amplitude independent of changes to
the system’s internal parameters. Using the van der Pol oscillator
and the Brusselator as case studies, it is demonstrated how parameters are decoupled from frequency and amplitude by rescaling the
angular frequency and normalizing amplitude extrema. Practical
implementation considerations are discussed, as are the limits and
challenges of this approach. The method’s validity is evaluated experimentally and synthesis examples show the application of tunable nonlinear oscillators in sound design, including the generation
of transients in FM synthesis by means of a van der Pol oscillator
and a Supersaw oscillator bank based on the Brusselator.
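The rescaling idea can be sketched for the van der Pol oscillator as follows. Substituting a scaled time variable into the standard equation gives x'' = mu * w * (1 - x^2) * x' - w^2 * x, so the target angular frequency w is set separately from the nonlinearity parameter mu. This is a minimal sketch under assumptions, not the paper's implementation: the RK4 integrator, the initial state, and the peak-based normalization are illustrative choices, and all function names are hypothetical.

```python
import math


def van_der_pol(f0_hz, mu, fs=48000, n_samples=4800):
    """Integrate the time-rescaled van der Pol equation with classical RK4."""
    w = 2.0 * math.pi * f0_hz

    def deriv(x, v):
        # Rescaled form: x'' = mu*w*(1 - x^2)*x' - w^2*x
        return v, mu * w * (1.0 - x * x) * v - w * w * x

    h = 1.0 / fs
    x, v = 2.0, 0.0  # start close to the limit cycle to shorten the transient
    out = []
    for _ in range(n_samples):
        out.append(x)
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = deriv(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = deriv(x + h * k3x, v + h * k3v)
        x += h / 6.0 * (k1x + 2.0 * k2x + 2.0 * k3x + k4x)
        v += h / 6.0 * (k1v + 2.0 * k2v + 2.0 * k3v + k4v)
    return out


def normalize(signal, amplitude=1.0):
    """Scale the signal so that its largest extremum equals the requested amplitude."""
    peak = max(abs(s) for s in signal)
    return [amplitude * s / peak for s in signal]


def estimate_frequency(signal, fs):
    """Mean frequency from linearly interpolated upward zero crossings."""
    crossings = []
    for n in range(1, len(signal)):
        a, b = signal[n - 1], signal[n]
        if a < 0.0 <= b:
            crossings.append(n - 1 + a / (a - b))
    return fs * (len(crossings) - 1) / (crossings[-1] - crossings[0])


if __name__ == "__main__":
    sig = normalize(van_der_pol(220.0, mu=0.5), amplitude=0.8)
    print("estimated frequency (Hz):", estimate_frequency(sig, 48000))
```

With time rescaling alone, the oscillation frequency still falls slightly below the target as mu grows, because the period of the van der Pol limit cycle itself depends on mu. A full decoupling would need an additional mu-dependent correction, which this sketch does not attempt.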
Towards an Objective Comparison of Panning Feature Algorithms for Unsupervised Learning
Estimates of panning attributes are important features to extract from a piece of recorded music, with downstream uses such
as classification, quality assessment, and listening enhancement.
While several algorithms exist in the literature, there is currently
no comparison between them and no studies to suggest which one
is most suitable for any particular task. This paper compares four
algorithms for extracting amplitude panning features with respect
to their suitability for unsupervised learning. It finds synchronicities between them and analyses their results on a small set of
commercial music excerpts chosen for their distinct panning features. The ability of each algorithm to differentiate between the
tracks is analysed. The results can be used in future work to either
select the most appropriate panning feature algorithm or create a
version customized for a particular task.
Impedance Synthesis for Hybrid Analog-Digital Audio Effects
Most real systems, from acoustics to analog electronics, are
characterised by bidirectional coupling amongst elements rather
than neat, unidirectional signal flows between self-contained modules. Integrating digital processing into physical domains becomes
a significant engineering challenge when the application requires
bidirectional coupling across the physical-digital boundary rather
than separate, well-defined inputs and outputs. We introduce an
approach to hybrid analog-digital audio processing using synthetic
impedance: digitally simulated circuit elements integrated into an
otherwise analog circuit. This approach combines the physicality and classic character of analog audio circuits with the
precision and flexibility of digital signal processing (DSP). Our
impedance synthesis system consists of a voltage-controlled current source and a microcontroller-based DSP system. We demonstrate our technique by modifying an iconic guitar distortion pedal, the Boss DS-1, showing the ability of the synthetic
impedance to both replicate and extend the behaviour of the pedal’s
diode clipping stage. We discuss the behaviour of the synthetic
impedance in isolated laboratory conditions and in the DS-1 pedal,
highlighting the technical and creative potential of the technique as
well as its practical limitations and future extensions.
Learning Nonlinear Dynamics in Physical Modelling Synthesis Using Neural Ordinary Differential Equations
Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are
possible in order to handle geometric nonlinearities. One such
case is the high-amplitude vibration of a string, where geometric nonlinearity leads to perceptually important effects including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent applied machine learning approaches (in particular neural ordinary differential equations) have been used to model lumped dynamic systems
such as electronic circuits automatically from data. In this work,
we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution
for linear vibration of the system’s modes and employs a neural network to account for nonlinear dynamic behaviour. Physical parameters of a system remain easily accessible after training without
the need for a parameter encoder in the network architecture. As
an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to
reproduce the nonlinear dynamics of the system. Sound examples
are presented.
DataRES and PyRES: A Room Dataset and a Python Library for Reverberation Enhancement System Development, Evaluation, and Simulation
Reverberation is crucial in the acoustical design of physical
spaces, especially halls for live music performances. Reverberation Enhancement Systems (RESs) are active acoustic systems that
can control the reverberation properties of physical spaces, allowing them to adapt to specific acoustical needs. The performance of
RESs strongly depends on the properties of the physical room and
the architecture of the Digital Signal Processor (DSP). However,
room-impulse-response (RIR) measurements and the DSP code
from previous studies on RESs have never been made open access, leading to non-reproducible results. In this study, we present
DataRES and PyRES, an RIR dataset and a Python library, to increase the reproducibility of studies on RESs. The dataset contains RIRs measured in RES research and development rooms and
professional music venues. The library offers classes and functionality for the development, evaluation, and simulation of RESs.
The implemented DSP architectures are made differentiable, allowing their components to be trained in a machine-learning-like
pipeline. The replication of previous studies by the authors shows
that PyRES can become a useful tool in future research on RESs.
Biquad Coefficients Optimization via Kolmogorov-Arnold Networks
Conventional Deep Learning (DL) approaches to estimating Infinite Impulse
Response (IIR) filter coefficients from an arbitrary frequency response are quite limited. They often suffer from inefficiencies such as tight training requirements, high complexity, and
limited accuracy. As an alternative, in this paper, we explore the
use of Kolmogorov-Arnold Networks (KANs) to effectively predict IIR
filter coefficients, specifically biquad coefficients. By leveraging the high interpretability and accuracy of KANs, we achieve
smooth coefficient optimization. Furthermore, by constraining
the search space and exploring different loss functions, we demonstrate improved performance in speed and accuracy. Our approach
is evaluated against other existing differentiable IIR filter solutions. The results show significant advantages of KANs over existing methods, offering steadier convergence and more accurate
results. This offers new possibilities for integrating digital infinite
impulse response (IIR) filters into deep-learning frameworks.
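To make the estimation target concrete, the sketch below evaluates the frequency response of a single biquad, checks the standard stability triangle for its denominator coefficients (one common way to constrain the search space), and computes a mean-squared log-magnitude error against a target response. It does not reproduce the KAN model or the loss functions studied in the paper: the loss is just one plausible choice, and the function names are hypothetical.

```python
import cmath
import math


def biquad_response(b, a, freqs_hz, fs):
    """Complex response of H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).

    b = (b0, b1, b2) and a = (a1, a2), with a0 normalised to 1.
    """
    out = []
    for f in freqs_hz:
        z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
        z2 = z1 * z1
        out.append((b[0] + b[1] * z1 + b[2] * z2) / (1.0 + a[0] * z1 + a[1] * z2))
    return out


def is_stable(a):
    """Stability triangle: both poles lie strictly inside the unit circle."""
    a1, a2 = a
    return abs(a2) < 1.0 and abs(a1) < 1.0 + a2


def to_db(h, eps=1e-12):
    """Magnitude in decibels, with a small floor to avoid log(0)."""
    return 20.0 * math.log10(abs(h) + eps)


def log_magnitude_loss(b, a, target_db, freqs_hz, fs):
    """Mean squared error between the biquad magnitude and a target, both in dB."""
    response = biquad_response(b, a, freqs_hz, fs)
    errors = [(to_db(h) - t) ** 2 for h, t in zip(response, target_db)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    fs = 48000
    freqs = [100.0 * 2.0 ** k for k in range(8)]  # 100 Hz to 12.8 kHz
    flat_target = [0.0] * len(freqs)
    b, a = (0.5, 0.5, 0.0), (0.0, 0.0)            # two-tap averaging lowpass
    print("stable:", is_stable(a))
    print("loss vs flat target:", log_magnitude_loss(b, a, flat_target, freqs, fs))
```

A learned estimator would output `b` and `a` from the target response, with this kind of loss evaluated inside an automatic-differentiation framework rather than in plain Python.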
Zero-Phase Sound via Giant FFT
Given the speedy computation of the FFT on current computer
hardware, there are new possibilities for examining transformations for very long sounds. A zero-phase version of any audio
signal can be obtained by zeroing the phase angle of its complex
spectrum and taking the inverse FFT. This paper recommends additional processing steps, including zero-padding, transient suppression at the signal’s start and end, and gain compensation, to
enhance the resulting sound quality. As a result, a sound with the
same spectral characteristics as the original one, but with different temporal events, is obtained. Repeating rhythm patterns are
retained, however. Zero-phase sounds are palindromic in the sense
that they are symmetric in time. A comparison of the zero-phase
conversion to the autocorrelation function helps to understand its
properties, such as why the rhythm of the original sound is emphasized. It is also argued that the zero-phase signal has the same
autocorrelation function as the original sound. One exciting variation of the method is to apply it separately to the real
and imaginary parts of the spectrum to produce a stereo effect. A
frame-based technique enables the use of the zero-phase conversion in real-time audio processing. The zero-phase conversion is
another member of the giant FFT toolset, allowing the modification of sampled sounds, such as drum loops or entire songs.
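The core operation can be sketched with NumPy as below: take one FFT over the whole zero-padded signal, discard the phase by keeping only the magnitude, and invert. This is a minimal sketch under assumptions, not the paper's full procedure: the padding factor, the circular shift that centers the result, and the peak-matching gain compensation are illustrative choices, the transient suppression step is omitted, and the function name is hypothetical.

```python
import numpy as np


def zero_phase(x, pad_factor=2, match_peak=True):
    """Return a zero-phase version of x computed from one FFT over the whole signal."""
    x = np.asarray(x, dtype=float)
    n_fft = pad_factor * len(x)               # zero-padding to the FFT length
    spectrum = np.fft.rfft(x, n=n_fft)
    magnitude = np.abs(spectrum)              # setting the phase angle to zero
    y = np.fft.irfft(magnitude, n=n_fft)      # real and even (time-symmetric) signal
    y = np.fft.fftshift(y)                    # move the symmetry point to the middle
    if match_peak:
        # Assumed gain compensation: match the peak level of the input.
        y *= np.max(np.abs(x)) / np.max(np.abs(y))
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(1024)
    z = zero_phase(signal)
    # Apart from the first sample, the output reads the same forwards and backwards.
    print("palindromic:", np.allclose(z[1:], z[1:][::-1]))
```

Because only the phase is altered, the magnitude spectrum (and hence the circular autocorrelation of the padded signal) is unchanged up to the gain factor, which is consistent with the autocorrelation argument made in the abstract.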
Generative Latent Spaces for Neural Synthesis of Audio Textures
This paper investigates the synthesis of audio textures and the
structure of generative latent spaces using Variational Autoencoders (VAEs) within two paradigms of neural audio synthesis:
DSP-inspired and data-driven approaches. For each paradigm, we
propose VAE-based frameworks that allow fine-grained temporal
control. We introduce datasets across three categories of environmental sounds to support our investigations. We evaluate and compare the models’ reconstruction performance using objective metrics, and investigate their generative capabilities and latent space
structure through latent space interpolations.
Power-Balanced Drift Regulation for Scalar Auxiliary Variable Methods: Application to Real-Time Simulation of Nonlinear String Vibrations
Efficient stable integration methods for nonlinear systems are
of great importance for physical modeling sound synthesis. Specifically, a number of musical systems of interest, including vibrating
strings, bars or plates may be written as port-Hamiltonian systems
with quadratic kinetic energy and non-quadratic potential energy.
Efficient schemes have been developed for such systems through
the introduction of a scalar auxiliary variable. As a result, stable real-time simulation of nonlinear musical systems of up to a
few thousand degrees of freedom is possible, even for nearly
lossless systems. However, convergence rates can be slow and
seem to be system-dependent. Specifically, at audio rates, they
may suffer from numerical drift of the auxiliary variable, resulting
in dramatic unwanted effects on audio output, such as pitch drifts
after several impacts on the same resonator.
In this paper, a novel method for mitigating this unwanted drift
while preserving power balance is presented, based on a control
approach. A set of modified equations is proposed to control the
drift artefact by rerouting energy through the scalar auxiliary variable and potential energy state. Numerical experiments are run
in order to check convergence on simulations in the case of a cubic nonlinear string. A real-time implementation is provided as
a Max/MSP external. Polyphony of 60 notes is achieved on a laptop, and some simple high-level control parameters are provided,
making the proposed implementation suitable for use in artistic
contexts. All code is available in a public repository, along with
compiled Max/MSP externals.