Differentiable Scattering Delay Networks for Artificial Reverberation
Scattering delay networks (SDNs) provide a flexible and efficient
framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling
gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating
key parameters such as scattering matrices and absorption filters
as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic
features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN
configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
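
The energy decay mentioned above is commonly summarised by the energy decay curve (EDC), obtained by Schroeder backward integration of the squared impulse response. The following is a minimal sketch of that computation only, not the authors' loss function; the function name `energy_decay_curve_db` and the toy exponentially decaying response are illustrative assumptions.

```python
import math


def energy_decay_curve_db(rir):
    """Schroeder backward integration.

    EDC[n] is the energy remaining in the response from sample n onwards,
    returned in dB relative to the total energy.
    """
    running = 0.0
    edc = [0.0] * len(rir)
    for n in range(len(rir) - 1, -1, -1):
        running += rir[n] * rir[n]
        edc[n] = running
    total = edc[0]
    if total == 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf") for e in edc]


# Toy response (assumption): amplitude decays by 60 dB in t60 seconds.
fs = 8000
t60 = 0.5
decay = math.log(1000.0) / (t60 * fs)  # 60 dB in amplitude is a factor of 1000
rir = [math.exp(-decay * n) for n in range(fs)]
edc_db = energy_decay_curve_db(rir)
```

For an ideal exponential decay the EDC is a straight line in dB, so a reverberation time can be read from its slope; a differentiable loss could compare such curves between the synthesised and target responses.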

Differentiable Attenuation Filters for Feedback Delay Networks
We introduce a novel method for designing attenuation filters in
digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second Order Sections (SOS)
of Infinite Impulse Response (IIR) filters arranged as parametric
equalizers (PEQ), enabling fine control over frequency-dependent
reverberation decay. Unlike traditional graphic equalizer designs,
which require numerous filters per delay line, we propose a scalable solution where the number of filters can be adjusted. The frequency, gain, and quality factor (Q) parameters are shared across delay lines, and only the gain is adjusted based on delay
length. This design not only reduces the number of optimization
parameters, but also remains fully differentiable and compatible
with gradient-based learning frameworks. Leveraging principles
of analog filter design, our method allows for efficient and accurate filter fitting using supervised learning. Our method delivers
a flexible and differentiable design, achieving state-of-the-art performance while significantly reducing computational cost.
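
A standard way to reach a target reverberation time in an FDN is to make each delay line's attenuation, in dB, proportional to its length. The sketch below illustrates that scaling for a single frequency band. It is not the authors' PEQ design; the function name, sample rate, and delay lengths are illustrative assumptions.

```python
def delay_line_gain_db(delay_samples, t60_seconds, sample_rate):
    """Attenuation in dB for one pass through a delay line.

    A signal recirculating through the line then loses 60 dB
    after t60_seconds, regardless of the line length.
    """
    return -60.0 * delay_samples / (t60_seconds * sample_rate)


sample_rate = 48000
delays = [1021, 1511, 2003, 2503]  # hypothetical delay lengths in samples
gains_db = [delay_line_gain_db(m, 2.0, sample_rate) for m in delays]
linear_gains = [10.0 ** (g / 20.0) for g in gains_db]
```

Longer lines receive more attenuation per pass, which is why a shared filter shape with a delay-dependent gain can serve every line.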

Perceptual Decorrelator Based on Resonators
Decorrelation filters transform mono audio into multiple decorrelated copies. This paper introduces a novel decorrelation filter design based on a resonator bank, which produces a sum of over a thousand exponentially decaying sinusoids. A headphone listening test was used to identify the minimum inter-channel time delays that perceptually match ERB-filtered coherent noise to corresponding incoherent noise. The decay rate of each resonator is set based on a group delay profile determined by the listening test results at its corresponding frequency. Furthermore, the delays from the test are used to refine frequency-dependent windowing in coherence estimation, which we argue represents the perceptually most accurate way of assessing interaural coherence. This coherence measure then guides an optimization process that adjusts the initial phases of the sinusoids to minimize the coherence between two instances of the resonator-based decorrelator. The delay results establish the necessary group delay per ERB for effective decorrelation, revealing higher-than-expected values, particularly at higher frequencies. For comparison, the optimization is also performed using two previously proposed group-delay profiles: one based on the period of the ERB band center frequency and another based on the maximum group-delay limit before introducing smearing. The results indicate that the perceptually informed profile achieves equal decorrelation to the latter profile while smearing less at high frequencies. Overall, optimizing the phase response of the proposed decorrelator yields significantly lower coherence compared to using a random phase.

Compression of Head-Related Transfer Functions Using Piecewise Cubic Hermite Interpolation
We present a spline-based method for compressing and reconstructing Head-Related Transfer Functions (HRTFs) that preserves perceptual quality. Our approach focuses on the magnitude response and consists of four stages: (1) acquiring minimum-phase head-related impulse responses (HRIRs), (2) transforming
them into the frequency domain and applying adaptive Wiener
filtering to preserve important spectral features, (3) extracting a
minimal set of control points using derivative-based methods to
identify local maxima and inflection points, and (4) reconstructing
the HRTF using piecewise cubic Hermite interpolation (PCHIP)
over the refined control points. Evaluation on 301 subjects demonstrates that our method achieves an average compression ratio of
4.7:1 with spectral distortion ≤ 1.0 dB in each Equivalent Rectangular Band (ERB). The method preserves binaural cues with a
mean absolute interaural level difference (ILD) error of 0.10 dB.
Our method achieves about three times the compression obtained
with a PCA-based method.
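
Stage (3) above can be pictured with finite differences: a sign change in the first difference marks a local extremum, and a sign change in the second difference marks an inflection point. The sketch below is only an illustration of that idea on a toy curve, not the authors' extraction procedure; the function name `control_point_indices`, the choice to keep the endpoints, and the cubic test curve are assumptions.

```python
def control_point_indices(values):
    """Return indices of endpoints, local extrema, and inflection points.

    Extrema are located where the first difference changes sign, and
    inflection points where the second difference changes sign.
    """
    d1 = [b - a for a, b in zip(values, values[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    points = {0, len(values) - 1}  # always keep the endpoints
    for i in range(1, len(d1)):
        if d1[i - 1] * d1[i] < 0:  # slope changes sign at sample i
            points.add(i)
    for i in range(1, len(d2)):
        if d2[i - 1] * d2[i] < 0:  # curvature changes sign between i and i + 1
            points.add(i + 1)
    return sorted(points)


# Toy curve (assumption): a cubic with one maximum, one minimum, one inflection.
xs = [-2.25 + 0.5 * n for n in range(10)]
values = [x ** 3 - 3.0 * x for x in xs]
indices = control_point_indices(values)
```

The retained samples would then serve as knots for a shape-preserving interpolant such as PCHIP, which avoids overshoot between control points.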

Spatializing Screen Readers: Extending VoiceOver via Head-Tracked Binaural Synthesis for User Interface Accessibility
Traditional screen-based graphical user interfaces (GUIs) pose significant accessibility challenges for visually impaired users. This
paper demonstrates how existing GUI elements can be translated
into an interactive auditory domain using high-order Ambisonics and inertial sensor-based head tracking, culminating in real-time binaural rendering over headphones. The proposed system
is designed to spatialize the auditory output from VoiceOver, the
built-in macOS screen reader, aiming to foster clearer mental mapping and enhanced navigability.
A between-groups experiment
was conducted to compare standard VoiceOver with the proposed
spatialized version. Participants without visual impairments (n = 32),
with no visual access to the test interface, completed a list-based
exploration and then attempted to reconstruct the UI solely from
auditory cues. Experimental results indicate that the head-tracked
group achieved slightly higher accuracy in reconstructing the interface, while user experience assessments showed no significant
differences in self-reported workload or usability. These findings
suggest that potential benefits may come from the integration of
head-tracked binaural audio into mainstream screen-reader workflows, but future investigations involving blind and low-vision users
are needed.
Although the experimental testbed uses a generic
desktop app, our ultimate goal is to tackle the complex visual layouts of music-production software, where a head-tracked audio
approach could benefit visually impaired producers and musicians
navigating plug-in controls.

Evaluating the Performance of Objective Audio Quality Metrics in Response to Common Audio Degradations
This study evaluates the performance of five objective audio quality metrics (PEAQ Basic, PEAQ Advanced, PEMO-Q, ViSQOL,
and HAAQI) in the context of digital music production. Unlike
previous comparisons, we focus on their suitability for production environments, an area currently underexplored in existing research. Twelve audio examples were tested using two evaluation
types: an effectiveness test under progressively increasing degradations (hum, hiss, clipping, glitches) and a robustness test under
fixed-level, randomly fluctuating degradations.
In the effectiveness test, HAAQI, PEMO-Q, and PEAQ Basic
effectively tracked degradation changes, while PEAQ Advanced
failed consistently and ViSQOL showed low sensitivity to hum
and glitches. In the robustness test, ViSQOL and HAAQI demonstrated the highest consistency, with average standard deviations
of 0.004 and 0.007, respectively, followed by PEMO-Q (0.021),
PEAQ Basic (0.057), and PEAQ Advanced (0.065).
However,
ViSQOL also showed low variability across audio examples, suggesting limited genre sensitivity.
These findings highlight the strengths and limitations of each
metric for music production, specifically quality measurement with
compressed audio. The source code and dataset will be made publicly available upon publication.

Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method
Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than using a single and potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced
feedback delay network (FDN) is designed to generate perceptually plausible RIRs. A multiple stimuli with hidden reference
and anchor (MUSHRA) test and a two-alternative-forced-choice
(2AFC) discrimination task have been conducted to compare the
proposed method against ground truth recordings and conventional
RT-based approaches. The results show that the proposed system
delivers robust performance in various scenarios, achieving highly
plausible reverberation synthesis.

DataRES and PyRES: A Room Dataset and a Python Library for Reverberation Enhancement System Development, Evaluation, and Simulation
Reverberation is crucial in the acoustical design of physical
spaces, especially halls for live music performances. Reverberation Enhancement Systems (RESs) are active acoustic systems that
can control the reverberation properties of physical spaces, allowing them to adapt to specific acoustical needs. The performance of
RESs strongly depends on the properties of the physical room and
the architecture of the Digital Signal Processor (DSP). However,
room-impulse-response (RIR) measurements and the DSP code
from previous studies on RESs have never been made open access, leading to non-reproducible results. In this study, we present
DataRES and PyRES, an RIR dataset and a Python library, to increase the reproducibility of studies on RESs. The dataset contains RIRs measured in RES research and development rooms and
professional music venues. The library offers classes and functionality for the development, evaluation, and simulation of RESs.
The implemented DSP architectures are made differentiable, allowing their components to be trained in a machine-learning-like
pipeline. The replication of previous studies by the authors shows
that PyRES can become a useful tool in future research on RESs.

Auditory Discrimination of Early Reflections in Virtual Rooms
This study investigates the perceptual sensitivity to early reflection changes across different spatial directions in a virtual
reality (VR) environment. Using an ABX discrimination paradigm, participants evaluated speech stimuli convolved with third-order Ambisonic room impulse responses under three position
reversals (Left–Right, Front–Back, and Floor–Ceiling) and three
reverberation conditions (RT60 = 1.0 s, 0.6 s, and 0.2 s). Binomial tests revealed that participants consistently detected early reflection differences in the Left–Right reversal, while discrimination performance in the other two directions remained at or near
chance. This result can be explained by the higher acuity and
lower localisation blur of the human auditory system along the left–right axis. A
two-way ANOVA confirmed a significant main effect of spatial
position (p = 0.00685, η² = 0.1605), with no significant effect of
reverberation or interaction. The analysis of the binaural room
impulse responses showed waveform and direct-to-reverberant ratio (DRR) differences in the Left–Right reversal position, aligning
with perceptual results. However, no definitive causal link between DRR variations and perceptual outcomes can yet be established.
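
In an ABX task a guessing listener is correct half of the time, so a binomial test asks how likely the observed number of correct answers would be under pure chance. The sketch below shows an exact one-sided version; the sidedness, the function name `binomial_p_value`, and the trial counts are assumptions for illustration and are not taken from the study.

```python
from math import comb


def binomial_p_value(successes, trials, chance=0.5):
    """Exact one-sided binomial test.

    Returns the probability of observing at least `successes` correct
    answers in `trials` attempts if the listener were guessing.
    """
    return sum(
        comb(trials, k) * chance ** k * (1.0 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical listener: 13 correct answers out of 16 ABX trials.
p_value = binomial_p_value(13, 16)
significant = p_value < 0.05
```

A small p-value indicates that the listener could hear the difference; a value near 0.5 or above is consistent with performance at chance.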

Biquad Coefficients Optimization via Kolmogorov-Arnold Networks
Conventional Deep Learning (DL) approaches to estimating Infinite Impulse
Response (IIR) filter coefficients from an arbitrary frequency response are quite limited. They often suffer from inefficiencies such as strict training requirements, high complexity, and
limited accuracy. As an alternative, in this paper, we explore the
use of Kolmogorov-Arnold Networks (KANs) to predict IIR
filter coefficients, specifically those of biquads. By leveraging the high interpretability and accuracy of KANs, we achieve
smooth coefficient optimization. Furthermore, by constraining
the search space and exploring different loss functions, we demonstrate improved performance in speed and accuracy. Our approach
is evaluated against other existing differentiable IIR filter solutions. The results show significant advantages of KANs over existing methods, offering steadier convergence and more accurate
results. This opens new possibilities for integrating digital
IIR filters into deep-learning frameworks.
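
To make the estimation target concrete: a biquad is a second-order IIR section, and any coefficient estimator is judged by how closely the resulting frequency response matches the desired one. The sketch below evaluates the magnitude response of a biquad at a given frequency. The coefficients come from the widely used Audio EQ Cookbook peaking-filter formulas, which stand in here for network-predicted coefficients; nothing in it reproduces the KAN approach, and the function names and parameter values are assumptions.

```python
import cmath
import math


def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-EQ biquad coefficients (Audio EQ Cookbook), normalised so a[0] = 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b, a, freq, fs):
    """Magnitude in dB of H(z) = B(z) / A(z) evaluated on the unit circle."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq / fs)  # z^-1 at this frequency
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


# Hypothetical target: +6 dB peak at 1 kHz with Q = 1 at a 48 kHz sample rate.
b, a = peaking_biquad(1000.0, 6.0, 1.0, 48000.0)
peak_db = magnitude_db(b, a, 1000.0, 48000.0)
dc_db = magnitude_db(b, a, 0.0, 48000.0)
```

A loss for coefficient estimation could compare such magnitudes, sampled on a frequency grid, against the target response.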