Differentiable Scattering Delay Networks for Artificial Reverberation
Scattering delay networks (SDNs) provide a flexible and efficient framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating key parameters such as scattering matrices and absorption filters as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
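As an illustration of the energy-decay features mentioned above, the following is a minimal NumPy sketch of Schroeder backward integration, a reverberation-time estimate, and an energy-decay-curve (EDC) mismatch measure. The function names, the -5 to -25 dB fitting range, and the mean-absolute-difference loss are assumptions made for this example, not the authors' implementation; in a differentiable setting the same operations would be written in an autodiff framework.

```python
import numpy as np

def energy_decay_db(rir):
    """Schroeder backward integration, normalized to 0 dB at t = 0."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def estimate_t60(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """Fit a line to the EDC between lo_db and hi_db, extrapolate to -60 dB."""
    edc = energy_decay_db(rir)
    t = np.arange(len(rir)) / fs
    mask = (edc <= lo_db) & (edc >= hi_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # slope in dB per second
    return -60.0 / slope

def edc_loss(rir_model, rir_target):
    """Mean absolute EDC difference in dB (a hypothetical loss choice)."""
    diff = energy_decay_db(rir_model) - energy_decay_db(rir_target)
    return float(np.mean(np.abs(diff)))

fs = 16000
t = np.arange(2 * fs) / fs
# Synthetic exponential decays standing in for measured and modeled responses.
target = np.exp(-3.0 * np.log(10.0) * t / 0.5)  # amplitude falls 60 dB in 0.5 s
model = np.exp(-3.0 * np.log(10.0) * t / 0.8)   # mismatched decay, 0.8 s
print("estimated T60 of target:", estimate_t60(target, fs))
print("EDC loss between model and target:", edc_loss(model, target))
```

A gradient-based optimizer would adjust the model parameters to drive such a loss toward zero.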
Differentiable Attenuation Filters for Feedback Delay Networks
We introduce a novel method for designing attenuation filters in digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second Order Sections (SOS) of Infinite Impulse Response (IIR) filters arranged as parametric equalizers (PEQ), enabling fine control over frequency-dependent reverberation decay. Unlike traditional graphic equalizer designs, which require numerous filters per delay line, we propose a scalable solution where the number of filters can be adjusted. The frequency, gain, and quality factor (Q) parameters are shared across delay lines, and only the gain is adjusted based on delay length. This design not only reduces the number of optimization parameters, but also remains fully differentiable and compatible with gradient-based learning frameworks. Leveraging principles of analog filter design, our method allows for efficient and accurate filter fitting using supervised learning. Our method delivers a flexible and differentiable design, achieving state-of-the-art performance while significantly reducing computational cost.
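The parameter sharing described above can be sketched as follows. This example assumes the widely used Audio EQ Cookbook peaking biquad and assumes that the shared gain is stored in dB per sample of delay, so that each delay line scales it by its own length. Both choices, along with the band values and delay lengths, are illustrative assumptions rather than the paper's actual design.

```python
import cmath
import math

def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-EQ biquad (Audio EQ Cookbook form), normalized so a[0] = 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def magnitude_db(b, a, f, fs):
    """Magnitude response of one biquad at frequency f, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))

def line_filters(shared_bands, delay_samples, fs):
    """Build one delay line's sections from shared (f0, dB-per-sample, Q) bands."""
    return [peaking_biquad(f0, g * delay_samples, q, fs)
            for f0, g, q in shared_bands]

fs = 48000
shared_bands = [(250.0, -0.0005, 0.7), (4000.0, -0.002, 0.7)]  # hypothetical
for delay in (1031, 1499, 2003):  # hypothetical delay lengths in samples
    sections = line_filters(shared_bands, delay, fs)
    b, a = sections[1]
    print(delay, "samples ->", magnitude_db(b, a, 4000.0, fs), "dB at 4 kHz")
```

Only the gains differ between delay lines here, so the number of free parameters does not grow with the number of lines.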
Perceptual Decorrelator Based on Resonators
Decorrelation filters transform mono audio into multiple decorrelated copies. This paper introduces a novel decorrelation filter design based on a resonator bank, which produces a sum of over a thousand exponentially decaying sinusoids. A headphone listening test was used to identify the minimum inter-channel time delays that perceptually match ERB-filtered coherent noise to corresponding incoherent noise. The decay rate of each resonator is set based on a group delay profile determined by the listening test results at its corresponding frequency. Furthermore, the delays from the test are used to refine frequency-dependent windowing in coherence estimation, which we argue represents the perceptually most accurate way of assessing interaural coherence. This coherence measure then guides an optimization process that adjusts the initial phases of the sinusoids to minimize the coherence between two instances of the resonator-based decorrelator. The delay results establish the necessary group delay per ERB for effective decorrelation, revealing higher-than-expected values, particularly at higher frequencies. For comparison, the optimization is also performed using two previously proposed group-delay profiles: one based on the period of the ERB band center frequency and another based on the maximum group-delay limit before introducing smearing. The results indicate that the perceptually informed profile achieves equal decorrelation to the latter profile while smearing less at high frequencies. Overall, optimizing the phase response of the proposed decorrelator yields significantly lower coherence compared to using a random phase.
Compression of Head-Related Transfer Functions Using Piecewise Cubic Hermite Interpolation
We present a spline-based method for compressing and reconstructing Head-Related Transfer Functions (HRTFs) that preserves perceptual quality. Our approach focuses on the magnitude response and consists of four stages: (1) acquiring minimum-phase head-related impulse responses (HRIRs), (2) transforming them into the frequency domain and applying adaptive Wiener filtering to preserve important spectral features, (3) extracting a minimal set of control points using derivative-based methods to identify local maxima and inflection points, and (4) reconstructing the HRTF using piecewise cubic Hermite interpolation (PCHIP) over the refined control points. Evaluation on 301 subjects demonstrates that our method achieves an average compression ratio of 4.7:1 with spectral distortion ≤ 1.0 dB in each Equivalent Rectangular Bandwidth (ERB) band. The method preserves binaural cues with a mean absolute interaural level difference (ILD) error of 0.10 dB. Our method achieves about three times the compression obtained with a PCA-based method.
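Stages (3) and (4) can be illustrated with a small sketch using SciPy's `PchipInterpolator`. The control-point rule below is a simplification assumed for this example: it keeps the endpoints, all local extrema (minima as well as maxima), and sign changes of the second difference. It omits the Wiener filtering and control-point refinement stages, and the magnitude response is synthetic, not measured HRTF data.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def select_control_points(mag_db):
    """Indices of endpoints, local extrema, and inflection points."""
    d1 = np.diff(mag_db)        # first difference
    d2 = np.diff(mag_db, n=2)   # second difference
    extrema = np.where(d1[:-1] * d1[1:] < 0)[0] + 1
    inflections = np.where(d2[:-1] * d2[1:] < 0)[0] + 1
    ends = np.array([0, len(mag_db) - 1])
    return np.unique(np.concatenate((ends, extrema, inflections)))

def reconstruct(freqs, mag_db, idx):
    """Rebuild the full response from the control points with PCHIP."""
    return PchipInterpolator(freqs[idx], mag_db[idx])(freqs)

# Synthetic smooth magnitude response standing in for an HRTF (hypothetical).
freqs = np.linspace(0.0, 20000.0, 257)
mag_db = (6.0 * np.sin(2 * np.pi * freqs / 5000.0)
          + 3.0 * np.cos(2 * np.pi * freqs / 1700.0))

idx = select_control_points(mag_db)
recon = reconstruct(freqs, mag_db, idx)
print("compression ratio:", len(mag_db) / len(idx))
print("max reconstruction error (dB):", np.max(np.abs(recon - mag_db)))
```

PCHIP passes through every control point and does not overshoot between them, which keeps the reconstruction from inventing extra peaks or notches.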
Spatializing Screen Readers: Extending VoiceOver via Head-Tracked Binaural Synthesis for User Interface Accessibility
Traditional screen-based graphical user interfaces (GUIs) pose significant accessibility challenges for visually impaired users. This paper demonstrates how existing GUI elements can be translated into an interactive auditory domain using high-order Ambisonics and inertial sensor-based head tracking, culminating in real-time binaural rendering over headphones. The proposed system is designed to spatialize the auditory output from VoiceOver, the built-in macOS screen reader, aiming to foster clearer mental mapping and enhanced navigability. A between-groups experiment was conducted to compare standard VoiceOver with the proposed spatialized version. Participants without visual impairments (n = 32), with no visual access to the test interface, completed a list-based exploration and then attempted to reconstruct the UI solely from auditory cues. Experimental results indicate that the head-tracked group achieved slightly higher accuracy in reconstructing the interface, while user experience assessments showed no significant differences in self-reported workload or usability. These findings suggest that integrating head-tracked binaural audio into mainstream screen-reader workflows may bring benefits, but future investigations involving blind and low-vision users are needed. Although the experimental testbed uses a generic desktop app, our ultimate goal is to tackle the complex visual layouts of music-production software, where a head-tracked audio approach could benefit visually impaired producers and musicians navigating plug-in controls.
Evaluating the Performance of Objective Audio Quality Metrics in Response to Common Audio Degradations
This study evaluates the performance of five objective audio quality metrics (PEAQ Basic, PEAQ Advanced, PEMO-Q, ViSQOL, and HAAQI) in the context of digital music production. Unlike previous comparisons, we focus on their suitability for production environments, an area currently underexplored in existing research. Twelve audio examples were tested using two evaluation types: an effectiveness test under progressively increasing degradations (hum, hiss, clipping, glitches) and a robustness test under fixed-level, randomly fluctuating degradations. In the effectiveness test, HAAQI, PEMO-Q, and PEAQ Basic effectively tracked degradation changes, while PEAQ Advanced failed consistently and ViSQOL showed low sensitivity to hum and glitches. In the robustness test, ViSQOL and HAAQI demonstrated the highest consistency, with average standard deviations of 0.004 and 0.007, respectively, followed by PEMO-Q (0.021), PEAQ Basic (0.057), and PEAQ Advanced (0.065). However, ViSQOL also showed low variability across audio examples, suggesting limited genre sensitivity. These findings highlight the strengths and limitations of each metric for music production, specifically quality measurement with compressed audio. The source code and dataset will be made publicly available upon publication.
Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method
Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than using a single and potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced feedback delay network (FDN) is designed to generate perceptually plausible RIRs. A multiple stimuli with hidden reference and anchor (MUSHRA) test and a two-alternative-forced-choice (2AFC) discrimination task have been conducted to compare the proposed method against ground truth recordings and conventional RT-based approaches. The results show that the proposed system delivers robust performance in various scenarios, achieving highly plausible reverberation synthesis.
DataRES and PyRES: A Room Dataset and a Python Library for Reverberation Enhancement System Development, Evaluation, and Simulation
Reverberation is crucial in the acoustical design of physical spaces, especially halls for live music performances. Reverberation Enhancement Systems (RESs) are active acoustic systems that can control the reverberation properties of physical spaces, allowing them to adapt to specific acoustical needs. The performance of RESs strongly depends on the properties of the physical room and the architecture of the Digital Signal Processor (DSP). However, room-impulse-response (RIR) measurements and the DSP code from previous studies on RESs have never been made open access, leading to non-reproducible results. In this study, we present DataRES and PyRES, an RIR dataset and a Python library, to increase the reproducibility of studies on RESs. The dataset contains RIRs measured in RES research and development rooms and professional music venues. The library offers classes and functionality for the development, evaluation, and simulation of RESs. The implemented DSP architectures are made differentiable, allowing their components to be trained in a machine-learning-like pipeline. The replication of previous studies by the authors shows that PyRES can become a useful tool in future research on RESs.
Auditory Discrimination of Early Reflections in Virtual Rooms
This study investigates the perceptual sensitivity to early reflection changes across different spatial directions in a virtual reality (VR) environment. Using an ABX discrimination paradigm, participants evaluated speech stimuli convolved with third-order Ambisonic room impulse responses under three position-reversal conditions (Left–Right, Front–Back, and Floor–Ceiling) and three reverberation conditions (RT60 = 1.0 s, 0.6 s, and 0.2 s). Binomial tests revealed that participants consistently detected early reflection differences in the Left–Right reversal, while discrimination performance in the other two directions remained at or near chance. This result can be explained by the higher acuity and lower localisation blur found for the human auditory system along the left–right axis. A two-way ANOVA confirmed a significant main effect of spatial position (p = 0.00685, η² = 0.1605), with no significant effect of reverberation or interaction. The analysis of the binaural room impulse responses showed waveform and direct-to-reverberant ratio (DRR) differences in the Left–Right reversal position, aligning with perceptual results. However, no definitive causal link between DRR variations and perceptual outcomes can yet be established.
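For readers unfamiliar with how ABX responses are checked against chance, the sketch below computes a one-sided exact binomial p-value using only the standard library. The trial counts are invented for illustration and are not the study's data, and whether the study used a one-sided or two-sided test is not stated in the abstract.

```python
from math import comb

def binomial_p_value(correct, trials, chance=0.5):
    """One-sided exact p-value: P(X >= correct) if every answer were a guess."""
    return sum(
        comb(trials, k) * chance ** k * (1.0 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical numbers of correct ABX answers out of 40 trials per condition.
hypothetical_counts = {"Left-Right": 34, "Front-Back": 22, "Floor-Ceiling": 19}
for condition, correct in hypothetical_counts.items():
    p = binomial_p_value(correct, 40)
    verdict = "above chance" if p < 0.05 else "not distinguishable from chance"
    print(condition, p, verdict)
```

A small p-value means the observed number of correct answers would be unlikely if listeners were guessing.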
Biquad Coefficients Optimization via Kolmogorov-Arnold Networks
Conventional deep learning (DL) approaches to estimating Infinite Impulse Response (IIR) filter coefficients from an arbitrary frequency response are quite limited. They often suffer from inefficiencies such as tight training requirements, high complexity, and limited accuracy. As an alternative, in this paper, we explore the use of Kolmogorov-Arnold Networks (KANs) to predict IIR filter coefficients, specifically biquad coefficients, effectively. By leveraging the high interpretability and accuracy of KANs, we achieve smooth coefficient optimization. Furthermore, by constraining the search space and exploring different loss functions, we demonstrate improved performance in speed and accuracy. Our approach is evaluated against other existing differentiable IIR filter solutions. The results show significant advantages of KANs over existing methods, offering steadier convergence and more accurate results. This offers new possibilities for integrating digital IIR filters into deep-learning frameworks.
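One common way to constrain the search space for biquad coefficients is to map unconstrained network outputs into the stability triangle of the denominator, so that every prediction is a stable filter. The sketch below shows such a mapping with `tanh`. It is an assumption chosen for illustration, since the abstract does not say which constraint the authors use, and the function names are hypothetical.

```python
import cmath
import math

def stable_denominator(u1, u2):
    """Map unconstrained reals to (a1, a2) inside the stability triangle."""
    a2 = math.tanh(u2)               # enforces |a2| < 1
    a1 = (1.0 + a2) * math.tanh(u1)  # enforces |a1| < 1 + a2
    return a1, a2

def pole_radius(a1, a2):
    """Largest pole magnitude of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return max(abs((-a1 + disc) / 2.0), abs((-a1 - disc) / 2.0))

# Any raw pair (u1, u2), e.g. from a network output, yields poles inside
# the unit circle.
for u1, u2 in [(-3.0, 2.0), (0.5, -1.0), (3.0, 3.0)]:
    a1, a2 = stable_denominator(u1, u2)
    print((u1, u2), "->", (a1, a2), "max pole radius:", pole_radius(a1, a2))
```

Because the mapping is smooth, gradients can flow through it during training while stability is guaranteed by construction.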