Download Neural Grey-Box Guitar Amplifier Modelling with Limited Data
This paper combines recurrent neural networks (RNNs) with the discretised Kirchhoff nodal analysis (DK-method) to create a grey-box guitar amplifier model. Both the objective and subjective results suggest that the proposed model is able to outperform a baseline black-box RNN model in the task of modelling a guitar amplifier, including realistically recreating the behaviour of the amplifier equaliser circuit, whilst requiring significantly less training data. Furthermore, we adapt the linear part of the DK-method in a deep learning scenario to derive multiple state-space filters simultaneously. We frequency sample the filter transfer functions in parallel and perform frequency domain filtering to considerably reduce the required training times compared to recursive state-space filtering. This study shows that it is a powerful idea to separately model the linear and nonlinear parts of a guitar amplifier using supervised learning.
Download A Diffusion-Based Generative Equalizer for Music Restoration
This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a task that, to the best of our knowledge, has not been previously addressed for music restoration. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music. Historical music restoration examples are available at: research.spa.aalto.fi/publications/papers/dafx-babe2/.
Download RIR2FDN: An Improved Room Impulse Response Analysis and Synthesis
This paper seeks to improve the state-of-the-art in delay-networkbased analysis-synthesis of measured room impulse responses (RIRs). We propose an informed method incorporating improved energy decay estimation and synthesis with an optimized feedback delay network. The performance of the presented method is compared against an end-to-end deep-learning approach. A formal listening test was conducted where participants assessed the similarity of reverberated material across seven distinct RIRs and three different sound sources. The results reveal that the performance of these methods is influenced by both the excitation sounds and the reverberation conditions. Nonetheless, the proposed method consistently demonstrates higher similarity ratings compared to the end-to-end approach across most conditions. However, achieving an indistinguishable synthesis of measured RIRs remains a persistent challenge, underscoring the complexity of this problem. Overall, this work helps improve the sound quality of analysis-based artificial reverberation.
Download Implementing a Low Latency Parallel Graphic Equalizer with Heterogeneous Computing
This paper describes the implementation of a recently introduced parallel graphic equalizer (PGE) in a heterogeneous way. The control and audio signal processing parts of the PGE are distributed to a PC and to a signal processor, of WaveCore architecture, respectively. This arrangement is particularly suited to the algorithm in question, benefiting from the low-latency characteristics of the audio signal processor as well as general purpose computing power for the more demanding filter coefficient computation. The design is achieved cleanly in a high-level language called Kronos, which we have adapted for the purposes of heterogeneous code generation from a uniform program source.
Download Perceptual Decorrelator Based on Resonators
Decorrelation filters transform mono audio into multiple decorrelated copies. This paper introduces a novel decorrelation filter design based on a resonator bank, which produces a sum of over a thousand exponentially decaying sinusoids. A headphone listening test was used to identify the minimum inter-channel time delays that perceptually match ERB-filtered coherent noise to corresponding incoherent noise. The decay rate of each resonator is set based on a group delay profile determined by the listening test results at its corresponding frequency. Furthermore, the delays from the test are used to refine frequency-dependent windowing in coherence estimation, which we argue represents the perceptually most accurate way of assessing interaural coherence. This coherence measure then guides an optimization process that adjusts the initial phases of the sinusoids to minimize the coherence between two instances of the resonator-based decorrelator. The delay results establish the necessary group delay per ERB for effective decorrelation, revealing higher-than-expected values, particularly at higher frequencies. For comparison, the optimization is also performed using two previously proposed group-delay profiles: one based on the period of the ERB band center frequency and another based on the maximum group-delay limit before introducing smearing. The results indicate that the perceptually informed profile achieves equal decorrelation to the latter profile while smearing less at high frequencies. Overall, optimizing the phase response of the proposed decorrelator yields significantly lower coherence compared to using a random phase.