Download GPGPU Audio Benchmark Framework
Acceleration of audio workloads on generally-programmable GPU (GPGPU) hardware offers potentially high speedup factors, but also presents challenges in terms of development and deployment. We can increasingly depend on such hardware being available in users’ systems, yet few real-time audio products use this resource. We propose a suite of benchmarks to qualify a GPU as suitable for batch or real-time audio processing. This includes both microbenchmarks and higher-level audio domain benchmarks. We choose metrics based on application, paying particularly close attention to latency tail distribution. We propose an extension to the benchmark framework to more accurately simulate the real-world request pattern and performance requirements when running in a digital audio workstation. We run these benchmarks on two common consumer-level platforms: a PC desktop with a recent midrange discrete GPU and a Macintosh desktop with a unified CPU-GPU memory architecture.
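To illustrate why the latency tail matters for real-time audio, the following minimal sketch summarizes a set of measured processing times against the deadline implied by the audio buffer size. It is not taken from the benchmark framework: the function names, the nearest-rank percentile method, and the default buffer size and sample rate are all assumptions chosen for illustration.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def buffer_deadline_ms(buffer_frames, sample_rate_hz):
    """Time budget for one audio callback, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz


def summarize_latencies(latencies_ms, buffer_frames=256, sample_rate_hz=48000):
    """Report median, tail, and worst-case latency plus the deadline miss rate.

    The defaults (256 frames at 48 kHz) are hypothetical, not values from the paper.
    """
    deadline = buffer_deadline_ms(buffer_frames, sample_rate_hz)
    misses = sum(1 for t in latencies_ms if t > deadline)
    return {
        "deadline_ms": deadline,
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "max_ms": max(latencies_ms),
        "miss_rate": misses / len(latencies_ms),
    }
```

A workload whose median is far below the deadline can still be unsuitable for real-time use if a single outlier exceeds it, since each missed deadline is an audible dropout. This is why the sketch reports the maximum and miss rate alongside the median.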
Download Frequency-Dependent Characteristics and Perceptual Validation of the Interaural Thresholded Level Distribution
The interaural thresholded level distribution (ITLD) is a novel metric of auditory source width (ASW), derived from the psychophysical processes and structures of the inner ear. While several of the ITLD’s objective properties have been presented in previous work, its frequency-dependent characteristics and perceptual relationship with ASW have not been previously explored. This paper presents an investigation into these properties of the ITLD, which exhibits pronounced variation in band-limited behaviour as octave-band centre-frequency is increased. Additionally, a very strong correlation was found between [1 – ITLD] and normalised values of ASW, collected from a semantic differential listening test based on the Multiple Stimulus with Hidden Reference and Anchor (MUSHRA) framework. Perceptual relationships between various ITLD-derived quantities were also investigated, showing that the low-pass filter intrinsic to ITLD calculation strengthened the relationship between [1 – ITLD] and ASW. A subsequent test using transient stimuli, as well as investigations into other psychoacoustic properties of the metric such as its just-noticeable difference, were outlined as subjects for future research, to gain a deeper understanding of the subjective properties of the ITLD.
Download Differentiable All-Pole Filters for Time-Varying Audio Systems
Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and feed-forward compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin.
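For readers unfamiliar with the recursion involved, here is a minimal sketch of a time-varying all-pole filter in its direct form, y[n] = x[n] - sum over k of a_k[n] * y[n-k]. It shows only the forward pass and does not reproduce the paper's gradient formulation; the function name and the per-sample coefficient layout are assumptions made for illustration.

```python
def time_varying_all_pole(x, a):
    """Filter x with per-sample feedback coefficients.

    x: list of input samples.
    a: list of the same length as x, where a[n] = [a_1[n], ..., a_M[n]]
       holds the feedback coefficients in effect at sample n
       (hypothetical layout, chosen for clarity).
    Outputs before the start of the signal are taken to be zero.
    """
    y = []
    for n, x_n in enumerate(x):
        acc = x_n
        for k, a_k in enumerate(a[n], start=1):
            if n - k >= 0:
                # Each output depends on earlier outputs, so the loop
                # over n cannot be evaluated in parallel as written.
                acc -= a_k * y[n - k]
        y.append(acc)
    return y
```

Because every output sample feeds back into later ones, differentiating this loop step by step produces a long chain of sequential operations, which is the obstacle to end-to-end training that the abstract describes.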
Download A Diffusion-Based Generative Equalizer for Music Restoration
This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a task that, to the best of our knowledge, has not been previously addressed for music restoration. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music. Historical music restoration examples are available at: research.spa.aalto.fi/publications/papers/dafx-babe2/.