TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration
The increasing complexity and real-time processing demands of
audio signals require optimized algorithms that utilize the computational power of Graphics Processing Units (GPUs).
Existing Digital Signal Processing (DSP) libraries often do not provide
the necessary efficiency and flexibility, particularly for integrating
with Artificial Intelligence (AI) models. In response, we introduce TorchFX: a GPU-accelerated Python library for DSP, engineered to facilitate sophisticated audio signal processing. Built on
the PyTorch framework, TorchFX offers an object-oriented interface similar to that of torchaudio and extends it with a novel
pipe operator for intuitive filter chaining. The library provides a
comprehensive suite of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, with a focus on multichannel
audio, thereby facilitating the integration of DSP and AI-based
approaches. Our benchmarking results demonstrate significant
efficiency gains over traditional libraries like SciPy, particularly
in multichannel contexts. While there are current limitations in
GPU compatibility, ongoing developments promise broader support and real-time processing capabilities. TorchFX aims to become a useful tool for the community, contributing to innovation
in GPU-accelerated DSP. TorchFX is publicly available on GitHub
at https://github.com/matteospanio/torchfx.
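The idea of chaining filters with a pipe operator can be illustrated with a minimal sketch in plain Python. This is not TorchFX's actual API: the class names `Effect`, `Chain`, `Gain`, and `OnePoleLowpass` are hypothetical, and the sketch works on Python lists rather than PyTorch tensors. It only shows how overloading `|` lets processing stages compose left to right.

```python
class Effect:
    """Hypothetical base class: a callable stage that supports `|` chaining."""

    def __call__(self, samples):
        raise NotImplementedError

    def __or__(self, other):
        # `a | b` builds a chain that applies a first, then b.
        return Chain(self, other)


class Chain(Effect):
    """Applies its stages in order, left to right."""

    def __init__(self, *stages):
        self.stages = stages

    def __call__(self, samples):
        for stage in self.stages:
            samples = stage(samples)
        return samples

    def __or__(self, other):
        # Extending an existing chain keeps it flat.
        return Chain(*self.stages, other)


class Gain(Effect):
    """Multiplies every sample by a constant factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, samples):
        return [self.factor * x for x in samples]


class OnePoleLowpass(Effect):
    """First-order IIR smoother: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""

    def __init__(self, alpha):
        self.alpha = alpha

    def __call__(self, samples):
        out, y = [], 0.0
        for x in samples:
            y = y + self.alpha * (x - y)
            out.append(y)
        return out


# Usage: the pipe reads in signal-flow order.
chain = Gain(0.5) | OnePoleLowpass(0.5)
result = chain([2.0, 2.0, 2.0])
```

In a tensor-based library the same pattern would typically be applied to module objects, so that a whole chain can be moved to the GPU at once.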
Simplifying Antiderivative Antialiasing with Lookup Table Integration
Antiderivative Antialiasing (ADAA) has become a pivotal method
for reducing aliasing when processing nonlinear functions at audio rate. However, its implementation requires analytical computation of the antiderivative of the nonlinear function, which in practical cases can be challenging without a symbolic solver. Moreover, when the nonlinear function is given by measurements, it
must be approximated to get a symbolic description. In this paper, we propose a simple approach to ADAA for practical applications that employs numerical integration of lookup tables (LUTs)
to approximate the antiderivative. This method eliminates the need
for closed-form solutions, streamlining the ADAA implementation
process in industrial applications. We analyze the trade-offs of this
approach, highlighting its computational efficiency and ease of implementation while discussing the potential impact of numerical
integration errors on aliasing performance. Experiments are conducted with static nonlinearities (tanh, a simple wavefolder and
the Buchla 259 wavefolding circuit) and a stateful nonlinear system (the diode clipper).
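The approach described above can be sketched as follows for a static nonlinearity. The abstract does not state which integration rule or interpolation scheme the authors use, so this sketch assumes a uniform lookup table, cumulative trapezoidal integration, and linear interpolation. All function names and the table size are hypothetical choices for illustration.

```python
import math


def build_tables(f, lo, hi, size):
    """Sample f on a uniform grid and integrate it with the trapezoidal rule."""
    step = (hi - lo) / (size - 1)
    f_tab = [f(lo + i * step) for i in range(size)]
    F_tab = [0.0]  # antiderivative table, with F(lo) = 0 by convention
    for i in range(1, size):
        F_tab.append(F_tab[-1] + 0.5 * step * (f_tab[i - 1] + f_tab[i]))
    return f_tab, F_tab


def lookup(table, lo, hi, x):
    """Linearly interpolate a uniform table; x is clamped to [lo, hi]."""
    size = len(table)
    pos = (min(max(x, lo), hi) - lo) / (hi - lo) * (size - 1)
    i = min(int(pos), size - 2)
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])


def adaa1(signal, f_tab, F_tab, lo, hi, eps=1e-6):
    """First-order ADAA: y[n] = (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1])."""
    out = []
    x_prev = 0.0
    for x in signal:
        dx = x - x_prev
        if abs(dx) > eps:
            y = (lookup(F_tab, lo, hi, x) - lookup(F_tab, lo, hi, x_prev)) / dx
        else:
            # Ill-conditioned quotient: fall back to f at the midpoint.
            y = lookup(f_tab, lo, hi, 0.5 * (x + x_prev))
        out.append(y)
        x_prev = x
    return out


# Usage with tanh as the nonlinearity (table range and size are assumptions).
LO, HI = -4.0, 4.0
F_TANH, F_TANH_INT = build_tables(math.tanh, LO, HI, 4097)
out = adaa1([0.5, 0.5], F_TANH, F_TANH_INT, LO, HI)
```

Because only differences of the antiderivative enter the ADAA formula, the arbitrary integration constant F(lo) = 0 has no effect on the output.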
Differentiable Scattering Delay Networks for Artificial Reverberation
Scattering delay networks (SDNs) provide a flexible and efficient
framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling
gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating
key parameters such as scattering matrices and absorption filters
as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic
features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN
configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
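As an illustration of the energy decay feature mentioned above, the following sketch computes the standard Schroeder energy decay curve of an impulse response by backward integration. The abstract does not specify the exact loss terms used, so this is only the textbook definition, and the function name is hypothetical.

```python
import math


def energy_decay_curve_db(rir):
    """Schroeder backward integration, normalized to 0 dB at time zero.

    Assumes the last sample of `rir` is nonzero, so every value passed to
    the logarithm is positive.
    """
    energy = 0.0
    edc = []
    for h in reversed(rir):
        energy += h * h  # remaining energy from this sample to the end
        edc.append(energy)
    edc.reverse()
    total = edc[0]
    return [10.0 * math.log10(e / total) for e in edc]


# Usage on a toy three-sample impulse response.
curve = energy_decay_curve_db([1.0, 0.5, 0.25])
```

A differentiable version of the same computation (for example a reversed cumulative sum over a tensor) could serve as one term of an optimization objective.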
Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method
Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than using a single and potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced
feedback delay network (FDN) is designed to generate perceptually plausible room impulse responses (RIRs). A multiple stimuli with hidden reference
and anchor (MUSHRA) test and a two-alternative forced-choice
(2AFC) discrimination task were conducted to compare the
proposed method against ground truth recordings and conventional
RT-based approaches. The results show that the proposed system
delivers robust performance in various scenarios, achieving highly
plausible reverberation synthesis.
Automatic Classification of Chains of Guitar Effects Through Evolutionary Neural Architecture Search
Recent studies on classifying electric guitar effects have achieved
high accuracy, particularly with deep learning techniques. However, these studies often rely on simplified datasets consisting
mainly of single notes rather than realistic guitar recordings.
Moreover, in the specific field of effect chain estimation, the literature tends to rely on large models that are impractical for
real-time or resource-constrained applications. In this work, we
recorded realistic guitar performances using four different guitars
and created three datasets by applying a chain of five effects with
increasing complexity: (1) fixed order and parameters, (2) fixed order with randomly sampled parameters, and (3) random order and
parameters. We also propose a novel Neural Architecture Search
method aimed at discovering accurate yet compact convolutional
neural network models to reduce power and memory consumption.
We compared its performance to a basic random search strategy,
showing that our custom Neural Architecture Search outperformed
random search in identifying models that balance accuracy and
complexity. We found that the number of convolutional and pooling layers becomes increasingly important as dataset complexity
grows, while dense layers have less impact. Additionally, among
the effects, tremolo was identified as the most challenging to classify.
Neural-Driven Multi-Band Processing for Automatic Equalization and Style Transfer
We present a Neural-Driven Multi-Band Processor (NDMP), a differentiable audio processing framework that augments a static six-band Parametric Equalizer (PEQ) with per-band dynamic range
compression. We optimize this processor using neural inference
for two tasks: Automatic Equalization (AutoEQ), which estimates
tonal and dynamic corrections without a reference, and Production
Style Transfer (NDMP-ST), which adapts the processing of an input signal to match the tonal and dynamic characteristics of a reference. We train NDMP using a self-supervised strategy, where the
model learns to recover a clean signal from inputs degraded with
randomly sampled NDMP parameters and gain adjustments. This
setup eliminates the need for paired input–target data and enables
end-to-end training with audio-domain loss functions. At inference, AutoEQ enhances previously unseen inputs in a blind setting, while NDMP-ST performs style transfer by predicting task-specific processing parameters. We evaluate our approach on the
MUSDB18 dataset using both objective metrics (e.g., SI-SDR,
PESQ, STFT loss) and a listening test. Our results show that
NDMP consistently outperforms traditional PEQ and a PEQ+DRC
(single-band) baseline, offering a robust neural framework for audio enhancement that combines learned spectral and dynamic control.
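For readers unfamiliar with SI-SDR, one of the objective metrics named above, the sketch below computes it using its commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The paper's own evaluation code is not shown here, so the function name and the toy signals are assumptions for illustration.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy  # optimal scaling of the reference
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10(target_energy / noise_energy)


# Usage: the error here is orthogonal to the reference and 20 dB below it.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]
score = si_sdr(estimate, reference)
```

Rescaling the estimate by any nonzero constant leaves the score unchanged, which is why the metric is suited to comparing processors that may alter overall gain.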
Antiderivative Antialiasing for Recurrent Neural Networks
Neural networks have become invaluable for general audio processing tasks, such as virtual analog modeling of nonlinear audio equipment.
For sequence modeling tasks in particular, recurrent neural networks (RNNs) have gained widespread adoption in recent years. Their general applicability and effectiveness
stem partly from their inherent nonlinearity, which also makes them
prone to aliasing. Recent work has explored mitigating aliasing
by oversampling the network, an approach whose effectiveness is
directly tied to the computational cost it incurs. This work
explores an alternative route by extending the antiderivative antialiasing technique to explicit, computable RNNs. Detailed applications to the Gated Recurrent Unit and Long Short-Term Memory cell are shown as case studies. The proposed technique is evaluated
on multiple pre-trained guitar amplifier models, assessing its impact on the amount of aliasing and model tonality. The method is
shown to reduce the models’ tendency to alias considerably across
all considered sample rates while only affecting their tonality moderately, without requiring high oversampling factors. The results
of this study can be used to improve sound quality in neural audio
processing tasks that employ a suitable class of RNNs. Additional
materials are provided on the accompanying webpage.
Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches
Accurately estimating nonlinear audio effects without access to
paired input-output signals remains a challenging problem. This
work studies unsupervised probabilistic approaches for solving this
task. We introduce a method, novel for this application, based
on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a
previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the
effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show
that the diffusion-based approach provides more stable results and
is less sensitive to data availability, while the adversarial approach
is superior at estimating more pronounced distortion effects. Our
findings contribute to the robust unsupervised blind estimation of
audio effects, demonstrating the potential of diffusion models for
system identification in music technology.
Empirical Results for Adjusting Truncated Backpropagation Through Time While Training Neural Audio Effects
This paper investigates the optimization of Truncated Backpropagation Through Time (TBPTT) for training neural networks in
digital audio effect modeling, with a focus on dynamic range compression. The study evaluates key TBPTT hyperparameters – sequence number, batch size, and sequence length – and their influence on model performance. Using a convolutional-recurrent architecture, we conduct extensive experiments across datasets with
and without conditioning by user controls. Results demonstrate
that carefully tuning these parameters enhances model accuracy
and training stability, while also reducing computational demands.
Objective evaluations confirm improved performance with optimized settings, while subjective listening tests indicate that the
revised TBPTT configuration maintains high perceptual quality.
Evaluating the Performance of Objective Audio Quality Metrics in Response to Common Audio Degradations
This study evaluates the performance of five objective audio quality metrics (PEAQ Basic, PEAQ Advanced, PEMO-Q, ViSQOL,
and HAAQI) in the context of digital music production. Unlike
previous comparisons, we focus on their suitability for production environments, an area currently underexplored in existing research. Twelve audio examples were tested using two evaluation
types: an effectiveness test under progressively increasing degradations (hum, hiss, clipping, glitches) and a robustness test under
fixed-level, randomly fluctuating degradations.
In the effectiveness test, HAAQI, PEMO-Q, and PEAQ Basic
effectively tracked degradation changes, while PEAQ Advanced
failed consistently and ViSQOL showed low sensitivity to hum
and glitches. In the robustness test, ViSQOL and HAAQI demonstrated the highest consistency, with average standard deviations
of 0.004 and 0.007, respectively, followed by PEMO-Q (0.021),
PEAQ Basic (0.057), and PEAQ Advanced (0.065).
However, ViSQOL also showed low variability across audio examples, suggesting limited genre sensitivity.
These findings highlight the strengths and limitations of each
metric for music production, specifically quality measurement with
compressed audio. The source code and dataset will be made publicly available upon publication.