Automatic Classification of Chains of Guitar Effects Through Evolutionary Neural Architecture Search
Recent studies on classifying electric guitar effects have achieved high accuracy, particularly with deep learning techniques. However, these studies often rely on simplified datasets consisting mainly of single notes rather than realistic guitar recordings. Moreover, in the specific field of effect chain estimation, the literature tends to rely on large models, making them impractical for real-time or resource-constrained applications. In this work, we recorded realistic guitar performances using four different guitars and created three datasets by applying a chain of five effects with increasing complexity: (1) fixed order and parameters, (2) fixed order with randomly sampled parameters, and (3) random order and parameters. We also propose a novel Neural Architecture Search method aimed at discovering accurate yet compact convolutional neural network models to reduce power and memory consumption. We compared its performance to a basic random search strategy, showing that our custom Neural Architecture Search outperformed random search in identifying models that balance accuracy and complexity. We found that the number of convolutional and pooling layers becomes increasingly important as dataset complexity grows, while dense layers have less impact. Additionally, among the effects, tremolo was identified as the most challenging to classify.
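As a point of reference for the random-search baseline discussed above, the sketch below samples compact CNN configurations and keeps the smallest candidate under a parameter budget. The search space, layer ranges, and analytic parameter count are illustrative assumptions, not the paper's NAS method; in practice each candidate would also be trained and scored on classification accuracy.

```python
import random

# Hypothetical search space for compact CNNs (illustrative only; the actual
# NAS operators and ranges of the paper are not reproduced here).
SPACE = {
    "n_conv": [1, 2, 3, 4],          # number of conv + pooling blocks
    "filters": [8, 16, 32, 64],      # filters per conv layer
    "kernel": [3, 5],                # square kernel size
    "n_dense": [0, 1, 2],            # hidden dense layers
    "dense_units": [32, 64, 128],
}

def sample_architecture():
    """Draw one random candidate from the search space."""
    return {key: random.choice(values) for key, values in SPACE.items()}

def count_parameters(arch, in_channels=1, n_classes=5):
    """Rough analytic parameter count for a plain conv -> pool -> dense stack."""
    params, ch = 0, in_channels
    for _ in range(arch["n_conv"]):
        params += ch * arch["filters"] * arch["kernel"] ** 2 + arch["filters"]
        ch = arch["filters"]
    feat = ch  # assume global average pooling before the dense head
    for _ in range(arch["n_dense"]):
        params += feat * arch["dense_units"] + arch["dense_units"]
        feat = arch["dense_units"]
    return params + feat * n_classes + n_classes

def random_search(n_trials=100, budget=50_000):
    """Keep the smallest candidate under a parameter budget. In a real search
    each candidate would also be trained and scored on validation accuracy."""
    best = None
    for _ in range(n_trials):
        arch = sample_architecture()
        size = count_parameters(arch)
        if size <= budget and (best is None or size < best[1]):
            best = (arch, size)
    return best

if __name__ == "__main__":
    print(random_search())
```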
Neural-Driven Multi-Band Processing for Automatic Equalization and Style Transfer
We present a Neural-Driven Multi-Band Processor (NDMP), a differentiable audio processing framework that augments a static six-band Parametric Equalizer (PEQ) with per-band dynamic range compression. We optimize this processor using neural inference for two tasks: Automatic Equalization (AutoEQ), which estimates tonal and dynamic corrections without a reference, and Production Style Transfer (NDMP-ST), which adapts the processing of an input signal to match the tonal and dynamic characteristics of a reference. We train NDMP using a self-supervised strategy, where the model learns to recover a clean signal from inputs degraded with randomly sampled NDMP parameters and gain adjustments. This setup eliminates the need for paired input–target data and enables end-to-end training with audio-domain loss functions. At inference, AutoEQ enhances previously unseen inputs in a blind setting, while NDMP-ST performs style transfer by predicting task-specific processing parameters. We evaluate our approach on the MUSDB18 dataset using both objective metrics (e.g., SI-SDR, PESQ, STFT loss) and a listening test. Our results show that NDMP consistently outperforms traditional PEQ and a PEQ+DRC (single-band) baseline, offering a robust neural framework for audio enhancement that combines learned spectral and dynamic control.
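The self-supervised strategy described above can be illustrated with a minimal data-generation sketch: a clean excerpt is degraded with randomly sampled EQ gains and an overall level offset, and the (degraded, clean) pair serves as training input and target. The RBJ peaking-EQ biquads, band centres, and parameter ranges below are assumptions for illustration, and the sketch omits the per-band compression of the actual NDMP processor.

```python
import numpy as np
from scipy.signal import lfilter

def peaking_eq(f0, gain_db, q, fs):
    """RBJ-cookbook peaking-EQ biquad coefficients (normalized by a0)."""
    A = 10 ** (gain_db / 40.0)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

def degrade(clean, fs, n_bands=6, rng=np.random.default_rng(0)):
    """Apply randomly sampled per-band EQ gains and an overall gain offset.
    The resulting (degraded, clean) pair trains a model to undo the change."""
    x = clean.copy()
    centres = np.geomspace(80.0, 0.4 * fs, n_bands)    # hypothetical band centres
    for f0 in centres:
        b, a = peaking_eq(f0, gain_db=rng.uniform(-12, 12), q=1.0, fs=fs)
        x = lfilter(b, a, x)
    x *= 10 ** (rng.uniform(-6, 6) / 20.0)              # random overall gain (dB)
    return x, clean

if __name__ == "__main__":
    fs = 44100
    clean = np.random.default_rng(1).standard_normal(fs)  # stand-in for a music excerpt
    degraded, target = degrade(clean, fs)
    print(degraded.shape, target.shape)
```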
Real-Time Implementation of the Dynamic Stiff String Using Finite-Difference Time-Domain Methods and the Dynamic Grid
Digital musical instruments based on physical modelling have gained increased popularity over the past years. This is partly due to recent advances in computational power, which allow for their real-time implementation. One of the great potentials of digital musical instruments based on physical models is that one can go beyond what is physically possible and change properties of the instruments which are static in real life. This paper presents a real-time implementation of the dynamic stiff string using finite-difference time-domain (FDTD) methods. The defining parameters of the string can be varied in real time, changing the underlying grid that these methods rely on, based on the recently developed dynamic grid method. For most settings, parameter changes are nearly instantaneous and do not cause noticeable artefacts due to changes in the grid. A reliable way to prevent artefacts for all settings is under development.
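For readers unfamiliar with the underlying scheme, the following is a minimal static-grid FDTD sketch of the undamped, simply supported stiff string, not the dynamic-grid method of the paper. Parameter values are arbitrary demo choices; the grid spacing is set from the usual stability condition.

```python
import numpy as np

# Static-grid FDTD scheme for the undamped stiff string
#   u_tt = c^2 u_xx - kappa^2 u_xxxx
# with simply supported ends (a textbook sketch, not the dynamic grid).
fs = 44100
k = 1.0 / fs                # time step
c = 200.0                   # wave speed, arbitrary demo value
kappa = 1.0                 # stiffness coefficient, arbitrary demo value

# Smallest stable grid spacing from the condition lambda^2 + 4*mu^2 <= 1
h = np.sqrt((c**2 * k**2 + np.sqrt(c**4 * k**4 + 16 * kappa**2 * k**2)) / 2)
N = int(np.floor(1.0 / h))  # grid intervals on a unit-length string
h = 1.0 / N
lam2 = (c * k / h) ** 2
mu2 = (kappa * k / h ** 2) ** 2

u = np.zeros(N + 1)
u[N // 2] = 1e-3            # crude "pluck": displaced midpoint, zero velocity
u_prev = u.copy()

out = np.zeros(fs // 10)    # 100 ms of output
for n in range(len(out)):
    # virtual points for simply supported ends: u(-1) = -u(1), u(N+1) = -u(N-1)
    ue = np.concatenate(([-u[1]], u, [-u[-2]]))
    dxx = ue[2:] - 2 * ue[1:-1] + ue[:-2]                  # second difference, points 0..N
    dxxxx = np.zeros_like(u)
    dxxxx[1:-1] = dxx[2:] - 2 * dxx[1:-1] + dxx[:-2]       # fourth difference, interior
    u_next = np.zeros_like(u)
    u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                    + lam2 * dxx[1:-1] - mu2 * dxxxx[1:-1])
    out[n] = u_next[int(0.3 * N)]                          # "pickup" position
    u_prev, u = u, u_next
print(N, lam2 + 4 * mu2)    # grid size and stability measure (should be <= 1)
```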
Physical Modeling Using Recurrent Neural Networks with Fast Convolutional Layers
Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied machine-learning techniques to construct such models automatically from data for the case of systems which have lumped states described by scalar values, such as electrical circuits. In this work, we examine how similar techniques are able to construct models of systems which have spatially distributed rather than lumped states. We describe several novel recurrent neural network structures, and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained with this data to reproduce the behavior of these systems.
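The connection to modal techniques mentioned above can be made concrete with a small sketch: a modal model is a sum of exponentially decaying sinusoids, and implementing each mode as a one-pole complex recursion yields a diagonal linear recurrent system, the baseline that learned recurrent structures extend. Frequencies, decays, and amplitudes below are illustrative, not taken from the paper.

```python
import numpy as np

# A modal model as a diagonal linear recurrence: each mode is a one-pole
# complex recursion (an exponentially decaying sinusoid). Frequencies,
# decays and amplitudes are illustrative, not taken from the paper.
fs = 44100
freqs = np.array([220.0, 440.0, 661.0, 884.0])   # modal frequencies (Hz)
decays = np.array([1.5, 2.5, 4.0, 6.0])          # decay rates (1/s)
amps = np.array([1.0, 0.5, 0.3, 0.2])            # excitation of each mode

poles = np.exp((-decays + 2j * np.pi * freqs) / fs)   # one pole per mode

state = amps.astype(complex)         # initial modal excitation ("strike")
out = np.zeros(fs)                   # one second of output
for i in range(len(out)):
    out[i] = state.real.sum()        # output taps the sum of modal states
    state *= poles                   # diagonal recurrence: x[n+1] = A x[n]
print(out[:4])
```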
Differentiable Piano Model for MIDI-to-Audio Performance Synthesis
Recent neural-based synthesis models have achieved impressive results for musical instrument sound generation. In particular, the Differentiable Digital Signal Processing (DDSP) framework enables the usage of spectral modeling analysis and synthesis techniques in fully differentiable architectures. Yet currently, it has only been used for modeling monophonic instruments. Leveraging the interpretability and modularity of this framework, the present work introduces a polyphonic differentiable model for piano sound synthesis, conditioned on Musical Instrument Digital Interface (MIDI) inputs. The model architecture is motivated by high-level acoustic modeling knowledge of the instrument which, in tandem with the sound structure priors inherent to the DDSP components, makes for a lightweight, interpretable and realistic-sounding piano model. The proposed model has been evaluated in a listening test, demonstrating improved sound quality compared to a benchmark neural-based piano model, with significantly fewer parameters and even with reduced training data. The same listening test indicates that physical-modeling-based models still achieve better quality, but the differentiability of our lightened approach encourages its usage in other musical tasks dealing with polyphonic audio and symbolic data.
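As a rough illustration of the spectral-modeling components such a model builds on, the sketch below renders a single inharmonic, piano-like partial series plus a small noise floor. In the actual model the per-partial envelopes and noise are predicted by a network from MIDI conditioning; here they are hand-set, and the inharmonicity coefficient is an arbitrary assumption.

```python
import numpy as np

# Single inharmonic, piano-like partial series plus a small noise floor.
# The stretched partial frequencies follow f_k = k*f0*sqrt(1 + B*k^2); the
# inharmonicity B, envelopes, and noise level are illustrative assumptions.
fs, dur = 44100, 2.0
t = np.arange(int(fs * dur)) / fs
f0, B = 220.0, 1e-4

out = np.zeros_like(t)
for k in range(1, 21):
    fk = k * f0 * np.sqrt(1 + B * k ** 2)      # stretched partial frequency
    if fk >= fs / 2:
        break
    out += (np.exp(-3.0 * t) / k) * np.sin(2 * np.pi * fk * t)    # decaying partial
out += 1e-3 * np.random.default_rng(0).standard_normal(len(t))    # residual noise
out /= np.max(np.abs(out))
```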
Differentiable grey-box modelling of phaser effects using frame-based spectral processing
Machine learning approaches to modelling analog audio effects have seen intensive investigation in recent years, particularly in the context of non-linear time-invariant effects such as guitar amplifiers. For modulation effects such as phasers, however, new challenges emerge due to the presence of the low-frequency oscillator which controls the slowly time-varying nature of the effect. Existing approaches have either required foreknowledge of this control signal, or have been non-causal in implementation. This work presents a differentiable digital signal processing approach to modelling phaser effects in which the underlying control signal and time-varying spectral response of the effect are jointly learned. The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain, with a transfer function based on typical analog phaser circuit topology. We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters. The frame duration is an important hyper-parameter of the proposed model, so an investigation was carried out into its effect on model accuracy. The optimal frame length depends on both the rate and transient decay-time of the target effect, but the frame length can be altered at inference time without a significant change in accuracy.
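The frame-based spectral processing idea can be sketched as follows: overlapping frames are filtered in the frequency domain with a cascade of first-order analog-prototype allpass sections whose break frequency follows a low-frequency oscillator, then overlap-added. The LFO, stage count, and mix below are illustrative assumptions rather than the learned parameters of the paper, and the circular-convolution effects of per-frame spectral multiplication are ignored for brevity.

```python
import numpy as np

fs, frame, hop = 44100, 1024, 256
window = np.hanning(frame)
x = np.random.default_rng(0).standard_normal(fs)   # stand-in for guitar audio
y = np.zeros(len(x) + frame)

f_lfo, centre, depth = 0.5, 1000.0, 800.0          # LFO rate (Hz), sweep centre/width (Hz)
n_stages, mix = 4, 0.5                             # allpass stages, dry/wet mix

freqs = np.fft.rfftfreq(frame, 1 / fs)
w = 2 * np.pi * freqs                              # analog frequency axis (rad/s)

for start in range(0, len(x) - frame, hop):
    fb = centre + depth * np.sin(2 * np.pi * f_lfo * start / fs)   # LFO'd break frequency
    wb = 2 * np.pi * fb
    h_ap = ((wb - 1j * w) / (wb + 1j * w)) ** n_stages             # allpass cascade
    H = (1 - mix) + mix * h_ap                                     # dry/wet sum -> notches
    X = np.fft.rfft(window * x[start:start + frame])
    y[start:start + frame] += window * np.fft.irfft(H * X, frame)
y /= 1.5                                           # Hann^2 overlap-add gain at 75% overlap
```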
Upcycling Android Phones into Embedded Audio Platforms
There are millions of sophisticated Android phones in the world that get disposed of at a very high rate due to consumerism. Their computational power and built-in features, instead of being wasted when discarded, could be repurposed for creative applications such as musical instruments and interactive audio installations. However, audio programming on Android is complicated and comes with restrictions that heavily impact performance. To address this issue, we present LDSP, an open-source environment that can be used to easily upcycle Android phones into embedded platforms optimized for audio synthesis and processing. We conducted a benchmark study to compare the number of oscillators that can be run in parallel on LDSP with an equivalent audio app designed according to modern Android standards. Our study tested six phones ranging from 2014 to 2018 and running different Android versions. The results consistently demonstrate that LDSP provides a significant boost in performance, in some cases more than doubling the number of oscillators, making even very old phones suitable for fairly advanced audio applications.
Antiderivative Antialiasing for Recurrent Neural Networks
Neural networks have become invaluable for general audio processing tasks, such as virtual analog modeling of nonlinear audio equipment. For sequence modeling tasks in particular, recurrent neural networks (RNNs) have gained widespread adoption in recent years. Their general applicability and effectiveness stem partly from their inherent nonlinearity, which also makes them prone to aliasing. Recent work has explored mitigating aliasing by oversampling the network, an approach whose effectiveness is directly linked with the incurred computational costs. This work explores an alternative route by extending the antiderivative antialiasing technique to explicit, computable RNNs. Detailed applications to the Gated Recurrent Unit and Long Short-Term Memory cell are shown as case studies. The proposed technique is evaluated on multiple pre-trained guitar amplifier models, assessing its impact on the amount of aliasing and model tonality. The method is shown to reduce the models' tendency to alias considerably across all considered sample rates while only affecting their tonality moderately, without requiring high oversampling factors. The results of this study can be used to improve sound quality in neural audio processing tasks that employ a suitable class of RNNs. Additional materials are provided on the accompanying webpage.
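For background, the classical form of antiderivative antialiasing applies to a memoryless nonlinearity such as tanh; the paper extends this idea to the nonlinearities inside recurrent cells. The sketch below shows the first-order scheme on a driven sine; the drive level and test frequency are illustrative, and the method introduces roughly half a sample of delay.

```python
import numpy as np

def tanh_adaa(x, eps=1e-6):
    """First-order ADAA for y = tanh(x); F1 is the antiderivative log(cosh(x))."""
    F1 = lambda v: np.logaddexp(v, -v) - np.log(2.0)
    y = np.empty_like(x)
    x_prev = 0.0
    for n, xn in enumerate(x):
        d = xn - x_prev
        if abs(d) > eps:
            y[n] = (F1(xn) - F1(x_prev)) / d            # averaged slope of F1
        else:
            y[n] = np.tanh(0.5 * (xn + x_prev))         # ill-conditioned fallback
        x_prev = xn
    return y

fs = 44100
t = np.arange(fs) / fs
x = 10.0 * np.sin(2 * np.pi * 1244.5 * t)               # heavily driven test sine
y = tanh_adaa(x)                                        # aliasing-reduced output
```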
Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches
Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this application, based on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show that the diffusion-based approach provides more stable results and is less sensitive to data availability, while the adversarial approach is superior at estimating more pronounced distortion effects. Our findings contribute to the robust unsupervised blind estimation of audio effects, demonstrating the potential of diffusion models for system identification in music technology.
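For context, a toy gray-box distortion operator of the kind whose parameters a blind estimation method would need to recover might consist of a pre-gain, a static tanh nonlinearity, and a one-pole tone filter. This structure and its parameter ranges are assumptions for illustration, not the effect operators used in the paper.

```python
import numpy as np

def distortion(x, pre_gain_db=20.0, tone_cutoff_hz=4000.0, fs=44100):
    """Toy gray-box effect: pre-gain -> static tanh -> one-pole tone filter."""
    g = 10 ** (pre_gain_db / 20.0)
    clipped = np.tanh(g * x)                  # static nonlinearity
    a = np.exp(-2 * np.pi * tone_cutoff_hz / fs)
    out = np.zeros_like(clipped)
    state = 0.0
    for n, v in enumerate(clipped):           # one-pole lowpass "tone" control
        state = (1 - a) * v + a * state
        out[n] = state
    return out

x = 0.5 * np.sin(2 * np.pi * 220 * np.arange(44100) / 44100)
y = distortion(x)    # the parameters above are what blind estimation would recover
```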
Continuous State Modeling for Statistical Spectral Synthesis
Continuous State Markovian Spectral Modeling is a novel approach for parametric synthesis of spectral modeling parameters, based on the sines plus noise paradigm. The method aims specifically at capturing shimmer and jitter: micro-fluctuations in the partials' frequency and amplitude trajectories, which are essential for the timbre of musical instruments. It allows for parametric control over the timbral qualities, while removing the need for the more computationally expensive and restrictive discrete state space modeling method. A qualitative comparison between an original violin sound and a re-synthesis shows the ability of the algorithm to reproduce the micro-fluctuations, considering their stochastic and spectral properties.
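The shimmer and jitter described above can be sketched by superimposing small first-order Markov (AR(1)) fluctuations on the frequency and amplitude trajectories of each partial of a sines-plus-noise model. The coefficients below are illustrative and not estimated from a real violin recording as in the paper.

```python
import numpy as np

fs, dur = 44100, 2.0
n = int(fs * dur)
rng = np.random.default_rng(0)

def ar1(length, coeff, sigma):
    """First-order Markov process driving the micro-fluctuations."""
    x = np.zeros(length)
    drive = sigma * rng.standard_normal(length)
    for i in range(1, length):
        x[i] = coeff * x[i - 1] + drive[i]
    return x

f0, out = 440.0, np.zeros(n)
for k in range(1, 9):                           # first eight partials
    jitter = ar1(n, 0.999, 0.02)                # slow relative frequency deviation
    shimmer = ar1(n, 0.999, 0.002)              # slow relative amplitude deviation
    freq = k * f0 * (1.0 + 0.001 * jitter)
    phase = 2 * np.pi * np.cumsum(freq) / fs
    out += (1.0 / k) * (1.0 + shimmer) * np.sin(phase)
out += 0.01 * rng.standard_normal(n)            # broadband noise component
out /= np.max(np.abs(out))
```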