Lookup Table Based Audio Spectral Transformation
We present a unified visual interface for flexible spectral audio manipulation based on editable lookup tables (LUTs). In the proposed
approach, the audio spectrum is visualized as a two-dimensional
color map of frequency versus amplitude, serving as an editable
lookup table for modifying the sound. This single tool can replicate common audio effects such as equalization, pitch shifting, and
spectral compression, while also enabling novel sound transformations through creative combinations of adjustments. By consolidating these capabilities into one visual platform, the system has
the potential to streamline audio-editing workflows and encourage
creative experimentation. The approach also supports real-time
processing, providing immediate auditory feedback in an interactive graphical environment. Overall, this LUT-based method offers
an accessible yet powerful framework for designing and applying
a broad range of spectral audio effects through intuitive visual manipulation.

Towards an Objective Comparison of Panning Feature Algorithms for Unsupervised Learning
Estimations of panning attributes are an important feature to extract from a piece of recorded music, with downstream uses such
as classification, quality assessment, and listening enhancement.
While several algorithms exist in the literature, there is currently
no comparison between them and no studies to suggest which one
is most suitable for any particular task. This paper compares four
algorithms for extracting amplitude panning features with respect
to their suitability for unsupervised learning. It finds synchronicities between them and analyses their results on a small set of
commercial music excerpts chosen for their distinct panning features. The ability of each algorithm to differentiate between the
tracks is analysed. The results can be used in future work to either
select the most appropriate panning feature algorithm or create a
version customized for a particular task.

Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches
Accurately estimating nonlinear audio effects without access to
paired input-output signals remains a challenging problem. This
work studies unsupervised probabilistic approaches for solving this
task. We introduce a method, novel for this application, based
on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a
previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the
effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show
that the diffusion-based approach provides more stable results and
is less sensitive to data availability, while the adversarial approach
is superior at estimating more pronounced distortion effects. Our
findings contribute to the robust unsupervised blind estimation of
audio effects, demonstrating the potential of diffusion models for
system identification in music technology.

Antialiased Black-Box Modeling of Audio Distortion Circuits Using Real Linear Recurrent Units
In this paper, we propose the use of real-valued Linear Recurrent
Units (LRUs) for black-box modeling of audio circuits. A network architecture composed of real LRU blocks interleaved with
nonlinear processing stages is proposed.
Two case studies are presented: a second-order diode clipper and an overdrive distortion pedal. Furthermore, we show how to integrate the antiderivative antialiasing technique into the proposed method, effectively
lowering oversampling requirements. Our experiments show that
the proposed method generates models that accurately capture the
nonlinear dynamics of the examined devices and are highly efficient, which makes them suitable for real-time operation inside
Digital Audio Workstations.
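As a rough illustration of the antiderivative antialiasing idea mentioned above, the following sketch applies the generic first-order form to a memoryless tanh nonlinearity. It is not the network described in the abstract: the choice of tanh, the function names `log_cosh` and `adaa_tanh`, the `eps` threshold, and the zero initial state are all assumptions made for the example.

```python
import math

def log_cosh(x):
    """Antiderivative of tanh, written to avoid overflow for large |x|."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def adaa_tanh(signal, eps=1e-6):
    """First-order antiderivative antialiasing applied to y = tanh(x)."""
    out = []
    x_prev = 0.0  # assumed initial state: silence before the first sample
    for x in signal:
        dx = x - x_prev
        if abs(dx) < eps:
            # Division would be ill-conditioned: use tanh at the midpoint.
            y = math.tanh(0.5 * (x + x_prev))
        else:
            # Difference quotient of the antiderivative between two samples.
            y = (log_cosh(x) - log_cosh(x_prev)) / dx
        out.append(y)
        x_prev = x
    return out
```

Each output sample is the average of tanh over the segment between two consecutive inputs, which attenuates the high-frequency content that would otherwise alias and is why less oversampling may be needed.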

Partiels – Exploring, Analyzing and Understanding Sounds
This article presents Partiels, an open-source application
developed at IRCAM to analyze digital audio files and explore
sound characteristics.
The application uses Vamp plug-ins to
extract various information on different aspects of the sound, such
as spectrum, partials, pitch, tempo, text, and chords. Partiels is the
successor to AudioSculpt, offering a modern, flexible interface for
visualizing, editing, and exporting analysis results, addressing a
wide range of issues from musicological practice to sound creation
and signal processing research. The article describes Partiels’ key
features, including analysis organization, audio file management,
results visualization and editing, as well as data export and sharing
options, and its interoperability with other software such as Max
and Pure Data. In addition, it highlights the numerous analysis
plug-ins developed at IRCAM, based in particular on machine
learning models, as well as the IRCAM Vamp extension, which
overcomes certain limitations of the original Vamp format.

Inference-Time Structured Pruning for Real-Time Neural Network Audio Effects
Structured pruning is a technique for reducing the computational
load and memory footprint of neural networks by removing structured subsets of parameters according to a predefined schedule
or ranking criterion.
This paper investigates the application of
structured pruning to real-time neural network audio effects, focusing on both feedforward networks and recurrent architectures.
We evaluate multiple pruning strategies at inference time, without retraining, and analyze their effects on model performance. To
quantify the trade-off between parameter count and audio fidelity,
we construct a theoretical model of the approximation error as a
function of network architecture and pruning level. The resulting bounds establish a principled relationship between pruning-induced sparsity and functional error, enabling informed deployment of neural audio effects in constrained real-time environments.
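To make the notion of structured pruning at inference time concrete, here is a minimal sketch that removes whole hidden units from a small two-layer feedforward network without retraining. The ranking criterion (L2 norm of each unit's outgoing weights), the tanh activation, and the names `forward` and `prune_hidden_units` are assumptions for illustration; the abstract does not specify which criteria the paper evaluates.

```python
import math

def forward(x, W1, b1, W2):
    """Two-layer network: y = W2 @ tanh(W1 @ x + b1)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def prune_hidden_units(W1, b1, W2, keep):
    """Keep the `keep` hidden units with the largest outgoing-weight norm."""
    def score(j):
        # Assumed ranking criterion: L2 norm of column j of W2.
        return math.sqrt(sum(row[j] ** 2 for row in W2))

    ranked = sorted(range(len(W1)), key=score, reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original unit order
    W1_pruned = [W1[j] for j in kept]           # drop rows of W1
    b1_pruned = [b1[j] for j in kept]           # drop matching biases
    W2_pruned = [[row[j] for j in kept] for row in W2]  # drop columns of W2
    return W1_pruned, b1_pruned, W2_pruned
```

Because entire units are removed, the pruned matrices are simply smaller dense matrices, so the saving in computation needs no sparse kernels.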

Audio Processor Parameters: Estimating Distributions Instead of Deterministic Values
Audio effects and sound synthesizers are widely used processors
in popular music.
Their parameters control the quality of the
output sound. Multiple combinations of parameters can lead to
the same sound.
While recent approaches have been proposed
to estimate these parameters given only the output sound, those
are deterministic, i.e. they only estimate a single solution among
the many possible parameter configurations.
In this work, we
propose to model the parameters as probability distributions instead
of deterministic values. To learn the distributions, we optimize
two objectives: (1) we minimize the reconstruction error between
the ground truth output sound and the one generated using the
estimated parameters, as is usually done, but also (2) we maximize
the parameter diversity, using entropy. We evaluate our approach
through two numerical audio experiments to show its effectiveness.
These results show how our approach effectively outputs multiple
combinations of parameters to match one sound.
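The two objectives described above can be combined into a single loss, sketched below under strong assumptions: the parameter distribution is taken to be a diagonal Gaussian, the reconstruction error is a mean squared error, and `entropy_weight` is a hypothetical trade-off coefficient. None of these choices is stated in the abstract.

```python
import math

def gaussian_entropy(sigmas):
    """Differential entropy of a diagonal Gaussian with the given std devs."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in sigmas)

def objective(target, rendered, sigmas, entropy_weight=0.1):
    """Reconstruction error minus a weighted entropy bonus (lower is better)."""
    # (1) Reconstruction term: mean squared error between the two sounds.
    mse = sum((t - r) ** 2 for t, r in zip(target, rendered)) / len(target)
    # (2) Diversity term: a wider distribution has higher entropy,
    #     which lowers the loss.
    return mse - entropy_weight * gaussian_entropy(sigmas)
```

With equal reconstruction error, a wider distribution yields a lower loss, so the optimizer is pushed to cover many parameter settings that produce the same sound rather than collapsing onto one.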

Neural Sample-Based Piano Synthesis
Piano sound emulation has been an active topic of research and development for several decades. Although comprehensive physics-based piano models have been proposed, sample-based piano emulation is still widely utilized for its computational efficiency and
relative accuracy despite presenting significant memory storage
requirements. This paper proposes a novel hybrid approach to
sample-based piano synthesis aimed at improving the fidelity of
sound emulation while reducing memory requirements for storing samples. A neural network-based model processes the sound
recorded from a single example of a piano key at a given velocity.
The network is trained to learn the nonlinear relationship between
the various velocities at which a piano key is pressed and the corresponding sound alterations. Results show that the method achieves
high accuracy using a specific neural architecture that is computationally efficient, presenting few trainable parameters, and it requires memory only for one sample for each piano key.

Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates
Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically
informed audio synthesis. However, traditional implementations,
particularly for non-linear models like the von Kármán plate, are
computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast,
differentiable, GPU-accelerated modal framework built with the
JAX library, providing efficient simulations and enabling gradient-based inverse modelling.
Benchmarks show that our approach
significantly outperforms CPU and GPU-based implementations,
particularly for simulations with many modes. Inverse modelling
experiments demonstrate that our approach can recover physical
parameters, including tension, stiffness, and geometry, from both
synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation compared to methods that
fit abstract spectral parameters, it provides greater interpretability
and more compact parameterisation. The code is released as open
source to support future research and applications in differentiable
physical modelling and sound synthesis.
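For readers unfamiliar with modal methods, the sketch below shows the linear case only: a stiff string rendered as a sum of exponentially decaying sinusoids, with mode frequencies given by the standard inharmonicity formula. It is written in plain Python rather than JAX, is not differentiable, and omits the non-linear mode coupling that the abstract addresses; the function names and all parameter values are assumptions for illustration.

```python
import math

def stiff_string_modes(f0, inharmonicity, n_modes):
    """Modal frequencies of a stiff string: f_n = n * f0 * sqrt(1 + B * n^2)."""
    return [n * f0 * math.sqrt(1.0 + inharmonicity * n * n)
            for n in range(1, n_modes + 1)]

def modal_synth(freqs, amps, decays, sample_rate, n_samples):
    """Sum of exponentially decaying sinusoids, one per mode."""
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        # Each mode contributes amplitude * decay envelope * sinusoid.
        out.append(sum(a * math.exp(-d * t) * math.sin(2.0 * math.pi * f * t)
                       for f, a, d in zip(freqs, amps, decays)))
    return out
```

In this representation the physical parameters (tension and stiffness, via `f0` and `inharmonicity`) determine every mode frequency at once, which is what makes a physical parameterisation more compact than fitting each spectral peak separately.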

Solid State Bus-Comp: A Large-Scale and Diverse Dataset for Dynamic Range Compressor Virtual Analog Modeling
Virtual Analog (VA) modeling aims to simulate the behavior
of hardware circuits via algorithms to replicate their tone digitally.
A Dynamic Range Compressor (DRC) is an audio processing module
that controls the dynamics of a track by reducing the volume of
loud sounds and amplifying that of quiet sounds, which is essential in music
production. In recent years, neural-network-based VA modeling has
shown great potential in producing high-fidelity models. However,
due to the lack of data quantity and diversity, their generalization
ability in different parameter settings and input sounds is still limited. To tackle this problem, we present Solid State Bus-Comp, the
first large-scale and diverse dataset for modeling the classical VCA
compressor — SSL 500 G-Bus. Specifically, we manually collected
175 unmastered songs from the Cambridge Multitrack Library. We
recorded the compressed audio in 220 parameter combinations,
resulting in an extensive 2528-hour dataset with diverse genres, instruments, tempos, and keys. Moreover, to facilitate the use of our
proposed dataset, we conducted benchmark experiments in various
open-sourced black-box and grey-box models, as well as white-box
plugins. We also conducted ablation studies in different data subsets to illustrate the effectiveness of the improved data diversity and
quantity. The dataset and demos are on our project page:
https://www.yichenggu.com/SolidStateBusComp/.