Solid State Bus-Comp: A Large-Scale and Diverse Dataset for Dynamic Range Compressor Virtual Analog Modeling

Virtual Analog (VA) modeling aims to simulate the behavior
of hardware circuits via algorithms to replicate their tone digitally.
A Dynamic Range Compressor (DRC) is an audio processing module
that controls the dynamics of a track by attenuating loud
sounds and amplifying quiet ones, which is essential in music
production. In recent years, neural-network-based VA modeling has
shown great potential in producing high-fidelity models. However,
due to a lack of data quantity and diversity, their ability to
generalize across parameter settings and input sounds is still limited. To tackle this problem, we present Solid State Bus-Comp, the
first large-scale and diverse dataset for modeling the classical VCA
compressor — SSL 500 G-Bus. Specifically, we manually collected
175 unmastered songs from the Cambridge Multitrack Library. We
recorded the compressed audio in 220 parameter combinations,
resulting in an extensive 2528-hour dataset with diverse genres, instruments, tempos, and keys. Moreover, to facilitate the use of our
proposed dataset, we conducted benchmark experiments on various
open-source black-box and grey-box models, as well as white-box
plugins. We also ran ablation studies on different data subsets to illustrate the effectiveness of the improved data diversity and
quantity. The dataset and demos are on our project page: https://www.yichenggu.com/SolidStateBusComp/.
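The dataset itself cannot be reproduced here, but the static behaviour a DRC virtual-analog model has to capture can be sketched as a gain computer. This is a minimal hard-knee simplification (function name and defaults are illustrative); the real SSL G-Bus also has attack/release dynamics and programme-dependent behaviour:

```python
import numpy as np

def static_compressor_gain_db(level_db, threshold_db=-18.0, ratio=4.0):
    """Static gain curve of a feed-forward compressor (hard knee).

    Levels above the threshold are attenuated so the output rises
    only 1/ratio dB per input dB; levels below pass unchanged.
    """
    over = np.maximum(level_db - threshold_db, 0.0)
    return -over * (1.0 - 1.0 / ratio)

# A signal 12 dB over an -18 dB threshold at 4:1 is attenuated by 9 dB.
print(static_compressor_gain_db(-6.0))  # -> -9.0
```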
Inference-Time Structured Pruning for Real-Time Neural Network Audio Effects

Structured pruning is a technique for reducing the computational
load and memory footprint of neural networks by removing structured subsets of parameters according to a predefined schedule
or ranking criterion.
This paper investigates the application of
structured pruning to real-time neural network audio effects, focusing on both feedforward networks and recurrent architectures.
We evaluate multiple pruning strategies at inference time, without retraining, and analyze their effects on model performance. To
quantify the trade-off between parameter count and audio fidelity,
we construct a theoretical model of the approximation error as a
function of network architecture and pruning level. The resulting bounds establish a principled relationship between pruning-induced sparsity and functional error, enabling informed deployment of neural audio effects in constrained real-time environments.
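The core idea of inference-time structured pruning can be sketched as follows: whole hidden units are ranked by a magnitude criterion and dropped, shrinking both adjacent weight matrices with no retraining. The function name and the particular ranking criterion below are illustrative, not the paper's method:

```python
import numpy as np

def prune_hidden_units(W1, b1, W2, keep_ratio=0.5):
    """Structured pruning of one hidden layer at inference time.

    Ranks each hidden unit by the product of the L2 norms of its
    incoming and outgoing weights and keeps only the strongest
    units; the surviving weights are used as-is (no retraining).
    """
    n_hidden = W1.shape[0]
    n_keep = max(1, int(n_hidden * keep_ratio))
    scores = np.linalg.norm(W1, axis=1) * np.linalg.norm(W2, axis=0)
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return W1[keep], b1[keep], W2[:, keep]

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(64, 8)), rng.normal(size=64), rng.normal(size=(4, 64))
W1p, b1p, W2p = prune_hidden_units(W1, b1, W2, keep_ratio=0.25)
print(W1p.shape, W2p.shape)  # -> (16, 8) (4, 16)
```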
Unsupervised Text-to-Sound Mapping via Embedding Space Alignment

This work focuses on developing an artistic tool that performs an
unsupervised mapping between text and sound, converting an input text string into a series of sounds from a given sound corpus.
With the use of a pre-trained sound embedding model and a separate, pre-trained text embedding model, the goal is to find a mapping between the two feature spaces. Our approach is unsupervised, which allows any sound corpus to be used with the system.
The tool performs the task of text-to-sound retrieval, creating a
soundfile in which each word in the text input is mapped to a single sound in the corpus, and the resulting sounds are concatenated
to play sequentially. We experiment with three different mapping
methods, and perform quantitative and qualitative evaluations on
the outputs. Our results demonstrate the potential of unsupervised
methods for creative applications in text-to-sound mapping.
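One of the simplest conceivable mapping strategies can be sketched as nearest-neighbour retrieval under cosine similarity after mean-centring both embedding spaces. This is an illustrative baseline under assumed inputs (pre-computed word and sound embedding matrices), not a reproduction of the paper's three mapping methods:

```python
import numpy as np

def map_text_to_sounds(text_embs, sound_embs):
    """Map each word embedding to the index of its nearest sound.

    Both spaces are mean-centred and L2-normalised so cosine
    similarities are comparable; the returned indices, taken in
    word order, give the playback sequence of corpus sounds.
    """
    def normalise(X):
        X = X - X.mean(axis=0)
        return X / np.linalg.norm(X, axis=1, keepdims=True)

    T, S = normalise(text_embs), normalise(sound_embs)
    sims = T @ S.T               # cosine similarity, words x sounds
    return sims.argmax(axis=1)   # one sound index per word

rng = rng = np.random.default_rng(1)
words = rng.normal(size=(5, 16))    # 5 word embeddings (assumed)
sounds = rng.normal(size=(40, 16))  # 40 sound embeddings (assumed)
idx = map_text_to_sounds(words, sounds)
print(idx.shape)  # -> (5,)
```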
Differentiable Attenuation Filters for Feedback Delay Networks

We introduce a novel method for designing attenuation filters in
digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second Order Sections (SOS)
of Infinite Impulse Response (IIR) filters arranged as parametric
equalizers (PEQ), enabling fine control over frequency-dependent
reverberation decay. Unlike traditional graphic equalizer designs,
which require numerous filters per delay line, we propose a scalable solution where the number of filters can be adjusted. The frequency, gain, and quality factor (Q) parameters are shared across delay lines, and only the gain is adjusted based on delay
length. This design not only reduces the number of optimization
parameters, but also remains fully differentiable and compatible
with gradient-based learning frameworks. Leveraging principles
of analog filter design, our method allows for efficient and accurate filter fitting using supervised learning. Our method delivers
a flexible and differentiable design, achieving state-of-the-art performance while significantly reducing computational cost.
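The second-order sections described above can be sketched as standard peaking-EQ biquads (RBJ audio-EQ-cookbook form). The function below is an illustrative assumption about the building block, not the paper's optimised design; in a differentiable FDN the gains would be learned while frequency and Q stay shared across delay lines:

```python
import numpy as np

def peaking_sos(f0, gain_db, q, fs=48000.0):
    """One peaking-EQ second-order section (RBJ cookbook form).

    Returns (b, a) coefficient arrays normalised so a[0] == 1.
    A cascade of such sections per delay line shapes the
    frequency-dependent reverberation decay.
    """
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

# With 0 dB gain the section reduces to an identity filter: b == a.
b0, a0 = peaking_sos(1000.0, 0.0, 0.7)
print(np.allclose(b0, a0))  # -> True
```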
Partiels – Exploring, Analyzing and Understanding Sounds

This article presents Partiels, an open-source application
developed at IRCAM to analyze digital audio files and explore
sound characteristics.
The application uses Vamp plug-ins to
extract various information on different aspects of the sound, such
as spectrum, partials, pitch, tempo, text, and chords. Partiels is the
successor to AudioSculpt, offering a modern, flexible interface for
visualizing, editing, and exporting analysis results, addressing a
wide range of issues from musicological practice to sound creation
and signal processing research. The article describes Partiels’ key
features, including analysis organization, audio file management,
results visualization and editing, as well as data export and sharing
options, and its interoperability with other software such as Max
and Pure Data. In addition, it highlights the numerous analysis
plug-ins developed at IRCAM, based in particular on machine
learning models, as well as the IRCAM Vamp extension, which
overcomes certain limitations of the original Vamp format.
SCHAEFFER: A Dataset of Human-Annotated Sound Objects for Machine Learning Applications

Machine learning for sound generation is rapidly expanding within
the computer music community. However, most datasets used to
train models are built from field recordings, foley sounds, instrumental notes, or commercial music. This presents a significant
limitation for composers working in acousmatic and electroacoustic music, who require datasets tailored to their creative processes.
To address this gap, we introduce the SCHAEFFER Dataset (Spectromorphological Corpus of Human-annotated Audio with Electroacoustic Features For Experimental Research), a curated collection of 1000 sound objects designed and annotated by composers and students of electroacoustic composition. The dataset,
distributed under Creative Commons licenses, features annotations
combining technical and poetic descriptions, alongside classifications based on pre-defined spectromorphological categories.
Auditory Discrimination of Early Reflections in Virtual Rooms

This study investigates perceptual sensitivity to changes in early reflections across different spatial directions in a virtual
reality (VR) environment. Using an ABX discrimination paradigm, participants evaluated speech stimuli convolved with third-order Ambisonic room impulse responses under three position-reversal conditions (Left–Right, Front–Back, and Floor–Ceiling) and three
reverberation conditions (RT60 = 1.0 s, 0.6 s, and 0.2 s). Binomial tests revealed that participants consistently detected early reflection differences in the Left–Right reversal, while discrimination performance in the other two directions remained at or near
chance. This result can be explained by the higher acuity and
lower localisation blur of the human auditory system in the lateral direction. A
two-way ANOVA confirmed a significant main effect of spatial
position (p = 0.00685, η² = 0.1605), with no significant effect of
reverberation or interaction. The analysis of the binaural room
impulse responses showed waveform and Direct-to-Reverberant Ratio (DRR) differences in the Left–Right reversal position, aligning
with perceptual results. However, no definitive causal link between DRR variations and perceptual outcomes can yet be established.
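The DRR measure analysed above can be sketched as an energy ratio over a room impulse response. The 2.5 ms direct-sound window below is a common convention, not necessarily the one used in this study, and the function name is illustrative:

```python
import numpy as np

def drr_db(rir, fs=48000.0, direct_window_ms=2.5):
    """Direct-to-Reverberant Ratio of a room impulse response.

    Energy up to a short window after the direct-sound peak is
    compared with all later (reverberant) energy, in dB.
    """
    peak = int(np.argmax(np.abs(rir)))
    split = peak + int(fs * direct_window_ms / 1000.0)
    direct = np.sum(rir[:split] ** 2)
    reverb = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(direct / reverb)

rir = np.zeros(4800)
rir[0] = 1.0       # direct sound
rir[480:] = 0.005  # crude constant diffuse tail (toy example)
print(drr_db(rir) > 0)  # -> True
```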
Hyperbolic Embeddings for Order-Aware Classification of Audio Effect Chains

Audio effects (AFXs) are essential tools in music production, frequently applied in chains to shape timbre and dynamics. The order of AFXs in a chain plays a crucial role in determining the final sound, particularly when non-linear (e.g., distortion) or time-variant (e.g., chorus) processors are involved. Despite its importance, most AFX-related studies have primarily focused on estimating effect types and their parameters from a wet signal. To
address this gap, we formulate AFX chain recognition as the task
of jointly estimating AFX types and their order from a wet signal.
We propose a neural-network-based method that embeds wet signals into a hyperbolic space and classifies their AFX chains. Hyperbolic space can represent tree-structured data more efficiently
than Euclidean space due to its exponential expansion property.
Since AFX chains can be represented as trees, with AFXs as nodes
and edges encoding effect order, hyperbolic space is well-suited
for modeling the exponentially growing and non-commutative nature of ordered AFX combinations, where changes in effect order can result in different final sounds. Experiments using guitar
sounds demonstrate that, with an appropriate curvature, the proposed method outperforms its Euclidean counterpart. Further analysis based on AFX type and chain length highlights the effectiveness of the proposed method in capturing AFX order.
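The distance underlying such embeddings can be sketched with the standard Poincaré-ball geodesic distance (curvature -1); the exponential growth of distances towards the boundary is what suits tree-structured AFX chains. This is the textbook formula, not the paper's full classifier:

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points in the Poincare ball.

    d(u, v) = arccosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2))),
    valid for points with norm strictly below 1.
    """
    uu, vv = np.sum(u * u), np.sum(v * v)
    duv = np.sum((u - v) ** 2)
    return np.arccosh(1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv)))
```

For example, the distance from the origin to a point at Euclidean radius r is 2 artanh(r), so points near the boundary are exponentially far apart.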
Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates

Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically
informed audio synthesis. However, traditional implementations,
particularly for non-linear models like the von Kármán plate, are
computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast,
differentiable, GPU-accelerated modal framework built with the
JAX library, providing efficient simulations and enabling gradient-based inverse modelling.
Benchmarks show that our approach
significantly outperforms CPU and GPU-based implementations,
particularly for simulations with many modes. Inverse modelling
experiments demonstrate that our approach can recover physical
parameters, including tension, stiffness, and geometry, from both
synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation compared to methods that
fit abstract spectral parameters, it provides greater interpretability
and more compact parameterisation. The code is released as open
source to support future research and applications in differentiable
physical modelling and sound synthesis.
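The modal idea can be sketched in its simplest, linear form: each mode is an independently damped sinusoid and the output is their sum. This toy example (names and the 1/m excitation profile are illustrative) omits exactly what makes the paper's problem hard, i.e. the mode coupling of non-linear models such as the von Kármán plate:

```python
import numpy as np

def modal_string(f0=110.0, n_modes=20, dur=1.0, fs=48000.0, t60=0.8):
    """Linear modal synthesis of an ideal string.

    Each mode is a damped sinusoid at an integer multiple of f0
    with a shared T60; modes are summed independently, which only
    holds for the linear case.
    """
    t = np.arange(int(dur * fs)) / fs
    out = np.zeros_like(t)
    decay = np.log(1000.0) / t60      # amplitude decay rate for T60
    for m in range(1, n_modes + 1):
        amp = 1.0 / m                 # simple 1/m excitation profile
        out += amp * np.exp(-decay * t) * np.sin(2 * np.pi * m * f0 * t)
    return out / np.max(np.abs(out))

y = modal_string(dur=0.1)
print(y.shape)  # -> (4800,)
```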
Digital Morphophone Environment. Computer Rendering of a Pioneering Sound Processing Device

This paper introduces a digital reconstruction of the morphophone,
a complex magnetophonic device developed in the 1950s within
the laboratories of the GRM (Groupe de Recherches Musicales)
in Paris. The analysis, design, and implementation methodologies
underlying the Digital Morphophone Environment are discussed.
Based on a detailed review of historical sources and limited
documentation – including a small body of literature and, most
notably, archival images – the core operational principles of the
morphophone have been modeled within the MAX visual programming environment. The main goals of this work are, on the one
hand, to study and make accessible a now obsolete and unavailable
tool, and on the other, to provide the opportunity for new explorations in computer music and research.