Partiels – Exploring, Analyzing and Understanding Sounds

This article presents Partiels, an open-source application developed at IRCAM to analyze digital audio files and explore sound characteristics.
The application uses Vamp plug-ins to extract information on various aspects of the sound, such as spectrum, partials, pitch, tempo, text, and chords. Partiels is the
successor to AudioSculpt, offering a modern, flexible interface for
visualizing, editing, and exporting analysis results, addressing a
wide range of issues from musicological practice to sound creation
and signal processing research. The article describes Partiels’ key
features, including analysis organization, audio file management,
results visualization and editing, as well as data export and sharing
options, and its interoperability with other software such as Max
and Pure Data. In addition, it highlights the numerous analysis
plug-ins developed at IRCAM, based in particular on machine
learning models, as well as the IRCAM Vamp extension, which
overcomes certain limitations of the original Vamp format.

Towards Neural Emulation of Voltage-Controlled Oscillators

Machine learning models have become ubiquitous in modeling
analog audio devices. Expanding on this line of research, our study
focuses on Voltage-Controlled Oscillators of analog synthesizers.
We employ black-box autoregressive artificial neural networks to
model the typical analog waveshapes, including triangle, square,
and sawtooth. The models can be conditioned on wave frequency
and type, enabling the generation of pitch envelopes and morphing across waveshapes. We conduct evaluations on both synthetic
and analog datasets to assess the accuracy of various architectural
variants. Among these, the LSTM variant performed best, although lower frequency ranges remain particularly challenging.
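
As a rough illustration of the black-box autoregressive approach described above (not the authors' code; all names and sizes are hypothetical), a next-sample predictor conditioned on frequency and waveshape could look like this in PyTorch:

```python
import torch
import torch.nn as nn

class VCONet(nn.Module):
    """Hypothetical sketch: autoregressive next-sample predictor
    conditioned on normalized frequency and a one-hot waveshape
    label (triangle, square, sawtooth)."""
    def __init__(self, hidden_size=64, num_shapes=3):
        super().__init__()
        # input: previous sample + frequency + one-hot waveshape
        self.lstm = nn.LSTM(1 + 1 + num_shapes, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, prev, freq, shape, state=None):
        # prev, freq: (B, T, 1); shape: (B, T, num_shapes)
        x = torch.cat([prev, freq, shape], dim=-1)
        h, state = self.lstm(x, state)
        return self.head(h), state  # predicted next samples, LSTM state
```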

Lookup Table Based Audio Spectral Transformation

We present a unified visual interface for flexible spectral audio manipulation based on editable lookup tables (LUTs). In the proposed
approach, the audio spectrum is visualized as a two-dimensional
color map of frequency versus amplitude, serving as an editable
lookup table for modifying the sound. This single tool can replicate common audio effects such as equalization, pitch shifting, and
spectral compression, while also enabling novel sound transformations through creative combinations of adjustments. By consolidating these capabilities into one visual platform, the system has
the potential to streamline audio-editing workflows and encourage
creative experimentation. The approach also supports real-time
processing, providing immediate auditory feedback in an interactive graphical environment. Overall, this LUT-based method offers
an accessible yet powerful framework for designing and applying
a broad range of spectral audio effects through intuitive visual manipulation.
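
As a hedged sketch of the core idea (the authors' implementation and interface will differ), the snippet below applies a two-dimensional gain table, indexed by frequency bin and quantized magnitude, to an STFT; the table layout and dB range are illustrative assumptions:

```python
import numpy as np
from scipy.signal import stft, istft

def apply_spectral_lut(x, lut, fs=44100, nperseg=1024):
    # LUT rows index frequency bins, columns index quantized magnitude
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(X), np.angle(X)
    n_freq, n_amp = lut.shape
    # map each bin's magnitude (in dB, assumed -120..0 range) to a column
    db = 20 * np.log10(mag + 1e-12)
    col = np.clip(((db + 120.0) / 120.0 * (n_amp - 1)).astype(int), 0, n_amp - 1)
    # map STFT bins onto LUT rows
    row = (np.arange(len(f)) * (n_freq - 1)) // max(len(f) - 1, 1)
    gain = lut[row[:, None], col]           # per-bin, per-frame gain
    _, y = istft(mag * gain * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return y
```

An identity table leaves the input unchanged; shaping gains along the frequency axis behaves like an equalizer, while making the gain depend on the magnitude column yields compression-like effects.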

Differentiable Attenuation Filters for Feedback Delay Networks

We introduce a novel method for designing attenuation filters in
digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second Order Sections (SOS)
of Infinite Impulse Response (IIR) filters arranged as parametric
equalizers (PEQ), enabling fine control over frequency-dependent
reverberation decay. Unlike traditional graphic equalizer designs,
which require numerous filters per delay line, we propose a scalable solution in which the number of filters can be adjusted. The frequency, gain, and quality factor (Q) parameters are shared across delay lines, and only the gain is adjusted based on delay
length. This design not only reduces the number of optimization
parameters, but also remains fully differentiable and compatible
with gradient-based learning frameworks. Leveraging principles
of analog filter design, our method allows for efficient and accurate filter fitting using supervised learning, delivering a flexible and differentiable design that achieves state-of-the-art performance while significantly reducing computational cost.
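
To make the parametric-equalizer building block concrete, here is a minimal sketch (assuming standard RBJ-cookbook peaking-filter formulas, not the paper's exact design) written with torch operations so that frequency, gain, and Q all receive gradients:

```python
import torch

def peq_sos(freq, gain_db, q, fs=48000.0):
    # RBJ-cookbook peaking EQ, written with torch ops so that
    # freq, gain_db, and q are all differentiable parameters
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * torch.pi * freq / fs
    alpha = torch.sin(w0) / (2.0 * q)
    cosw = torch.cos(w0)
    b = torch.stack([1 + alpha * A, -2 * cosw, 1 - alpha * A])
    a = torch.stack([1 + alpha / A, -2 * cosw, 1 - alpha / A])
    return b / a[0], a / a[0]   # normalized SOS coefficients

freq = torch.tensor(1000.0, requires_grad=True)
gain = torch.tensor(-6.0, requires_grad=True)   # attenuation in dB
q = torch.tensor(0.7, requires_grad=True)
b, a = peq_sos(freq, gain, q)
```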

Towards Efficient Emulation of Nonlinear Analog Circuits for Audio Using Constraint Stabilization and Convex Quadratic Programming

This paper introduces a computationally efficient method for
the emulation of nonlinear analog audio circuits by combining state-space representations, constraint stabilization, and convex quadratic programming (QP). Unlike traditional virtual analog (VA) modeling approaches or computationally demanding
SPICE-based simulations, our approach reformulates the nonlinear
differential-algebraic equation (DAE) systems that arise from analog circuit
analysis into numerically stable optimization problems. The proposed method efficiently addresses the numerical challenges posed
by nonlinear algebraic constraints via constraint stabilization techniques, significantly enhancing robustness and stability and making the method suitable for real-time simulation. A canonical diode clipper circuit is presented as a test case, demonstrating that our method achieves accurate emulation while running faster than conventional state-space methods. Furthermore, our method performs very well even at
substantially lower sampling rates. Preliminary numerical experiments confirm that the proposed approach offers improved numerical stability and real-time feasibility, positioning it as a practical
solution for high-fidelity audio applications.
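
The paper's exact QP formulation is its own contribution; purely to illustrate the computational pattern of solving a small convex QP per simulation step, a hypothetical sketch using cvxpy and OSQP might look as follows (H, c, A_eq, and b_eq are placeholders for the discretized dynamics and the stabilized, linearized constraint):

```python
import numpy as np
import cvxpy as cp

def qp_step(H, c, A_eq, b_eq):
    # Solve min 0.5 z^T H z + c^T z  s.t.  A_eq z = b_eq  with OSQP.
    # H must be positive semidefinite for the problem to be convex;
    # all matrices here are placeholders, not the paper's formulation.
    z = cp.Variable(H.shape[0])
    obj = cp.Minimize(0.5 * cp.quad_form(z, cp.psd_wrap(H)) + c @ z)
    prob = cp.Problem(obj, [A_eq @ z == b_eq])
    prob.solve(solver=cp.OSQP)
    return z.value
```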

TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration

The increasing complexity and real-time processing demands of
audio signals require optimized algorithms that utilize the computational power of Graphics Processing Units (GPUs).
Existing Digital Signal Processing (DSP) libraries often do not provide
the necessary efficiency and flexibility, particularly for integrating
with Artificial Intelligence (AI) models. In response, we introduce TorchFX: a GPU-accelerated Python library for DSP, engineered to facilitate sophisticated audio signal processing. Built on
the PyTorch framework, TorchFX offers an Object-Oriented interface similar to torchaudio but enhances functionality with a novel
pipe operator for intuitive filter chaining. The library provides a
comprehensive suite of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, with a focus on multichannel
audio, thereby facilitating the integration of DSP and AI-based
approaches.
Our benchmarking results demonstrate significant
efficiency gains over traditional libraries like SciPy, particularly
in multichannel contexts. While there are current limitations in
GPU compatibility, ongoing developments promise broader support and real-time processing capabilities. TorchFX aims to become a useful tool for the community, contributing to innovation
in GPU-accelerated DSP. TorchFX is publicly available on GitHub
at https://github.com/matteospanio/torchfx.
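
TorchFX's actual API is documented in the repository linked above; the toy sketch below merely illustrates how a pipe operator for chaining effects can be realized in Python by overloading `__or__`:

```python
import torch

class FX:
    # toy effect wrapper; the pipe composes effects left-to-right
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        return self.fn(x)

    def __or__(self, other: "FX") -> "FX":
        return FX(lambda x: other(self(x)))

gain = FX(lambda x: 0.5 * x)
clip = FX(lambda x: torch.clamp(x, -0.8, 0.8))
chain = gain | clip                   # reads left to right
y = chain(torch.randn(2, 48000))      # e.g. a stereo buffer
```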

SCHAEFFER: A Dataset of Human-Annotated Sound Objects for Machine Learning Applications

Machine learning for sound generation is rapidly expanding within
the computer music community. However, most datasets used to
train models are built from field recordings, Foley sounds, instrumental notes, or commercial music. This presents a significant
limitation for composers working in acousmatic and electroacoustic music, who require datasets tailored to their creative processes.
To address this gap, we introduce the SCHAEFFER Dataset (Spectromorphological Corpus of Human-annotated Audio with Electroacoustic Features For Experimental Research), a curated collection of 1000 sound objects designed and annotated by composers and students of electroacoustic composition. The dataset,
distributed under Creative Commons licenses, features annotations
combining technical and poetic descriptions, alongside classifications based on pre-defined spectromorphological categories.

Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates

Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically
informed audio synthesis. However, traditional implementations,
particularly for non-linear models like the von Kármán plate, are
computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast,
differentiable, GPU-accelerated modal framework built with the
JAX library, providing efficient simulations and enabling gradient-based inverse modelling.
Benchmarks show that our approach
significantly outperforms CPU and GPU-based implementations,
particularly for simulations with many modes. Inverse modelling
experiments demonstrate that our approach can recover physical
parameters, including tension, stiffness, and geometry, from both
synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation than methods that
fit abstract spectral parameters, it provides greater interpretability
and more compact parameterisation. The code is released as open
source to support future research and applications in differentiable
physical modelling and sound synthesis.
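
The released source code is the authoritative reference; the toy JAX sketch below (all values invented) only illustrates why a modal formulation lends itself to gradient-based inverse modelling, using a purely linear modal bank without the non-linear couplings the paper handles:

```python
import jax
import jax.numpy as jnp

def modal_synth(freqs, decays, amps, fs=48000, n=48000):
    # linear modal bank: a sum of exponentially damped sinusoids
    t = jnp.arange(n) / fs
    modes = amps[:, None] * jnp.exp(-decays[:, None] * t) \
            * jnp.sin(2.0 * jnp.pi * freqs[:, None] * t)
    return modes.sum(axis=0)

decays = jnp.array([3.0, 5.0, 8.0])
amps = jnp.array([1.0, 0.5, 0.25])
target = modal_synth(jnp.array([221.0, 439.0, 662.0]), decays, amps)
loss = lambda f: jnp.mean((modal_synth(f, decays, amps) - target) ** 2)
grad_f = jax.grad(loss)(jnp.array([220.0, 440.0, 660.0]))  # d(loss)/d(freqs)
```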

Improving Lyrics-to-Audio Alignment Using Frame-wise Phoneme Labels with Masked Cross Entropy Loss

This paper addresses the task of lyrics-to-audio alignment, which
involves synchronizing textual lyrics with corresponding music
audio. Most publicly available datasets for this task provide annotations only at the line or word level. This poses a challenge
for training lyrics-to-audio models due to the lack of frame-wise
phoneme labels. However, we find that phoneme labels can be
partially derived from word-level annotations: for single-phoneme
words, all frames corresponding to the word can be labeled with
the same phoneme; for multi-phoneme words, phoneme labels can
be assigned at the first and last frames of the word. To leverage
this partial information, we construct a mask for those frames and
propose a masked frame-wise cross-entropy (CE) loss that considers only frames with known phoneme labels. As a baseline model,
we adopt an autoencoder trained with a Connectionist Temporal
Classification (CTC) loss and a reconstruction loss. We then enhance the training process by incorporating the proposed frame-wise masked CE loss. Experimental results show that this addition improves alignment performance. In comparison to other state-of-the-art models, our model achieves a comparable Mean Absolute Error (MAE) of 0.216 seconds and the best Median Absolute Error (MedAE) of 0.041 seconds on the Jamendo test dataset.
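
A minimal sketch of the masking idea (tensor shapes are our own assumption, not the authors' code):

```python
import torch
import torch.nn.functional as F

def masked_frame_ce(logits, labels, mask):
    # logits: (T, n_phonemes); labels: (T,) int; mask: (T,) bool,
    # True only where the phoneme label is known (all frames of
    # single-phoneme words; first/last frames of multi-phoneme words)
    per_frame = F.cross_entropy(logits, labels, reduction="none")
    return (per_frame * mask).sum() / mask.sum().clamp(min=1)
```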

Differentiable Scattering Delay Networks for Artificial Reverberation

Scattering delay networks (SDNs) provide a flexible and efficient
framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling
gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating
key parameters such as scattering matrices and absorption filters
as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic
features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN
configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
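
The paper defines its own parametrization; one standard way to keep a learnable scattering matrix energy-preserving during gradient descent, shown here purely as an assumed sketch, is to take the matrix exponential of a skew-symmetric parameter matrix:

```python
import torch

class LearnableScattering(torch.nn.Module):
    # Parametrize the scattering matrix as expm(W - W^T): the matrix
    # exponential of a skew-symmetric matrix is orthogonal, so the
    # junction stays lossless throughout gradient descent.
    def __init__(self, n_ports):
        super().__init__()
        self.w = torch.nn.Parameter(0.01 * torch.randn(n_ports, n_ports))

    def forward(self, incoming):            # incoming: (..., n_ports)
        S = torch.linalg.matrix_exp(self.w - self.w.T)
        return incoming @ S.T
```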