Partiels – Exploring, Analyzing and Understanding Sounds
This article presents Partiels, an open-source application developed at IRCAM for analyzing digital audio files and exploring sound characteristics. The application uses Vamp plug-ins to extract information about many aspects of sound, such as spectrum, partials, pitch, tempo, text, and chords. Partiels is the successor to AudioSculpt, offering a modern, flexible interface for visualizing, editing, and exporting analysis results, and addressing needs ranging from musicological practice to sound creation and signal processing research. The article describes Partiels’ key features, including the organization of analyses, audio file management, visualization and editing of results, and options for exporting and sharing data, as well as its interoperability with other software such as Max and Pure Data. In addition, it highlights the numerous analysis plug-ins developed at IRCAM, based in particular on machine learning models, and the IRCAM Vamp extension, which overcomes certain limitations of the original Vamp format.
Towards Neural Emulation of Voltage-Controlled Oscillators
Machine learning models have become ubiquitous in modeling analog audio devices. Expanding on this line of research, our study focuses on the voltage-controlled oscillators of analog synthesizers. We employ black-box autoregressive artificial neural networks to model the typical analog waveshapes, including triangle, square, and sawtooth. The models can be conditioned on wave frequency and type, enabling the generation of pitch envelopes and morphing across waveshapes. We conduct evaluations on both synthetic and analog datasets to assess the accuracy of several architectural variants. The LSTM variant performed best, although lower frequency ranges remain particularly challenging.
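To make the setup concrete, here is a minimal sketch of a conditioned autoregressive model of the kind the abstract describes; the layer sizes, conditioning encoding, and training step are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ConditionedVCO(nn.Module):
    """Autoregressive LSTM that predicts the next waveform sample from the
    previous one, conditioned on frequency and wave type (sizes are toy)."""
    def __init__(self, n_wave_types=3, hidden_size=64):
        super().__init__()
        # input = previous sample + normalized frequency + one-hot wave type
        self.lstm = nn.LSTM(1 + 1 + n_wave_types, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, prev_samples, freq, wave_onehot, state=None):
        # prev_samples: (B, T, 1); freq: (B, T, 1); wave_onehot: (B, T, types)
        x = torch.cat([prev_samples, freq, wave_onehot], dim=-1)
        h, state = self.lstm(x, state)
        return self.head(h), state

# Toy usage: one training step on an ideal synthetic sawtooth target.
B, T, sr = 8, 512, 48000
f0 = torch.full((B, T, 1), 110.0)
phase = torch.cumsum(f0 / sr, dim=1) % 1.0
saw = 2.0 * phase - 1.0                           # ideal sawtooth
wave = torch.zeros(B, T, 3); wave[..., 2] = 1.0   # one-hot: sawtooth
model = ConditionedVCO()
# feed the previous sample (rolled by one; the wrap at t=0 is ignored here)
pred, _ = model(torch.roll(saw, 1, dims=1), f0 / sr, wave)
loss = nn.functional.mse_loss(pred, saw)
loss.backward()
```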
Lookup Table Based Audio Spectral Transformation
We present a unified visual interface for flexible spectral audio manipulation based on editable lookup tables (LUTs). In the proposed approach, the audio spectrum is visualized as a two-dimensional color map of frequency versus amplitude, serving as an editable lookup table for modifying the sound. This single tool can replicate common audio effects such as equalization, pitch shifting, and spectral compression, while also enabling novel sound transformations through creative combinations of adjustments. By consolidating these capabilities into one visual platform, the system has the potential to streamline audio-editing workflows and encourage creative experimentation. The approach also supports real-time processing, providing immediate auditory feedback in an interactive graphical environment. Overall, this LUT-based method offers an accessible yet powerful framework for designing and applying a broad range of spectral audio effects through intuitive visual manipulation.
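A rough sketch of the core idea, assuming a gain-valued LUT indexed by frequency bin and quantized magnitude; the paper's actual mapping and interface may differ.

```python
import numpy as np
from scipy.signal import stft, istft

def apply_lut(x, sr, lut, n_amp_bins=64, nfft=1024):
    """Apply a gain LUT indexed by (frequency bin, quantized amplitude)
    to the STFT magnitude of x; a conceptual sketch of the approach."""
    f, t, Z = stft(x, fs=sr, nperseg=nfft)
    mag, phase = np.abs(Z), np.angle(Z)
    # quantize log-magnitude into amplitude bins (illustrative mapping)
    db = 20 * np.log10(mag + 1e-9)
    amp_idx = np.clip(((db + 96) / 96 * (n_amp_bins - 1)).astype(int),
                      0, n_amp_bins - 1)
    freq_idx = np.arange(mag.shape[0])[:, None]
    gain = lut[freq_idx, amp_idx]                 # per-bin gain lookup
    _, y = istft(gain * mag * np.exp(1j * phase), fs=sr, nperseg=nfft)
    return y

# Identity LUT edited into a crude "spectral compressor": attenuate the
# loudest amplitude bins across all frequencies.
sr = 48000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
lut = np.ones((1024 // 2 + 1, 64))
lut[:, 48:] = 0.5
y = apply_lut(x, sr, lut)
```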
Differentiable Attenuation Filters for Feedback Delay Networks
We introduce a novel method for designing attenuation filters in digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second-Order Sections (SOS) of Infinite Impulse Response (IIR) filters arranged as parametric equalizers (PEQs), enabling fine control over frequency-dependent reverberation decay. Unlike traditional graphic equalizer designs, which require numerous filters per delay line, we propose a scalable solution in which the number of filters can be adjusted. The frequency, gain, and quality factor (Q) parameters are shared across delay lines, and only the gains are rescaled according to each line's delay length. This design not only reduces the number of optimization parameters but also remains fully differentiable and compatible with gradient-based learning frameworks. Leveraging principles of analog filter design, our method allows efficient and accurate filter fitting using supervised learning. The result is a flexible and differentiable design that achieves state-of-the-art performance while significantly reducing computational cost.
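The parameter-sharing scheme can be sketched in a few lines; the values and the homogeneous-decay convention below are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

# Shared PEQ parameters (one set for the whole FDN): band frequencies, Q,
# and per-sample gains in dB. The actual SOS evaluation is omitted here;
# all three are leaf tensors, so gradients can flow to them when fitting.
f_c = torch.tensor([100.0, 1000.0, 8000.0], requires_grad=True)  # Hz, shared
Q = torch.tensor([0.7, 0.7, 0.7], requires_grad=True)            # shared
g_db_per_sample = torch.tensor([-6e-4, -1e-3, -3e-3], requires_grad=True)

delays = torch.tensor([887.0, 911.0, 941.0, 1009.0])  # lengths in samples

# The only per-line quantity: band gains scaled by each line's delay
# length, so every line realizes the same decay per second.
per_line_gain_db = delays[:, None] * g_db_per_sample[None, :]  # (lines, bands)

# Fitting example: a target T60 per band implies -60 dB spread over T60
# seconds' worth of samples.
sr = 48000
t60_target = torch.tensor([2.0, 1.5, 0.8])            # seconds per band
target_db = -60.0 * delays[:, None] / (t60_target[None, :] * sr)
loss = torch.mean((per_line_gain_db - target_db) ** 2)
loss.backward()  # gradients reach the shared per-sample gains
```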
Towards Efficient Emulation of Nonlinear Analog Circuits for Audio Using Constraint Stabilization and Convex Quadratic Programming
This paper introduces a computationally efficient method for emulating nonlinear analog audio circuits by combining state-space representations, constraint stabilization, and convex quadratic programming (QP). Unlike traditional virtual analog (VA) modeling approaches or computationally demanding SPICE-based simulations, our approach reformulates the nonlinear differential-algebraic equation (DAE) systems that arise from analog circuit analysis as numerically stable optimization problems. The proposed method addresses the numerical challenges posed by nonlinear algebraic constraints via constraint stabilization techniques, significantly enhancing robustness and stability and making it suitable for real-time simulation. A canonical diode clipper circuit is presented as a test case, demonstrating that our method achieves accurate emulation faster than conventional state-space methods. Furthermore, our method performs very well even at substantially lower sampling rates. Preliminary numerical experiments confirm that the proposed approach offers improved numerical stability and real-time feasibility, positioning it as a practical solution for high-fidelity audio applications.
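For readers unfamiliar with the ingredients, a generic sketch follows (classic Baumgarte-style stabilization and the standard convex QP template; not necessarily the paper's exact scheme).

```latex
% semi-explicit DAE from circuit analysis: dynamics plus algebraic constraint
\dot{x} = f(x, z, u), \qquad 0 = g(x, z)
% constraint stabilization: rather than enforcing g = 0 exactly, drive the
% constraint residual to zero exponentially (Baumgarte-style), \alpha > 0
\frac{\mathrm{d}}{\mathrm{d}t}\, g(x, z) + \alpha\, g(x, z) = 0
% each discrete time step can then be posed as a convex quadratic program
\min_{w}\ \tfrac{1}{2} w^{\top} H w + c^{\top} w
\quad \text{s.t.} \quad A w \leq b, \qquad H \succeq 0
```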
TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration
The increasing complexity and real-time processing demands of audio signals require optimized algorithms that utilize the computational power of Graphics Processing Units (GPUs). Existing Digital Signal Processing (DSP) libraries often do not provide the necessary efficiency and flexibility, particularly for integration with Artificial Intelligence (AI) models. In response, we introduce TorchFX: a GPU-accelerated Python library for DSP, engineered to facilitate sophisticated audio signal processing. Built on the PyTorch framework, TorchFX offers an object-oriented interface similar to torchaudio's, enhanced with a novel pipe operator for intuitive filter chaining. The library provides a comprehensive suite of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, with a focus on multichannel audio, thereby facilitating the integration of DSP and AI-based approaches. Our benchmarking results demonstrate significant efficiency gains over traditional libraries like SciPy, particularly in multichannel contexts. While there are current limitations in GPU compatibility, ongoing developments promise broader support and real-time processing capabilities. TorchFX aims to become a useful tool for the community, contributing to innovation in GPU-accelerated DSP. TorchFX is publicly available on GitHub at https://github.com/matteospanio/torchfx.
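The pipe-chaining idea can be illustrated with a few lines of plain PyTorch; this is the general pattern, not TorchFX's actual API, for which the GitHub repository is the reference.

```python
import torch
import torch.nn as nn

class FX(nn.Module):
    """Base effect; `|` chains effects left to right (a sketch of the
    pattern, not TorchFX's actual interface)."""
    def __or__(self, other):
        return Chain(self, other)

class Chain(FX):
    def __init__(self, first, second):
        super().__init__()
        self.first, self.second = first, second
    def forward(self, x):
        return self.second(self.first(x))

class Gain(FX):
    def __init__(self, g):
        super().__init__()
        self.g = g
    def forward(self, x):
        return self.g * x

class Clip(FX):
    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold
    def forward(self, x):
        return torch.clamp(x, -self.threshold, self.threshold)

chain = Gain(4.0) | Clip(0.5)          # reads left to right, like a pipeline
y = chain(torch.randn(2, 48000))       # (channels, samples); GPU-ready tensors
```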
SCHAEFFER: A Dataset of Human-Annotated Sound Objects for Machine Learning Applications
Machine learning for sound generation is rapidly expanding within the computer music community. However, most datasets used to train models are built from field recordings, foley sounds, instrumental notes, or commercial music. This presents a significant limitation for composers working in acousmatic and electroacoustic music, who require datasets tailored to their creative processes. To address this gap, we introduce the SCHAEFFER Dataset (Spectromorphological Corpus of Human-annotated Audio with Electroacoustic Features For Experimental Research), a curated collection of 1000 sound objects designed and annotated by composers and students of electroacoustic composition. The dataset, distributed under Creative Commons licenses, features annotations combining technical and poetic descriptions, alongside classifications based on pre-defined spectromorphological categories.
Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates
Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically informed audio synthesis. However, traditional implementations, particularly for non-linear models like the von Kármán plate, are computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast, differentiable, GPU-accelerated modal framework built with the JAX library, providing efficient simulations and enabling gradient-based inverse modelling. Benchmarks show that our approach significantly outperforms existing CPU- and GPU-based implementations, particularly for simulations with many modes. Inverse modelling experiments demonstrate that our approach can recover physical parameters, including tension, stiffness, and geometry, from both synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation than methods that fit abstract spectral parameters, it provides greater interpretability and a more compact parameterisation. The code is released as open source to support future research and applications in differentiable physical modelling and sound synthesis.
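As a flavour of the differentiable-modal idea, here is a toy JAX sketch for the linear ideal string (the paper also covers non-linear membranes and plates); the excitation, damping, and values are arbitrary illustrations, and as the abstract cautions, fitting physical parameters in earnest needs careful initialisation and a better loss than waveform MSE.

```python
import jax
import jax.numpy as jnp

def string_modes(tension, length=0.65, density=0.0063, n_modes=32):
    """Ideal-string modal frequencies f_n = (n / 2L) * sqrt(T / mu).
    Linear case only; the paper also handles non-linear models."""
    n = jnp.arange(1, n_modes + 1)
    return n / (2.0 * length) * jnp.sqrt(tension / density)

def synth(tension, t, decay=3.0):
    """Sum of exponentially damped modal sinusoids (toy excitation)."""
    freqs = string_modes(tension)
    amps = 1.0 / jnp.arange(1, freqs.size + 1)   # simple 1/n amplitudes
    osc = jnp.sin(2.0 * jnp.pi * freqs[:, None] * t[None, :])
    return jnp.sum(amps[:, None] * jnp.exp(-decay * t)[None, :] * osc, axis=0)

# Inverse modelling: the loss is differentiable w.r.t. the physical
# parameter, so gradient-based fitting of tension is possible.
sr = 48000
t = jnp.arange(int(0.1 * sr)) / sr
target = synth(60.0, t)                          # stand-in for measured data

def loss(tension):
    return jnp.mean((synth(tension, t) - target) ** 2)

grad_fn = jax.jit(jax.grad(loss))
print(grad_fn(50.0))                             # d(loss)/d(tension) via autodiff
```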
Improving Lyrics-to-Audio Alignment Using Frame-wise Phoneme Labels with Masked Cross Entropy Loss
This paper addresses the task of lyrics-to-audio alignment, which involves synchronizing textual lyrics with the corresponding music audio. Most publicly available datasets for this task provide annotations only at the line or word level, which poses a challenge for training lyrics-to-audio models due to the lack of frame-wise phoneme labels. However, we find that phoneme labels can be partially derived from word-level annotations: for single-phoneme words, all frames corresponding to the word can be labeled with the same phoneme; for multi-phoneme words, phoneme labels can be assigned at the first and last frames of the word. To leverage this partial information, we construct a mask over those frames and propose a masked frame-wise cross-entropy (CE) loss that considers only frames with known phoneme labels. As a baseline model, we adopt an autoencoder trained with a Connectionist Temporal Classification (CTC) loss and a reconstruction loss. We then enhance the training process by incorporating the proposed masked frame-wise CE loss. Experimental results show that incorporating this loss improves alignment performance. Compared with other state-of-the-art models, our model achieves a comparable Mean Absolute Error (MAE) of 0.216 seconds and the best Median Absolute Error (MedAE) of 0.041 seconds on the Jamendo test set.
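The masked loss itself is compact; below is a sketch in PyTorch, with the partial labelling rule from the abstract encoded in a toy example (shapes and phoneme indices are made up).

```python
import torch
import torch.nn.functional as F

def masked_frame_ce(logits, labels, mask):
    """Cross-entropy over only the frames whose phoneme label is known.
    logits: (B, T, n_phonemes); labels: (B, T) int; mask: (B, T) bool."""
    ce = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")  # (B, T)
    return (ce * mask).sum() / mask.sum().clamp(min=1)

# Deriving partial labels from word-level annotations, as in the abstract:
# a single-phoneme word labels all of its frames; a multi-phoneme word
# labels only its first and last frames (first/last phoneme respectively).
B, T, P = 2, 100, 40
logits = torch.randn(B, T, P)
labels = torch.zeros(B, T, dtype=torch.long)
mask = torch.zeros(B, T, dtype=torch.bool)
# e.g. a one-phoneme word (phoneme 5) spans frames 10..20 -> all known
labels[0, 10:21] = 5; mask[0, 10:21] = True
# e.g. a multi-phoneme word spans frames 30..50 -> boundary frames only
labels[0, 30] = 11; labels[0, 50] = 17; mask[0, [30, 50]] = True
loss = masked_frame_ce(logits, labels, mask)
```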
Differentiable Scattering Delay Networks for Artificial Reverberation
Scattering delay networks (SDNs) provide a flexible and efficient framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating key parameters such as scattering matrices and absorption filters as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
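One way to make such parameters differentiable is sketched below: the scattering matrix is kept orthogonal by construction via the matrix exponential of a skew-symmetric parameter, and a toy energy-decay match stands in for the paper's perceptually motivated RIR features (the authors' actual parameterisation and loss may differ).

```python
import torch

N = 6
W = torch.randn(N, N, requires_grad=True)   # unconstrained parameters
g = torch.zeros(N, requires_grad=True)      # absorption (pre-sigmoid)

def scattering_matrix(W):
    # The matrix exponential of a skew-symmetric matrix is orthogonal, so
    # the junction stays lossless under gradient descent (one common
    # recipe; the paper may parameterise its matrices differently).
    return torch.linalg.matrix_exp(W - W.T)

def energy_curve(W, g, steps=64):
    S, absorb = scattering_matrix(W), torch.sigmoid(g)
    x = torch.ones(N) / N**0.5              # unit-energy initial state
    energies = []
    for _ in range(steps):
        x = absorb * (S @ x)                # scatter, then absorb
        energies.append(torch.sum(x**2))
    return torch.stack(energies)

# Fit the decay to a target exponential, a crude stand-in for matching
# the energy decay of a measured room impulse response.
target = torch.exp(-0.1 * torch.arange(64, dtype=torch.float))
opt = torch.optim.Adam([W, g], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    curve = energy_curve(W, g)
    loss = torch.mean((torch.log(curve + 1e-12) - torch.log(target + 1e-12)) ** 2)
    loss.backward()
    opt.step()
```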