GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
We present GRAFX, an open-source library designed for handling audio processing graphs in PyTorch. Along with various library functionalities, we describe technical details of the efficient parallel computation of input graphs, signals, and processor parameters on the GPU. We then demonstrate its use in a music mixing scenario, where the parameters of every differentiable processor in a large graph are optimized via gradient descent. The code is available at https://github.com/sh-lee97/grafx.
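The core idea of an audio processing graph is that each node's output can be computed only after all of its inputs are ready. The sketch below shows that idea on a toy mixing graph using Python's standard `graphlib`. It is not GRAFX's API. The helpers `render`, `gain`, and `mix`, as well as the node names, are hypothetical. The sketch also evaluates nodes one at a time on the CPU and does not model the paper's parallel GPU computation.

```python
from graphlib import TopologicalSorter


def gain(db):
    """Hypothetical processor factory: scale every sample by a gain in dB."""
    g = 10.0 ** (db / 20.0)
    return lambda x: [g * s for s in x]


def mix(*inputs):
    """Hypothetical mix bus: sum equal-length mono signals sample by sample."""
    return [sum(samples) for samples in zip(*inputs)]


def render(graph, processors, sources):
    """Evaluate a processing graph in dependency (topological) order.

    graph: node name -> list of input node names (its predecessors).
    processors: node name -> callable taking one signal per input.
    sources: source node name -> input signal (list of floats).
    """
    outputs = dict(sources)
    for node in TopologicalSorter(graph).static_order():
        if node in outputs:
            continue  # source node: its signal was supplied directly
        inputs = [outputs[src] for src in graph[node]]
        outputs[node] = processors[node](*inputs)
    return outputs


# Toy mixing graph: two tracks, one gain stage each, summed on a mix bus.
graph = {
    "drums_gain": ["drums"],
    "bass_gain": ["bass"],
    "mix": ["drums_gain", "bass_gain"],
}
processors = {"drums_gain": gain(-6.0), "bass_gain": gain(0.0), "mix": mix}
out = render(graph, processors, {"drums": [1.0, 0.0], "bass": [0.5, 0.5]})
```

In a differentiable setting, the gain values would be tensors updated by gradient descent, which is the mixing scenario the abstract describes.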
Towards Efficient Emulation of Nonlinear Analog Circuits for Audio Using Constraint Stabilization and Convex Quadratic Programming
This paper introduces a computationally efficient method for emulating nonlinear analog audio circuits by combining state-space representations, constraint stabilization, and convex quadratic programming (QP). Unlike traditional virtual analog (VA) modeling approaches or computationally demanding SPICE-based simulations, our approach reformulates the nonlinear differential-algebraic equation (DAE) systems that arise from analog circuit analysis as numerically stable optimization problems. The proposed method addresses the numerical challenges posed by nonlinear algebraic constraints through constraint stabilization techniques, significantly enhancing robustness and stability and making it suitable for real-time simulation. A canonical diode clipper circuit serves as a test case, demonstrating that our method produces emulations that are accurate and faster than those of conventional state-space methods. Furthermore, our method performs well even at substantially lower sampling rates. Preliminary numerical experiments confirm that the proposed approach offers improved numerical stability and real-time feasibility, positioning it as a practical solution for high-fidelity audio applications.
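For orientation, the sketch below simulates the canonical diode clipper test case with a conventional implicit solver: backward Euler discretization plus a damped Newton iteration on the state equation. This is a swapped-in baseline of the kind such methods are compared against, not the paper's constraint-stabilized QP formulation. The component values (`R`, `C`, diode saturation current `Is`, thermal voltage `Vt`) are assumed typical textbook values and do not come from the paper.

```python
import math


def diode_clipper(x, fs, R=2.2e3, C=10e-9, Is=2.52e-9, Vt=25.85e-3,
                  max_iter=100, tol=1e-10):
    """Conventional baseline: backward Euler + damped Newton (assumed values).

    Circuit: series resistor R from the input; capacitor C and an
    antiparallel diode pair to ground across the output. State equation:
        C dv/dt = (vin - v) / R - 2 * Is * sinh(v / Vt)
    """
    k = (1.0 / fs) / C
    v = 0.0
    y = []
    for vin in x:
        v_prev = v
        for _ in range(max_iter):
            # Residual g(v) of the implicit update v = v_prev + k * i_net(v)
            i_net = (vin - v) / R - 2.0 * Is * math.sinh(v / Vt)
            g = v - v_prev - k * i_net
            dg = 1.0 + k * (1.0 / R + (2.0 * Is / Vt) * math.cosh(v / Vt))
            # Damped Newton: cap the step so the exponential diode term
            # cannot overshoot into overflow on large input jumps
            step = max(-0.1, min(0.1, g / dg))
            v -= step
            if abs(step) < tol:
                break
        y.append(v)
    return y


fs = 48000
loud = [5.0 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(480)]
quiet = [0.01 * math.sin(2 * math.pi * 100 * n / fs) for n in range(960)]
y_loud = diode_clipper(loud, fs)    # clipped to a few hundred millivolts
y_quiet = diode_clipper(quiet, fs)  # passes almost unchanged (RC low-pass)
```

Every sample requires an iterative nonlinear solve, which is the cost and robustness issue that reformulations such as the one proposed in this paper aim to address.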
TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration
The increasing complexity and real-time demands of audio signal processing call for optimized algorithms that exploit the computational power of Graphics Processing Units (GPUs). Existing Digital Signal Processing (DSP) libraries often lack the necessary efficiency and flexibility, particularly for integration with Artificial Intelligence (AI) models. In response, we introduce TorchFX: a GPU-accelerated Python library for DSP, engineered to facilitate sophisticated audio signal processing. Built on the PyTorch framework, TorchFX offers an object-oriented interface similar to torchaudio, extended with a novel pipe operator for intuitive filter chaining. The library provides a comprehensive suite of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters with a focus on multichannel audio, thereby facilitating the integration of DSP and AI-based approaches. Our benchmarking results demonstrate significant efficiency gains over traditional libraries such as SciPy, particularly in multichannel contexts. Although GPU compatibility is currently limited, ongoing development aims to provide broader support and real-time processing capabilities. TorchFX aims to become a useful tool for the community, contributing to innovation in GPU-accelerated DSP. TorchFX is publicly available on GitHub at https://github.com/matteospanio/torchfx.
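The pipe-operator style of filter chaining can be illustrated by overloading Python's `|` operator. The sketch below is a pure-Python, CPU-only illustration of that design. The classes `Wave`, `IIR`, and `FIR` are hypothetical and do not reproduce TorchFX's actual classes or signatures. Each filter is applied independently to every channel, mirroring the multichannel focus described above.

```python
from dataclasses import dataclass


@dataclass
class Wave:
    """Hypothetical multichannel signal: one list of samples per channel."""
    channels: list
    fs: int

    def __or__(self, effect):
        # `wave | effect` applies the effect and returns a new Wave,
        # so filters chain left to right: wave | f1 | f2
        return effect(self)


class IIR:
    """Direct-form I filter with numerator b and denominator a."""

    def __init__(self, b, a=(1.0,)):
        self.b, self.a = list(b), list(a)

    def _filter(self, x):
        y = []
        for n in range(len(x)):
            acc = sum(bk * x[n - k] for k, bk in enumerate(self.b) if n - k >= 0)
            acc -= sum(ak * y[n - k] for k, ak in enumerate(self.a)
                       if k > 0 and n - k >= 0)
            y.append(acc / self.a[0])  # normalize by the leading coefficient
        return y

    def __call__(self, wave):
        # Filter every channel independently
        return Wave([self._filter(ch) for ch in wave.channels], wave.fs)


class FIR(IIR):
    """FIR filter: an IIR filter whose denominator is just a = [1]."""

    def __init__(self, taps):
        super().__init__(taps, (1.0,))


impulse = Wave([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], fs=48000)
out = impulse | FIR([0.5, 0.5]) | IIR(b=[1.0], a=[1.0, -0.5])
```

Because `Wave.__or__` returns another `Wave`, arbitrarily long chains read in processing order, which is the ergonomic benefit a pipe operator offers over nested function calls.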
SCHAEFFER: A Dataset of Human-Annotated Sound Objects for Machine Learning Applications
Machine learning for sound generation is rapidly expanding within the computer music community. However, most datasets used to train models are built from field recordings, foley sounds, instrumental notes, or commercial music. This presents a significant limitation for composers working in acousmatic and electroacoustic music, who require datasets tailored to their creative processes. To address this gap, we introduce the SCHAEFFER Dataset (Spectromorphological Corpus of Human-annotated Audio with Electroacoustic Features For Experimental Research), a curated collection of 1000 sound objects designed and annotated by composers and students of electroacoustic composition. The dataset, distributed under Creative Commons licenses, features annotations combining technical and poetic descriptions, alongside classifications based on pre-defined spectromorphological categories.
A Complex Envelope Sinusoidal Model for Audio Coding
A modification to the hybrid sinusoidal model is proposed for high-quality audio coding. In our proposal, the amplitude envelope of each harmonic partial is modeled by a narrowband complex signal. Such a representation captures most of the signal energy associated with sinusoidal components, including the energy related to frequency estimation and quantization errors. It also takes into account the natural width of each spectral line. The advantages of this model extension are a more straightforward and robust representation of the deterministic component and a clean stochastic residual without ghost sinusoids. The reconstructed signal is virtually free from harmonic artifacts and sounds more natural. We propose encoding the complex envelopes by means of modulated complex lapped transform (MCLT) coefficients, interleaved across partials, within an MPEG-like coding scheme. Experimental results show that high compression efficiency is achieved.
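Why a complex envelope absorbs frequency estimation errors can be seen from a short derivation. The notation below is our own and not necessarily the paper's: $f_0$ is the estimated fundamental, $c_k(t)$ the complex envelope of partial $k$, and $r(t)$ the stochastic residual. If partial $k$ actually has frequency $k f_0 + \delta_k$, amplitude $A_k$, and phase $\varphi_k$, the frequency error $\delta_k$ simply becomes a slow rotation of its complex envelope:

```latex
\begin{aligned}
x(t) &= \sum_{k=1}^{K} \operatorname{Re}\!\left\{ c_k(t)\, e^{j 2\pi k f_0 t} \right\} + r(t), \\
A_k \cos\!\big(2\pi (k f_0 + \delta_k) t + \varphi_k\big)
  &= \operatorname{Re}\!\left\{ A_k\, e^{j (2\pi \delta_k t + \varphi_k)}\, e^{j 2\pi k f_0 t} \right\}
  \quad\Longrightarrow\quad
  c_k(t) = A_k\, e^{j (2\pi \delta_k t + \varphi_k)} .
\end{aligned}
```

The spectrum of $c_k(t)$ is concentrated around $\delta_k$, so as long as $|\delta_k|$ is small relative to the partial spacing $f_0$, the envelope stays narrowband, and the mistuned energy remains in the deterministic part instead of leaking into $r(t)$ as a ghost sinusoid. Slow variations of $A_k(t)$ similarly widen the line around $\delta_k$, which corresponds to the natural spectral line width mentioned above.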
Sound texture synthesis using Convolutional Neural Networks
The following article introduces a new parametric synthesis algorithm for sound textures, inspired by existing methods for visual textures. Using a 2D Convolutional Neural Network (CNN), a sound signal is modified until the temporal cross-correlations of the feature maps of its log-spectrogram resemble those of a target texture. We show that the resulting synthesized sound signal is both different from the original and of high quality, while still reproducing singular events that appear in the original. The process is performed in the time domain, avoiding the problematic phase recovery step that usually concludes synthesis performed in the time-frequency domain. It is also straightforward and flexible, as it requires no fine-tuning of the balance between several losses when synthesizing diverse sound textures. Synthesized spectrograms and sound signals are showcased, and a way of extending the synthesis to produce a sound of arbitrary length is also presented. We also discuss the choice of CNN, border effects in our synthesized signals, and possible ways of modifying the algorithm to reduce its currently long computation time.
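The texture statistics described above can be sketched as lagged cross-correlation matrices between feature-map channels, compared with a squared-error loss. In the sketch below, `F` stands in for CNN feature maps of a log-spectrogram, reduced to plain lists of per-frame activations. The function names, the mean-over-overlap normalization, and the set of lags are assumptions and may differ from the paper. Minimizing the loss with respect to the waveform through automatic differentiation is not shown.

```python
def lagged_cross_correlations(F, max_lag):
    """Temporal cross-correlation matrices of feature maps (assumed form).

    F: list of C channels, each a list of T per-frame activations.
    Returns G with G[tau][i][j] = mean over t of F[i][t] * F[j][t + tau],
    for lags tau = 0 .. max_lag (requires max_lag < T).
    """
    C, T = len(F), len(F[0])
    G = []
    for tau in range(max_lag + 1):
        n = T - tau  # number of overlapping frames at this lag
        G.append([[sum(F[i][t] * F[j][t + tau] for t in range(n)) / n
                   for j in range(C)] for i in range(C)])
    return G


def texture_loss(F_synth, F_target, max_lag):
    """Sum of squared differences between the lagged correlation statistics."""
    Gs = lagged_cross_correlations(F_synth, max_lag)
    Gt = lagged_cross_correlations(F_target, max_lag)
    return sum((a - b) ** 2
               for Ms, Mt in zip(Gs, Gt)
               for rs, rt in zip(Ms, Mt)
               for a, b in zip(rs, rt))


F = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # 2 channels x 3 frames
G = lagged_cross_correlations(F, max_lag=1)
loss = texture_loss(F, F, max_lag=1)    # identical maps give zero loss
```

Because only these statistics are matched, and not the feature maps themselves, the synthesized signal can differ from the original while sharing its texture.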
Musical Aspects of Vowel Formants in the Extreme Metal Voice
Information Retrieval of Marovany Zither Music Based on an Original Optical-Based System
In this work, we introduce an original optical-based retrieval system dedicated to the music analysis of the marovany zither, a traditional instrument of Madagascar. From a humanistic perspective, our motivation for studying this particular instrument is its cultural importance, owing to its association with a possession ritual called tromba. The long-term goal of this work is to achieve a systematic classification of the marovany musical repertoire in this context of trance, and to classify the recurrent musical patterns according to identifiable information. From an engineering perspective, we address the problem of competing signals in audio field recordings, e.g., from audience participation or percussion instruments. To overcome this problem, we recommend multichannel optical recording, which offers several technological advantages: acquisition of an independent signal for each string; a high signal-to-noise ratio (high sensitivity to string displacement, low sensitivity to external sources); and systematic inter-note demarcation resulting from finger-string contact. These optical signal characteristics greatly simplify the delicate task of automatic music transcription, especially for polyphonic music in noisy environments.
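Having one clean signal per string turns a polyphonic transcription problem into several monophonic segmentation problems. The sketch below shows this with simple frame-wise RMS thresholding, which is a swapped-in illustrative technique and not the paper's method. It assumes that notes on each string are separated by stretches of low optical signal, as the finger-string contact suggests. The function `segment_notes`, the frame size, and the threshold are hypothetical.

```python
import math


def segment_notes(string_signals, frame=256, threshold=0.05):
    """Hypothetical per-string note segmentation by frame RMS thresholding.

    string_signals: one list of samples per string (one optical channel each).
    Returns (string_index, onset_sample, offset_sample) tuples, sorted by onset.
    """
    notes = []
    for s, sig in enumerate(string_signals):
        n_frames = len(sig) // frame
        start = None
        for i in range(n_frames):
            chunk = sig[i * frame:(i + 1) * frame]
            rms = math.sqrt(sum(v * v for v in chunk) / frame)
            if rms >= threshold and start is None:
                start = i * frame  # string starts vibrating: note onset
            elif rms < threshold and start is not None:
                notes.append((s, start, i * frame))  # string damped: offset
                start = None
        if start is not None:  # note still sounding at the end of the signal
            notes.append((s, start, n_frames * frame))
    return sorted(notes, key=lambda note: (note[1], note[0]))


# Two strings: string 0 plays one note from sample 100 to 300, string 1 is silent.
string_0 = ([0.0] * 100
            + [math.sin(2 * math.pi * n / 20) for n in range(200)]
            + [0.0] * 100)
string_1 = [0.0] * 400
notes = segment_notes([string_0, string_1], frame=50, threshold=0.1)
```

With a mixed audio recording, the same note would overlap other strings, audience noise, and percussion, so such a per-channel threshold would not be usable.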
Hierarchical Organization and Visualization of Drum Sample Libraries
Drum samples are an important ingredient for many styles of music. Large libraries of drum sounds are readily available. However, their value is limited by the ways in which users can explore them to retrieve sounds. Available organization schemes rely on cumbersome manual classification. In this paper, we present a new approach for automatically structuring and visualizing large sample libraries through audio signal analysis. In particular, we present a hierarchical user interface for efficient exploration and retrieval based on a computational model of similarity and self-organizing maps.
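A self-organizing map (SOM) places items on a grid so that similar items land in the same or neighboring cells. The minimal SOM sketch below assumes each drum sample has already been reduced to a small feature vector. The two descriptors, the sample names, the grid size, the deterministic grid initialization, and the decay schedules are all illustrative assumptions, not the paper's similarity model. The hierarchical levels of the paper's interface are also not modeled.

```python
import math


def best_matching_unit(units, v):
    """Grid position of the unit whose weight vector is closest to v."""
    return min(units, key=lambda pos: sum((a - b) ** 2 for a, b in zip(units[pos], v)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a small rows x cols SOM on feature vectors (lists of floats)."""
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    # Deterministic init: spread the grid over the data range in the first two dims
    units = {}
    for r in range(rows):
        for c in range(cols):
            w = [(lo[d] + hi[d]) / 2 for d in range(dim)]
            w[0] = lo[0] + (hi[0] - lo[0]) * (r / max(rows - 1, 1))
            if dim > 1:
                w[1] = lo[1] + (hi[1] - lo[1]) * (c / max(cols - 1, 1))
            units[(r, c)] = w
    for e in range(epochs):
        frac = e / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        sigma = sigma0 * (0.05 / sigma0) ** frac   # neighborhood radius shrinks
        for v in data:
            bmu = best_matching_unit(units, v)
            for (r, c), w in units.items():
                # Gaussian neighborhood on the grid around the best-matching unit
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (v[d] - w[d])
    return units


# Hypothetical normalized descriptors, e.g. (spectral centroid, spectral flatness)
samples = {
    "kick_a": [0.10, 0.15], "kick_b": [0.15, 0.05], "kick_c": [0.05, 0.20],
    "hat_a": [0.90, 0.85], "hat_b": [0.95, 0.90], "hat_c": [0.85, 0.95],
}
som = train_som(list(samples.values()), rows=2, cols=2)
cells = {name: best_matching_unit(som, f) for name, f in samples.items()}
```

A browsing interface can then display one tile per grid cell. One way to add hierarchy would be to train a finer map on the samples within each cell, although this is only one possible design.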
Examining the Oscillator Waveform Animation Effect
Animation is an enhancing effect that can be applied to analogue oscillators in subtractive synthesizers; it is an efficient way to create the sound of many closely detuned oscillators playing in unison, commonly referred to as a supersaw oscillator. This paper first explains the operating principle of this effect using a combination of additive and frequency modulation synthesis. The Fourier series is then derived, and results are presented to demonstrate its accuracy. These results provide new insights into how other, more general waveform animation processors can be designed.
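The sound of detuned unison can be illustrated with the simplest case of two sawtooth oscillators. This is a worked example using the standard sawtooth Fourier series, not the paper's additive/FM derivation of the Animation effect. With fundamental $f$ and detuning $\Delta$, the sum-to-product identity shows that each harmonic $k$ of the pair becomes a tone at the mean frequency whose amplitude is modulated at a rate proportional to $k$:

```latex
\begin{aligned}
s(t) &= \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} \sin(2\pi k f t), \\
\frac{2(-1)^{k+1}}{\pi k}\Big[\sin(2\pi k f t) + \sin\big(2\pi k (f+\Delta) t\big)\Big]
  &= \frac{4(-1)^{k+1}}{\pi k} \cos(\pi k \Delta t)\, \sin\!\Big(2\pi k \big(f + \tfrac{\Delta}{2}\big) t\Big).
\end{aligned}
```

The envelope $|\cos(\pi k \Delta t)|$ beats $k\Delta$ times per second, so higher harmonics fluctuate faster than lower ones. This harmonic-dependent movement is the "animated" character that the Animation effect reproduces without running many separate oscillators.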