TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration
The increasing complexity and real-time processing demands of audio signals require optimized algorithms that exploit the computational power of Graphics Processing Units (GPUs). Existing Digital Signal Processing (DSP) libraries often lack the necessary efficiency and flexibility, particularly for integration with Artificial Intelligence (AI) models. In response, we introduce TorchFX: a GPU-accelerated Python library for DSP, engineered to facilitate sophisticated audio signal processing. Built on the PyTorch framework, TorchFX offers an object-oriented interface similar to torchaudio but enhances functionality with a novel pipe operator for intuitive filter chaining. The library provides a comprehensive suite of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, with a focus on multichannel audio, thereby facilitating the integration of DSP- and AI-based approaches. Our benchmarking results demonstrate significant efficiency gains over traditional libraries such as SciPy, particularly in multichannel contexts. While there are current limitations in GPU compatibility, ongoing development promises broader support and real-time processing capabilities. TorchFX aims to become a useful tool for the community, contributing to innovation in GPU-accelerated DSP. TorchFX is publicly available on GitHub at https://github.com/matteospanio/torchfx.
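The pipe-style filter chaining described in the abstract above could, in spirit, look like the following sketch. All class and function names here are hypothetical, and this is not TorchFX's actual API:

```python
import numpy as np

class FX:
    """Toy effect whose `|` operator chains processors, mimicking the
    pipe-style filter chaining described in the abstract.
    Illustrative only -- not TorchFX's actual API."""

    def __init__(self, fn):
        self.fns = [fn]

    def __or__(self, other):
        # Build a new FX whose stage list is the concatenation of both.
        chained = FX.__new__(FX)
        chained.fns = self.fns + other.fns
        return chained

    def __call__(self, x):
        for fn in self.fns:
            x = fn(x)
        return x

# Two toy stages: a 3-tap moving-average "lowpass" and a gain.
smooth = FX(lambda x: np.convolve(x, np.ones(3) / 3, mode="same"))
gain = FX(lambda x: 0.5 * x)

x = np.ones(8)
y = (smooth | gain)(x)  # stages applied left to right
```

The appeal of the operator form is that a chain reads in signal-flow order, rather than as nested function calls.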
GPGPU Patterns for Serial and Parallel Audio Effects
Modern commodity GPUs offer high numerical throughput per unit of cost, but often sit idle during audio workstation tasks. Research in the field has shown that GPUs excel at tasks such as Finite-Difference Time-Domain simulation and wavefield synthesis. Concrete implementations of several such projects are available for use. Benchmarks and use cases generally concentrate on running one project on a GPU. Running multiple such projects simultaneously is less common, and reduces throughput. In this work we list some concerns that arise when running multiple heterogeneous tasks on the GPU. We apply optimization strategies detailed in developer documentation and commercial CUDA literature, and show results through the lens of real-time audio tasks. We benchmark the cases of (i) a homogeneous effect chain made of previously separate effects, and (ii) a synthesizer with distinct, parallelizable sound generators.
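Pattern (i) above, the homogeneous effect chain, can be illustrated with a CPU-side sketch: when every slot in the chain runs the same FIR kernel, the per-channel launches collapse into one batched frequency-domain multiply, which is the shape of workload that maps well to a GPU. All names are illustrative; this is not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
channels, n = 16, 1024
x = rng.standard_normal((channels, n))
kernel = np.ones(4) / 4  # the same FIR in every chain slot

# (i) naive: one "launch" per channel, as separate effects would do
y_serial = np.stack([np.convolve(c, kernel, mode="full") for c in x])

# (ii) fused: a single batched FFT-domain multiply over all channels
m = n + kernel.size - 1  # full linear-convolution length
y_batched = np.fft.irfft(
    np.fft.rfft(x, m, axis=1) * np.fft.rfft(kernel, m), m, axis=1
)
```

Both paths compute the same linear convolution; the batched form simply exposes all the parallelism to the hardware at once.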
An Evaluation of Audio Feature Extraction Toolboxes
Audio feature extraction underpins a massive proportion of audio processing, music information retrieval, audio effect design and audio synthesis. Design, analysis, synthesis and evaluation often rely on audio features, but there is a large and diverse range of feature extraction tools available to the community. An evaluation of existing audio feature extraction libraries was undertaken. Ten libraries and toolboxes were evaluated with the Cranfield Model for the evaluation of information retrieval systems, reviewing the coverage, effort, presentation and time lag of a system. Comparisons of these tools are undertaken, and example use cases are presented as to when each toolbox is most suitable. This paper allows a software engineer or researcher to quickly and easily select a suitable audio feature extraction toolbox.
A Modeller-Simulator for Instrumental Playing of Virtual Musical Instruments
This paper presents a musician-oriented modelling and simulation environment for designing physically modelled virtual instruments and interacting with them via a high-performance haptic device. In particular, our system restores the physical coupling between the user and the manipulated virtual instrument, a key factor for expressive playing of traditional acoustical instruments that is absent from the vast majority of computer-based musical systems. We first analyse the various uses of haptic devices in Computer Music and introduce the technologies involved in our system. We then present the modeller and simulation environments, along with examples of musical virtual instruments created with this new environment.
A New Functional Framework for a Sound System for Realtime Flight Simulation
We present a new sound framework and concept for realistic flight simulation. Because the simulator deals with a highly complex network of mechanical systems that act as physical sound sources, the main focus is on a fully modular and extensible/scalable design. The prototype we developed is part of a fully functional Full Flight Simulator for pilot training.
Performing Expressive Rhythms with BillaBoop Voice-Driven Drum Generator
In previous work we presented a system for transcribing spoken rhythms into a symbolic score. The system was subsequently extended to process the vocal stream in real time so that a musician can use it as a voice-driven drum generator. The extensions to this work are the following. First, we conducted a study of the system's classification accuracy based on typical onomatopoeia used in western beatboxing, with the perspective of building a general supervised model for immediate use. Second, we want the user to be able to generate expressive rhythms beyond the symbolic drum representation. We therefore considered a class-specific mapping of continuous vocal stream descriptors to either effects or synthesis parameters of the drum generator. The extraction of the symbolic drum stream is implemented in the BillaBoop VST Core plug-in. The class-specific mapping and the sound synthesis are carried out in the Plogue Bidule framework. All these components are integrated into a low-latency application suitable for live performance.
Sonic Screwdrivers: Sound as a Sculptural Process
This paper discusses a Fine Art approach to the processes of digital audio. The author puts forward some ideas for a redefinition of digital audio software to embrace a wider audience and to promote the manipulation of sound as a sculptural process removed from, yet still related to, the assumed musical tradition. The author's artworks are introduced, and the impact of current research upon these artworks and upon the author's teaching is discussed.
Analysis and Trans-synthesis of Acoustic Bowed-String Instrument Recordings: a Case Study using Bach Cello Suites
In this paper, analysis and trans-synthesis of acoustic bowed-string instrument recordings with a new non-negative matrix factorization (NMF) procedure are presented. This work shows that more than one template may be required to represent a note, owing to the time-varying behaviour of timbre, especially for notes played by bowed-string instruments. The proposed method improves on the original NMF without requiring knowledge of the tone models or the number of templates in advance. The resulting NMF information is then converted into the synthesis parameters of a sinusoidal synthesizer. Bach cello suites recorded by Fournier and Starker are used in the experiments. Analysis and trans-synthesis examples of the recordings are also provided. Index Terms: trans-synthesis, non-negative matrix factorization, bowed-string instrument
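As a rough illustration of the NMF step described above, a minimal Euclidean-distance NMF with the standard Lee-Seung multiplicative updates might look as follows. This is a generic sketch on synthetic data, not the paper's improved procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
F, T, K = 64, 100, 3  # frequency bins, time frames, templates
V = np.abs(rng.standard_normal((F, T)))  # stand-in magnitude spectrogram

W = rng.random((F, K)) + 1e-3  # spectral templates
H = rng.random((K, T)) + 1e-3  # time-varying activations

err0 = np.linalg.norm(V - W @ H)  # error before fitting

# Multiplicative updates minimising ||V - WH||_F (Lee & Seung).
# Nonnegativity of W and H is preserved at every step.
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

err = np.linalg.norm(V - W @ H)  # error after fitting
```

In the paper's setting, each column of W is a note template and each row of H its activation over time; the trans-synthesis stage then maps these onto sinusoidal synthesis parameters.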
Identification of Time-Frequency Maps for Sound Timbre Discrimination
Gabor multipliers are signal operators that are diagonal in a time-frequency representation of signals and can be viewed as time-frequency transfer functions. If we estimate a Gabor mask between a note played by two instruments, we obtain a time-frequency representation of the difference in timbre between the two notes. By averaging the energy contained in the Gabor mask, we obtain a measure of this difference. In this context, our goal is to automatically localize the time-frequency regions responsible for such a timbre dissimilarity. This problem is addressed as a feature selection problem over the time-frequency coefficients of a labelled data set of sounds.
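The Gabor-mask measure described above can be sketched roughly: estimate a per-bin diagonal mask between two spectrogram magnitudes, then average its deviation from the identity mask, weighted by energy. This is an illustrative approximation, not the authors' exact estimator, and all parameters are assumptions:

```python
import numpy as np

def stft_mag(x, win=64, hop=32):
    """Magnitude of a simple Hann-windowed STFT (frequency x time)."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

def gabor_dissimilarity(A, B):
    """Ridge-regularised per-bin least-squares mask m with B ~ m * A,
    then the energy-weighted deviation of m from the identity mask."""
    lam = 1e-3 * (A ** 2).max()
    m = (A * B) / (A ** 2 + lam)
    return np.average((m - 1.0) ** 2, weights=A ** 2 + B ** 2)

fs = 8000
t = np.arange(fs) / fs
note_a = np.sin(2 * np.pi * 220 * t)                 # pure tone
note_b = note_a + 0.5 * np.sin(2 * np.pi * 440 * t)  # added second partial

A, B = stft_mag(note_a), stft_mag(note_b)
d_same = gabor_dissimilarity(A, A)  # near zero: identical timbre
d_diff = gabor_dissimilarity(A, B)  # larger: the mask departs from identity
```

The bins where the estimated mask departs most from identity are exactly the time-frequency regions the paper seeks to localize automatically.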
Simulating Microphone Bleed and Tom-tom Resonance in Multisampled Drum Workstations
In recent years, multisampled drum workstations have become increasingly popular. They offer an alternative to recording a full drum kit when a producer, engineer or amateur lacks the equipment, money, space or knowledge to produce a quality recording. These drum workstations strive for realism, often recording up to a hundred different velocity hits of the same drum, including recordings from all microphones for each drum hit, and including the bleed between these microphones. This paper describes research undertaken to investigate whether it is possible to simulate the snare and kick drum bleed into the tom-tom microphones, and the subsequent resonance of the tom-tom that it causes, with the aim of reducing the amount of audio data that needs to be stored. A listening test was performed in which participants were asked to identify the real recording from a simulation. The results were not statistically significant enough to reject the hypothesis that subjects were unable to distinguish between the real and simulated recordings, suggesting that listeners were unable to identify the real recording in the majority of cases.
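A crude sketch of the kind of simulation investigated above: excite a damped two-pole resonator tuned to the tom's fundamental with an attenuated copy of the kick signal, and add the ringing back onto the bleed. All parameters here are hypothetical, and this is not the paper's actual model:

```python
import numpy as np

fs = 44100
f0, r = 110.0, 0.9995              # assumed tom fundamental and pole radius
w = 2 * np.pi * f0 / fs
a1, a2 = -2 * r * np.cos(w), r * r  # two-pole resonator denominator

rng = np.random.default_rng(2)
# Stand-in kick-drum hit: decaying noise burst.
kick = rng.standard_normal(2048) * np.exp(-np.arange(2048) / 300.0)

bleed = 0.1 * kick                  # attenuated bleed into the tom mic
tom = np.zeros_like(bleed)
for i in range(len(bleed)):         # y[n] = x[n] - a1*y[n-1] - a2*y[n-2]
    tom[i] = bleed[i]
    if i >= 1:
        tom[i] -= a1 * tom[i - 1]
    if i >= 2:
        tom[i] -= a2 * tom[i - 2]

mic = bleed + tom                   # what the tom microphone would capture
```

Because the resonator's poles sit just inside the unit circle, the tom rings long after the excitation decays, which is the effect the listening test asked participants to distinguish from real recordings.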