Fluently Remixing Musical Objects with Higher-Order Functions Soon after the Echo Nest Remix API was made publicly available as open source, the primary author began aggressively enhancing this Python framework for re-editing music based on perceptual musical analyses. The paper first describes the basic principles of the API – integrating content-based metadata with the underlying signal – and then the authors' enhancements. The libraries moved from an imperative coding style to one influenced by functional programming and domain-specific languages, allowing a much more fluent, terse style in which users can concentrate on the functions needed to find the interesting portions of a song and modify them. The paper then describes enhancements for mixing multiple sources with one another and for user-created, user-modifiable effects controlled by direct manipulation of the objects that represent the sound. The realization that the Remix API need not be as tightly integrated as it currently is points to future directions for the API at the end of the paper.
The Restoration of Single Channel Audio Recordings Based on Non-Negative Matrix Factorization and Perceptual Suppression Rule In this paper, we focus on improving the signal-to-noise ratio (SNR) of single channel audio recordings. Many approaches have been reported in the literature. The most popular method, with many variants, is Short Time Spectral Attenuation (STSA). Although this method reduces the noise and improves the SNR, it tends to introduce signal distortion and a perceptually annoying residual noise usually called musical noise. In this paper we investigate the use of Non-negative Matrix Factorization (NMF) as an alternative to STSA for the digital curation of musical heritage. NMF is an emerging technique for the blind extraction of signals recorded in a variety of different fields, and its application to the analysis of monaural recordings is relatively recent. We show that NMF is a suitable technique for extracting the clean audio signal from undesired non-stationary noise in a monaural recording of ethnic music. More specifically, we introduce a perceptual suppression rule to assess how suppression in the perceptual domain compares with suppression in the acoustic domain. Moreover, we carry out a listening test comparing NMF with a state-of-the-art audio restoration framework using the EBU MUSHRA test method. The encouraging results obtained with this methodology in the presented case study support its wider applicability in audio separation.
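The core operation the abstract relies on – factoring a non-negative spectrogram into spectral templates and time activations – can be sketched with the classic multiplicative-update rules. This is a minimal generic NMF, not the paper's restoration pipeline or its perceptual suppression rule; the toy "spectrogram" below is synthetic.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Factor a non-negative matrix V ~ W @ H with multiplicative updates
    (Lee & Seung), minimizing the Frobenius reconstruction error.
    Columns of W act as spectral templates, rows of H as activations."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update templates
    return W, H

# Toy magnitude "spectrogram": two spectral templates, two activation curves
rng = np.random.default_rng(1)
V = np.abs(rng.random((64, 2))) @ np.abs(rng.random((2, 100)))
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In a denoising setting, components of `W`/`H` attributed to noise would be zeroed (or perceptually weighted, as the paper proposes) before resynthesis.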
A Diffusion-Based Generative Equalizer for Music Restoration This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a task that, to the best of our knowledge, has not been previously addressed for music restoration. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an improvement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music. Historical music restoration examples are available at: research.spa.aalto.fi/publications/papers/dafx-babe2/.
Digitally Moving An Electric Guitar Pickup This paper describes a technique to transform the sound of an arbitrarily selected magnetic pickup into another pickup selection on the same electric guitar. This is a first step towards replicating an arbitrary electric guitar timbre in an audio recording using the signal from another guitar as input. We record 1458 individual notes from the pickups of a single guitar, varying the string, fret, plucking position, and dynamics of the tones in order to create a controlled dataset for training and testing our approach. Given an input signal and a target signal, a least squares estimator is used to obtain the coefficients of a finite impulse response (FIR) filter to match the desired magnetic pickup position. We use spectral difference to measure the error of the emulation, and test the effects of the independent variables (fret, dynamics, plucking position, and repetition) on the accuracy. A small reduction in accuracy was observed for different repetitions; moderate errors arose when the playing style (plucking position and dynamics) was varied; and there were large differences between output and target when the training and test data comprised different notes (fret positions). We explain these results in terms of the acoustics of the vibrating strings.
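The least-squares FIR estimation step the abstract describes can be sketched generically: stack delayed copies of the input into a convolution (Toeplitz) matrix and solve for the filter taps that best map input to target. This is a minimal illustration on synthetic data, not the paper's dataset or evaluation.

```python
import numpy as np

def estimate_fir(x, y, n_taps):
    """Least-squares FIR estimate: find taps h minimizing ||y - x*h||^2,
    where * is linear convolution truncated to len(y)."""
    N = len(y)
    X = np.zeros((N, n_taps))
    for k in range(n_taps):
        X[k:, k] = x[:N - k]          # column k holds x delayed by k samples
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h

# Synthetic check: recover a known 8-tap filter from its input/output pair
rng = np.random.default_rng(0)
h_true = rng.standard_normal(8)
x = rng.standard_normal(2000)
y = np.convolve(x, h_true)[:len(x)]   # "target pickup" signal
h_est = estimate_fir(x, y, n_taps=8)
```

With a noiseless input/target pair generated by a true FIR relation, the estimator recovers the taps to numerical precision.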
Implementing Real-Time Partitioned Convolution Algorithms on Conventional Operating Systems We describe techniques for implementing real-time partitioned convolution algorithms on conventional operating systems using two different scheduling paradigms: time-distributed (cooperative) and multi-threaded (preemptive). We discuss the optimizations applied to both implementations and present measurements of their performance for a range of impulse response lengths on a recent high-end desktop machine. We find that while the time-distributed implementation is better suited for use as a plugin within a host audio application, the preemptive version is easier to implement and significantly outperforms the time-distributed version despite the overhead of frequent context switches.
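The underlying algorithm both schedulers implement – uniformly partitioned FFT convolution – can be sketched offline in a few lines: the impulse response is split into fixed-size partitions, each input block is transformed once, multiplied with every pre-transformed partition, and the products are overlap-added at the corresponding delays. This is a plain offline sketch of the math, not either of the paper's real-time implementations.

```python
import numpy as np

def partitioned_convolve(x, h, block=64):
    """Uniformly partitioned FFT convolution (offline reference version)."""
    n_parts = -(-len(h) // block)        # ceil(len(h) / block)
    fft_len = 2 * block                  # enough room for linear convolution
    # Pre-transform each impulse-response partition once
    H = [np.fft.rfft(h[p*block:(p+1)*block], fft_len) for p in range(n_parts)]
    out = np.zeros(len(x) + len(h) - 1)
    n_blocks = -(-len(x) // block)
    for b in range(n_blocks):
        Xb = np.fft.rfft(x[b*block:(b+1)*block], fft_len)  # one FFT per block
        for p, Hp in enumerate(H):
            yb = np.fft.irfft(Xb * Hp, fft_len)
            start = (b + p) * block      # partition p contributes block p later
            seg = yb[:len(out) - start]  # clip at the output tail
            out[start:start + len(seg)] += seg
    return out

# Sanity check against direct convolution
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
h = rng.standard_normal(130)
y = partitioned_convolve(x, h)
```

A real-time version spreads the inner loop's work across callbacks (time-distributed) or hands long-partition FFTs to lower-priority threads (preemptive), which is exactly the trade-off the paper measures.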
Expressive Controllers For Bowed String Physical Models In this paper we propose different approaches to control a real-time physical model of a bowed string instrument. Starting from a commercially available device, we show how to improve the gestural control of the model.
Comparing synthetic and real templates for dynamic time warping to locate partial envelope features In this paper we compare the performance of a number of different templates for the purposes of split point identification of various clarinet envelopes. These templates were generated with Attack-Decay-Sustain-Release (ADSR) descriptions commonly used in musical synthesis, along with a set of real templates obtained using k-means clustering of manually prepared test data. The goodness of fit of the templates to the data was evaluated using the Dynamic Time Warping (DTW) cost function, and by evaluating the square of the distance of the identified split points to the manually identified split points in the test data. It was found that the best templates for split point identification were the synthetic templates, followed by the real templates with a sharp attack and release, as is characteristic of the clarinet envelope.
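The DTW cost function used to score template fit can be sketched with the standard dynamic-programming recursion over squared distances. The ADSR template below is an illustrative synthetic example, not the paper's clarinet data or its clustered templates.

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic Time Warping cost between two 1-D sequences, using squared
    sample distance and the standard three-way (insert/delete/match) step."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A synthetic ADSR-style template, and a time-stretched version of it
adsr = np.concatenate([np.linspace(0, 1, 10),      # attack
                       np.linspace(1, 0.7, 5),     # decay
                       np.full(20, 0.7),           # sustain
                       np.linspace(0.7, 0, 10)])   # release
stretched = np.repeat(adsr, 2)                     # same shape, twice as long
```

Because DTW warps the time axis, the stretched envelope matches the template at zero cost, which is what makes such templates usable across notes of different durations.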
Feature Based Delay Line Using Real-Time Concatenative Synthesis In this paper we introduce a novel approach utilizing real-time concatenative synthesis to produce a Feature-Based Delay Line (FBDL). Expanding upon the concept of a traditional delay, its most basic function is familiar – a dry signal is copied to an audio buffer whose read position is time shifted, producing a delayed or "wet" signal that is then remixed with the dry. In our implementation, however, the traditionally unaltered wet signal is modified such that the audio delay buffer is segmented and concatenated according to specific audio features. Specifically, the input audio is analyzed and segmented as it is written to the delay buffer, and delayed segments are matched to a target feature set, such that the most similar segments are selected to constitute the wet signal of the delay. Targeting methods, either manual or automated, can be used to explore the feature space of the delay line buffer based on dry signal feature information and relevant targeting parameters, such as delay time. This paper outlines our process, detailing important requirements such as targeting and considerations for feature extraction and concatenative synthesis, as well as discussing use cases, performance evaluation, and the potential of such advances to digital delay lines.
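The segment-matching step at the heart of such a delay line can be sketched as nearest-neighbour selection in a small feature space. The feature choice (RMS energy and spectral centroid), the weights, and the helper names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def features(seg, sr=44100):
    """Per-segment feature vector: RMS energy and spectral centroid (Hz)."""
    rms = np.sqrt(np.mean(seg ** 2))
    mag = np.abs(np.fft.rfft(seg))
    freqs = np.fft.rfftfreq(len(seg), 1 / sr)
    centroid = (freqs * mag).sum() / (mag.sum() + 1e-12)
    return np.array([rms, centroid])

def best_match(buffer_segs, target_feat, weights=(1.0, 1e-4)):
    """Index of the buffered segment whose weighted feature vector is
    closest to the target (the segment that would feed the wet signal)."""
    w = np.asarray(weights)  # de-emphasize the Hz-scaled centroid
    dists = [np.linalg.norm(w * (features(s) - target_feat))
             for s in buffer_segs]
    return int(np.argmin(dists))

# Two buffered segments: a quiet low sine and a loud high sine
sr, n = 44100, 1024
t = np.arange(n) / sr
segs = [0.2 * np.sin(2 * np.pi * 200 * t),
        0.8 * np.sin(2 * np.pi * 4000 * t)]
idx = best_match(segs, features(segs[1]))   # target the loud, bright segment
```

A real-time version would maintain these feature vectors incrementally as segments are written into the circular delay buffer.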
Extended Source-Filter Model for Harmonic Instruments for Expressive Control of Sound Synthesis and Transformation In this paper we present a revised and improved version of a recently proposed extended source-filter model for sound synthesis, transformation and hybridization of harmonic instruments. This extension focuses mainly on the application to impulsively excited instruments like piano or guitar, but also improves synthesis results for continuously driven instruments, including their hybrids. The technique comprises an extensive analysis of an instrument's sound database, followed by the estimation of a generalized instrument model reflecting timbre variations according to selected control parameters. Such an instrument model allows for natural-sounding transformations and expressive control of instrument sounds with respect to these control parameters.
TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration The increasing complexity and real-time processing demands of audio signals require optimized algorithms that utilize the computational power of Graphics Processing Units (GPUs). Existing Digital Signal Processing (DSP) libraries often do not provide the necessary efficiency and flexibility, particularly for integration with Artificial Intelligence (AI) models. In response, we introduce TorchFX: a GPU-accelerated Python library for DSP, engineered to facilitate sophisticated audio signal processing. Built on the PyTorch framework, TorchFX offers an object-oriented interface similar to torchaudio but enhances functionality with a novel pipe operator for intuitive filter chaining. The library provides a comprehensive suite of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, with a focus on multichannel audio, thereby facilitating the integration of DSP- and AI-based approaches. Our benchmarking results demonstrate significant efficiency gains over traditional libraries like SciPy, particularly in multichannel contexts. While there are current limitations in GPU compatibility, ongoing developments promise broader support and real-time processing capabilities. TorchFX aims to become a useful tool for the community, contributing to innovation in GPU-accelerated DSP. TorchFX is publicly available on GitHub at https://github.com/matteospanio/torchfx.
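The pipe-operator idea for filter chaining can be sketched generically in Python by overloading `__or__` on an effect base class. The classes and names below (`FX`, `Gain`, `Clip`) are hypothetical illustrations of the pattern, not TorchFX's actual API; see the repository for the real interface.

```python
import numpy as np

class FX:
    """Minimal effect base class: subclasses process an array in __call__,
    and `|` chains effects left-to-right into a composite effect."""
    def __or__(self, other):
        return Chain(self, other)

class Chain(FX):
    def __init__(self, *fx):
        self.fx = fx
    def __call__(self, x):
        for f in self.fx:       # apply each stage in order
            x = f(x)
        return x

class Gain(FX):
    def __init__(self, g):
        self.g = g
    def __call__(self, x):
        return self.g * x

class Clip(FX):
    def __init__(self, limit):
        self.limit = limit
    def __call__(self, x):
        return np.clip(x, -self.limit, self.limit)

# The chain reads in processing order: gain first, then clipping
chain = Gain(2.0) | Clip(0.5)
y = chain(np.array([0.1, 0.4, -0.4]))   # -> [0.2, 0.5, -0.5]
```

The appeal of the pattern is that a processing graph reads like a signal-flow diagram while remaining an ordinary composable object.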