Hybrid Audio Inpainting Approach with Structured Sparse Decomposition and Sinusoidal Modeling
This research presents a novel hybrid audio inpainting approach that accounts for the diversity of signals and enhances reconstruction quality. Existing inpainting approaches have limitations such as energy drop and poor reconstruction quality for non-stationary signals. Since an audio signal can be modeled as a mixture of three components, tonal, transient, and noise, the proposed approach divides the reliable neighborhoods to the left and right of the gap into these components using a structured sparse decomposition technique. The gap is reconstructed by extrapolating parameters estimated from the reliable neighborhoods of each component. Component-targeted methods, refined to match each component's acoustic characteristics, are employed for the extrapolation. Experiments were conducted to evaluate the performance of the hybrid approach and compare it with other state-of-the-art inpainting approaches. The results show that the hybrid approach achieves high-quality reconstruction and low computational complexity across various gap lengths and signal types, particularly for longer gaps and non-stationary signals.
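The sinusoidal-modeling idea for the tonal component can be illustrated with a toy sketch: estimate the dominant partial in the left reliable neighborhood and extrapolate it across the gap. This is a minimal single-partial illustration, not the paper's full method; the function name and fitting strategy are ours.

```python
import numpy as np

def extrapolate_sinusoid(left, gap_len, sr):
    """Fit the dominant sinusoid in `left` and extend it over the gap.
    Illustrative only: a real inpainter tracks many partials and also
    handles transient and noise components separately."""
    n = len(left)
    # locate the dominant partial via a windowed FFT peak
    spec = np.fft.rfft(left * np.hanning(n))
    freq = np.argmax(np.abs(spec)) * sr / n
    # least-squares fit of amplitude and phase at that frequency
    t = np.arange(n) / sr
    A = np.column_stack([np.cos(2 * np.pi * freq * t),
                         np.sin(2 * np.pi * freq * t)])
    c, s = np.linalg.lstsq(A, left, rcond=None)[0]
    # extrapolate the fitted sinusoid into the gap
    t_gap = np.arange(n, n + gap_len) / sr
    return c * np.cos(2 * np.pi * freq * t_gap) + s * np.sin(2 * np.pi * freq * t_gap)
```

For a stationary tone the extrapolation is near-exact; the hybrid approach's point is that transients and noise need different, component-targeted treatment.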
NBU: Neural Binaural Upmixing of Stereo Content
While immersive music productions have become popular in recent years, music content produced during the last decades has been predominantly mixed for stereo. This paper presents a data-driven approach to automatic binaural upmixing of stereo music. The HDemucs network architecture, previously utilized for both source separation and binauralization, is leveraged for an end-to-end approach to binaural upmixing. We employ two distinct datasets, demonstrating that while custom-designed training data enhances the accuracy of spatial positioning, the use of professionally mixed music yields superior spatialization. The trained networks show a capacity to process multiple simultaneous sources individually and add valid binaural cues, effectively positioning sources with an average azimuthal error of less than 11.3°. A listening test with binaural experts shows that the approach outperforms digital signal processing-based binauralization of stereo content in terms of spaciousness while preserving audio quality.
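The DSP-based binauralization baselines the paper compares against typically impose interaural time and level differences (ITD/ILD). A minimal sketch of that idea follows; the head radius, the Woodworth ITD formula, and the ILD curve are our assumptions, not values from the paper.

```python
import numpy as np

def binauralize(x, azimuth_deg, sr):
    """Pan a mono source with a Woodworth-style ITD and a crude ILD.
    Positive azimuth = source to the right; returns (left, right)."""
    a = np.radians(abs(azimuth_deg))
    head_radius = 0.0875                         # metres (assumed average head)
    itd = head_radius / 343.0 * (a + np.sin(a))  # Woodworth ITD approximation
    delay = int(round(itd * sr))                 # far-ear delay in samples
    ild = 10 ** (-6.0 * np.sin(a) / 20)          # up to ~6 dB attenuation (assumed)
    near = x
    far = ild * np.concatenate([np.zeros(delay), x])[: len(x)]
    return (far, near) if azimuth_deg >= 0 else (near, far)
```

Such rules position sources plausibly but cannot reproduce the full spectral cues of measured heads, which is where learned approaches can gain in spaciousness.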
LTFATPY: Towards Making a Wide Range of Time-Frequency Representations Available in Python
LTFATPY is a software package for accessing the Large Time-Frequency Analysis Toolbox (LTFAT) from Python. Dedicated to time-frequency analysis, LTFAT comprises a large number of linear transforms for Fourier, Gabor, and wavelet analysis, along with their associated operators. Its filter bank module is a collection of computational routines for finite impulse response and band-limited filters, allowing for the specification of constant-Q and auditory-inspired transforms. While LTFAT was originally written in MATLAB/GNU Octave, the recent popularity of the Python programming language in related fields, such as signal processing and machine learning, makes it desirable to have LTFAT available in Python as well. We introduce LTFATPY, describe its main features, and outline further developments.
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
We present GRAFX, an open-source library designed for handling audio processing graphs in PyTorch. Along with various library functionalities, we describe technical details on the efficient parallel computation of input graphs, signals, and processor parameters on the GPU. Then, we show its example use in a music mixing scenario, where the parameters of every differentiable processor in a large graph are optimized via gradient descent. The code is available at https://github.com/sh-lee97/grafx.
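The mixing scenario boils down to optimizing processor parameters by gradient descent through a differentiable graph. A minimal single-processor sketch in PyTorch, using a toy gain stage rather than GRAFX's actual API:

```python
import torch

torch.manual_seed(0)
x = torch.randn(1024)                        # dry input signal
target = 0.5 * x                             # reference mix to match
gain = torch.nn.Parameter(torch.ones(1))     # one differentiable "processor"
opt = torch.optim.Adam([gain], lr=0.01)
for _ in range(300):
    opt.zero_grad()
    loss = torch.mean((gain * x - target) ** 2)  # audio-domain loss
    loss.backward()
    opt.step()
# gain has moved from 1.0 toward the true mixing gain 0.5
```

GRAFX generalizes this to thousands of parameters across a full graph of processors, which is why efficient batched GPU evaluation of the graph matters.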
QUBX: Rust Library for Queue-Based Multithreaded Real-Time Parallel Audio Streams Processing and Management
The concurrent management of real-time audio streams poses an increasingly complex technical challenge in digital audio signal processing, necessitating efficient and intuitive solutions. Qubx tackles this challenge with an architecture grounded in dynamic circular queues, tailored to optimize and synchronize the processing of parallel audio streams. It is a library written in Rust, a modern and powerful ecosystem that still offers a limited range of tools for digital signal processing. Additionally, Rust's inherent safety guarantees and expressive type system bolster the resilience and stability of the proposed tool.
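The queue-mediated producer/consumer pattern underlying such designs can be sketched in a few lines. Python is used here for brevity (Qubx itself is Rust), and all names are ours, not Qubx's API:

```python
import queue
import threading
import numpy as np

# bounded queue stands in for a circular buffer between audio threads
q = queue.Queue(maxsize=8)
out = []

def producer():
    """Push fixed-size audio blocks, then a sentinel to end the stream."""
    for i in range(4):
        q.put(np.full(64, float(i)))   # 64-sample block per iteration
    q.put(None)

def consumer():
    """Pop blocks and apply per-block processing (a simple gain here)."""
    while True:
        block = q.get()
        if block is None:
            break
        out.append(block * 0.5)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```

The bounded queue provides backpressure: the producer blocks when the consumer falls behind, which is the synchronization property real-time stream managers rely on.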
DDSP-Based Neural Waveform Synthesis of Polyphonic Guitar Performance From String-Wise MIDI Input
We explore the use of neural synthesis for acoustic guitar from string-wise MIDI input. We propose four different systems and compare them, with both objective metrics and subjective evaluation, against natural audio and a sample-based baseline. We iteratively develop these four systems by making various considerations on the architecture and intermediate tasks, such as predicting pitch and loudness control features. We find that formulating the control feature prediction task as a classification task rather than a regression task yields better results. Furthermore, we find that our simplest proposed system, which directly predicts synthesis parameters from MIDI input, performs the best of the four proposed systems. Audio examples and code are available.
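The kind of parametric synthesis these systems predict parameters for can be illustrated with a bare-bones harmonic oscillator under a loudness envelope. This is our toy stand-in, not the proposed DDSP decoder:

```python
import numpy as np

def pluck(f0, dur, sr, n_harm=6, decay=3.0):
    """Additive synthesis of a decaying harmonic tone: the kind of
    signal a DDSP-style decoder renders from pitch/loudness controls."""
    t = np.arange(int(dur * sr)) / sr
    env = np.exp(-decay * t)                     # exponential loudness envelope
    y = sum(0.5 ** k * np.sin(2 * np.pi * f0 * (k + 1) * t)
            for k in range(n_harm))              # harmonics with falling gains
    return env * y
```

A neural front end replaces the hand-set `f0`, harmonic gains, and envelope with values predicted per frame from the MIDI input, which is where the classification-vs-regression choice for control features comes in.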
Differentiable MIMO Feedback Delay Networks for Multichannel Room Impulse Response Modeling
Recently, with the advent of new high-performance headsets and goggles, the demand for Virtual and Augmented Reality applications has increased steeply. To navigate virtual rooms coherently, the acoustics of the scene must be emulated as accurately and efficiently as possible. Amongst others, Feedback Delay Networks (FDNs) have proved to be valuable tools for this task. In this article, we extend a recently proposed method for the data-driven optimization of single-input-single-output FDNs to the multiple-input-multiple-output (MIMO) case, addressing spatial/space-time processing applications. By testing our methodology on items taken from two different datasets, we show that the parameters of MIMO FDNs can be jointly optimized to match perceptual characteristics of given multichannel room impulse responses, outperforming approaches available in the literature and paving the way toward increasingly efficient and accurate real-time virtual room acoustics rendering.
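An FDN, in its simplest single-channel form, feeds N delay lines back through an orthogonal mixing matrix. A scalar sketch follows (the paper's MIMO, data-optimized variant generalizes this; matrix choice and gains here are ours):

```python
import numpy as np

def fdn(x, delays, g=0.7):
    """Tiny feedback delay network: N delay lines coupled by an
    orthogonal Householder feedback matrix, scalar decay gain g < 1."""
    N = len(delays)
    A = np.eye(N) - 2.0 / N * np.ones((N, N))    # Householder mixing matrix
    bufs = [np.zeros(d) for d in delays]         # circular delay-line buffers
    idx = [0] * N
    y = np.zeros(len(x))
    for n in range(len(x)):
        outs = np.array([bufs[i][idx[i]] for i in range(N)])  # delay outputs
        y[n] = outs.sum()
        fb = g * (A @ outs)                      # mixed, attenuated feedback
        for i in range(N):
            bufs[i][idx[i]] = x[n] + fb[i]       # write input + feedback
            idx[i] = (idx[i] + 1) % delays[i]
    return y
```

With an orthogonal matrix and g < 1 the recursion is stable and produces a dense, decaying tail; data-driven optimization tunes the delays, gains, and mixing to match a measured room response.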
A Real-Time Approach for Estimating Pulse Tracking Parameters for Beat-Synchronous Audio Effects
Predominant Local Pulse (PLP) estimation, an established method for extracting beat positions and other periodic pulse information from audio signals, has recently been extended with an online variant tailored for real-time applications. In this paper, we introduce a novel approach to generating various real-time control signals from the original online PLP output. While the PLP activation function encodes both predominant pulse information and pulse stability, we propose several normalization procedures, based on the PLP activation envelope, to separate local pulse oscillation from stability. Through this, we generate pulse-synchronous Low Frequency Oscillators (LFOs) and supplementary confidence-based control signals, enabling dynamic control over audio effect parameters in real time. Additionally, our approach enables beat position prediction, providing a look-ahead capability, for example, to compensate for system latency. To showcase the effectiveness of our control signals, we introduce an audio plugin prototype designed for integration within a Digital Audio Workstation (DAW), facilitating real-time applications of beat-synchronous effects during live mixing and performances. Moreover, this plugin serves as an educational tool, providing insights into PLP principles and the tempo structure of analyzed music signals.
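One concrete control signal of this kind is a beat-locked ramp LFO. The sketch below builds such a ramp from a list of beat times; it is a simplified stand-in for the paper's PLP-derived normalization, and the function name is ours:

```python
import numpy as np

def beat_lfo(beat_times, sr, dur):
    """Pulse-synchronous ramp LFO: rises 0 -> 1 over each beat interval,
    resetting at every beat. Suitable for driving effect parameters."""
    n = int(dur * sr)
    t = np.arange(n) / sr
    lfo = np.zeros(n)
    for b0, b1 in zip(beat_times[:-1], beat_times[1:]):
        m = (t >= b0) & (t < b1)
        lfo[m] = (t[m] - b0) / (b1 - b0)   # linear phase within the beat
    return lfo
```

In the real-time setting, the look-ahead beat prediction mentioned above supplies `b1` before it occurs, so the ramp can run ahead of the audio to compensate for system latency.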