Audio Effect Chain Estimation and Dry Signal Recovery From Multi-Effect-Processed Musical Signals
In this paper we propose a method that addresses a novel task: audio effect (AFX) chain estimation and dry signal recovery. AFXs are indispensable in modern sound design workflows, and sound engineers often cascade different AFXs (as an AFX chain) to achieve their desired soundscapes. Given a solo instrument performance processed by multiple AFXs (the wet signal), our method automatically estimates the applied AFX chain and recovers the unprocessed dry signal, whereas previous research addresses only one of these tasks. The estimated chain can help novice engineers learn practical uses of AFXs, and the recovered signal can be reused with a different AFX chain. To solve this task, we first develop a deep neural network model that estimates the last-applied AFX and undoes it, one effect at a time. We then apply the same model iteratively to estimate the AFX chain and eventually recover the dry signal from the wet signal. Our experiments on guitar phrase recordings with various AFX chains demonstrate the validity of our method for both AFX-chain estimation and dry signal recovery. We also confirm that the input wet signal can be reproduced by applying the estimated AFX chain to the recovered dry signal.
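The iterative peel-off procedure can be pictured as a short loop. In this minimal sketch, `estimate_last_afx` and `remove_afx` are placeholder interfaces standing in for the paper's single network, which performs both steps; the label vocabulary is also illustrative.

```python
# Minimal sketch of the iterative estimate-and-undo loop (hypothetical
# model interfaces; the paper's actual network and labels may differ).

def recover_chain(wet, estimate_last_afx, remove_afx, max_steps=8):
    """Iteratively peel off the last-applied AFX until 'dry' is predicted.

    estimate_last_afx(signal) -> AFX label, e.g. "distortion" or "dry"
    remove_afx(signal, label) -> signal with that AFX undone
    """
    chain = []
    signal = wet
    for _ in range(max_steps):
        label = estimate_last_afx(signal)
        if label == "dry":          # classifier says no effect remains
            break
        chain.append(label)
        signal = remove_afx(signal, label)
    chain.reverse()                 # report in application order
    return chain, signal
```

Reversing at the end matters: the model identifies the last-applied effect first, so the peeled labels come out in the opposite order from the order in which the engineer applied them.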
Time-Varying Filter In Non-Uniform Block Convolution
This paper describes further research on a real-time convolution algorithm for long FIR filters based on non-uniform block partitioning. The static behaviour of the algorithm, which resolves the trade-off between computational load and processing latency, is well examined in the literature. New directions are investigated to exploit the inherent features of the algorithm and utilise them for audio applications. In particular, a dynamic exchange of the filter coefficients of a room impulse response, or subsets of them, is discussed and implemented. Unlike traditional DSP solutions, the prototype is realised in portable software objects and components that can be compiled on multi-purpose processing units such as off-the-shelf computers with standard audio facilities and different operating systems.
Keywords: convolution, spatial sound processing, real-time, room acoustics, sonification
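To make the underlying mechanism concrete, the sketch below implements the uniform special case of partitioned FFT convolution, which non-uniform schemes generalise by mixing small partitions (for low latency) with large ones (for efficiency). The partition size `B` and the frequency-domain delay line are standard ingredients; this is an illustration, not the paper's implementation.

```python
import numpy as np

def partitioned_fft_convolve(x, h, B=256):
    """Uniformly partitioned overlap-add FFT convolution (illustrative).

    Non-uniform schemes instead combine partitions of growing size;
    this sketch shows only the uniform case they build on.
    """
    P = -(-len(h) // B)                          # number of filter partitions
    H = np.stack([np.fft.rfft(h[p*B:(p+1)*B], 2*B) for p in range(P)])
    n_blocks = -(-len(x) // B) + P               # extra blocks flush the tail
    xp = np.zeros(n_blocks * B)
    xp[:len(x)] = x
    fdl = np.zeros((P, B + 1), dtype=complex)    # frequency-domain delay line
    y = np.zeros(n_blocks * B + B)
    for b in range(n_blocks):
        fdl = np.roll(fdl, 1, axis=0)            # age previous input spectra
        fdl[0] = np.fft.rfft(xp[b*B:(b+1)*B], 2*B)
        out = np.fft.irfft((fdl * H).sum(axis=0), 2*B)
        y[b*B:b*B+2*B] += out                    # overlap-add the block output
    return y[:len(x) + len(h) - 1]
```

The result matches `np.convolve(x, h)`. Replacing a subset of the rows of `H` between blocks is one way to picture the dynamic exchange of filter coefficients that the paper discusses for time-varying room impulse responses.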
CONMOD: Controllable Neural Frame-Based Modulation Effects
Deep learning models have seen widespread use in modelling LFO-driven audio effects such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single black-box model which emulates various LFO-driven effects in a frame-wise manner, offering control over LFO frequency and feedback parameters. Additionally, the model is capable of learning the continuous embedding space of two distinct phaser effects, enabling us to steer between effects and achieve creative outputs. Our model outperforms previous work while possessing both controllability and universality, presenting opportunities to enhance creativity in modern LFO-driven audio effects. An additional demo of our model is available on the accompanying website.
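"Frame-wise LFO-driven processing" can be illustrated with a deliberately simple non-neural stand-in: a tremolo where each overlapping frame is scaled by one LFO sample, with the LFO frequency and depth exposed as controls. CONMOD itself is a neural model; this toy only makes the frame-based control idea concrete.

```python
import numpy as np

def framewise_tremolo(x, sr, lfo_hz=2.0, depth=0.5, frame=1024):
    """Toy frame-wise LFO effect: one gain value per 50%-overlapped
    Hann frame, driven by a sinusoidal LFO at a controllable rate."""
    hop = frame // 2
    win = np.hanning(frame)                      # 50% overlap satisfies COLA
    y = np.zeros(len(x) + frame)
    for n in range(0, len(x) - frame, hop):
        g = 1.0 - depth * 0.5 * (1 + np.sin(2 * np.pi * lfo_hz * n / sr))
        y[n:n+frame] += win * x[n:n+frame] * g   # overlap-add the scaled frame
    return y[:len(x)]
```

In a neural setting, the per-frame LFO value plays the role of a conditioning feature fed to the model alongside the audio frame.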
A general-purpose deep learning approach to model time-varying audio effects
Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects. Most existing methods for modeling these types of effect units are optimized for a very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning architecture for generic black-box modeling of audio processors with long-term memory. We explore the capability of deep neural networks to learn such long temporal dependencies, and we show the network modeling various linear and nonlinear, time-varying and time-invariant audio effects. To measure the performance of the model, we propose an objective metric based on the psychoacoustics of modulation frequency perception. We also analyze what the model is actually learning and how the given task is accomplished.
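The convolutional-plus-recurrent structure can be sketched in a few lines of PyTorch: a strided convolution encodes the waveform into a lower-rate latent sequence, an LSTM carries the long-term (modulation-rate) memory, and a transposed convolution decodes back to audio. The layer sizes here are our own choices, not the paper's.

```python
import torch
import torch.nn as nn

class ConvRNNEffect(nn.Module):
    """Sketch of a generic conv + recurrent black-box effect model,
    loosely in the spirit of the paper (hyperparameters are ours)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.encode = nn.Conv1d(1, hidden, kernel_size=64, stride=16)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)  # long-term memory
        self.decode = nn.ConvTranspose1d(hidden, 1, kernel_size=64, stride=16)

    def forward(self, x):                        # x: (batch, 1, samples)
        z = torch.tanh(self.encode(x))           # downsampled latent sequence
        z, _ = self.rnn(z.transpose(1, 2))       # recurrence at frame rate
        return self.decode(z.transpose(1, 2))    # back to sample rate

y = ConvRNNEffect()(torch.randn(1, 1, 4096))     # smoke test: shape (1, 1, 4096)
```

Running the recurrence at the downsampled frame rate rather than the sample rate is what makes modulation periods of several seconds tractable for the RNN.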
Analysis / Synthesis of Rolling Sounds Using a Source Filter Approach
In this paper, a method for the analysis and synthesis of rolling ball sounds is proposed. The approach is based on the assumption that a rolling sound is generated by a concatenation of micro-impacts between a ball and a surface, each having associated resonances. Contact timing information is first extracted from the rolling sound using an onset detection process. The resulting individual contact segments are subband-filtered before being analyzed using linear predictive coding (LPC) and notch filter parameter estimation. The segments are then resynthesized and overlap-added to form a complete rolling sound. This approach is similar to that of [1], though the methods used for contact event detection and filter parameter estimation are completely different.
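The LPC step of such a source-filter pipeline can be sketched as follows: fit an all-pole resonator to one contact segment via the autocorrelation method, then resynthesize by exciting that resonator. The paper adds subband filtering and notch-parameter estimation on top; this sketch covers only the basic LPC analysis/resynthesis.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_resynth(segment, order=12):
    """Fit LPC resonances to a contact segment and resynthesize it
    with a noise excitation (illustrative source-filter sketch)."""
    r = np.correlate(segment, segment, "full")[len(segment) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])  # Yule-Walker solve
    A = np.concatenate(([1.0], -a))                # analysis filter coefficients
    excitation = lfilter(A, [1.0], segment)        # inverse-filter residual
    noise = np.random.randn(len(segment)) * np.std(excitation)
    return lfilter([1.0], A, noise)                # noise-excited resonator
```

Resynthesized segments would then be placed at the detected contact times and overlap-added to reconstruct the full rolling texture.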
Towards an Objective Comparison of Panning Feature Algorithms for Unsupervised Learning
Estimates of panning attributes are important features to extract from a piece of recorded music, with downstream uses such as classification, quality assessment, and listening enhancement. While several algorithms exist in the literature, there is currently no comparison between them and no study to suggest which is most suitable for a particular task. This paper compares four algorithms for extracting amplitude panning features with respect to their suitability for unsupervised learning. It finds synchronicities between them and analyses their results on a small set of commercial music excerpts chosen for their distinct panning characteristics. The ability of each algorithm to differentiate between the tracks is analysed. The results can be used in future work either to select the most appropriate panning feature algorithm or to create a version customized for a particular task.
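As one example of the kind of feature being compared, the sketch below computes a similarity-based panning index per time-frequency bin, in the style of Avendano's stereo panning index. Whether this particular formulation is among the paper's four algorithms is not confirmed here; it simply illustrates what an amplitude panning feature looks like.

```python
import numpy as np
from scipy.signal import stft

def panning_map(left, right, sr, nfft=2048):
    """Per-bin amplitude panning index: -1 = hard left, 0 = centre,
    +1 = hard right (illustrative similarity-based formulation)."""
    _, _, L = stft(left, sr, nperseg=nfft)
    _, _, R = stft(right, sr, nperseg=nfft)
    sim = 2 * np.abs(L * np.conj(R)) / (np.abs(L)**2 + np.abs(R)**2 + 1e-12)
    side = np.sign(np.abs(R) - np.abs(L))        # which channel dominates
    return side * (1 - sim)                      # 0 when channels match
```

Summaries of such a map (histograms, per-band statistics) are the sort of fixed-length feature vectors one would feed to an unsupervised learner.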
Hybrid Reverberation Processor with Perceptual Control
This paper presents a hybrid reverberation processor, i.e., a real-time audio signal processing unit that combines a convolution reverb for recreating the early reflections of a measured impulse response (IR) with a feedback delay network (FDN) for synthesizing the reverberation tail. The FDN is automatically adjusted so as to match the energy decay profile of the measured IR. Particular attention is given to the transition between the convolution section and the FDN in order to avoid audible artifacts. The proposed reverberation processor offers both computational efficiency and flexible perceptual control over the reverberation effect.
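A standard first step for matching an FDN to a measured IR is to compute the IR's energy decay curve via Schroeder backward integration and read a decay time off it. The sketch below shows that step only; the paper's actual matching procedure (per-band, with a controlled crossover to the FDN) is more involved.

```python
import numpy as np

def energy_decay_db(ir):
    """Schroeder backward integration: energy remaining after each
    sample, normalised and expressed in dB."""
    edc = np.cumsum(ir[::-1] ** 2)[::-1]
    return 10 * np.log10(edc / edc[0] + 1e-12)

def t60_from_edc(edc_db, sr, lo=-5.0, hi=-35.0):
    """Estimate T60 from the -5 to -35 dB span of the EDC
    (a 30 dB fit extrapolated to 60 dB of decay)."""
    i0 = np.argmax(edc_db <= lo)
    i1 = np.argmax(edc_db <= hi)
    slope = (edc_db[i1] - edc_db[i0]) / ((i1 - i0) / sr)  # dB per second
    return -60.0 / slope
```

The resulting decay time is what the FDN's feedback gains are tuned to reproduce, so that the synthetic tail continues the measured decay seamlessly.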
Differentiable All-Pole Filters for Time-Varying Audio Systems
Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations such as frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, a time-varying subtractive synthesiser, and a feed-forward compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin.
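To see why the recursion is the obstacle, consider the naive differentiable version below: autograd must unroll the sample-by-sample loop, which is exactly the inefficiency the paper removes with a hand-derived backward pass. This sketch only makes the time-varying all-pole recursion concrete; it is not the paper's implementation.

```python
import torch

def tv_allpole(x, a):
    """Naive differentiable time-varying all-pole filter:
        y[n] = x[n] - sum_k a[n, k] * y[n - k].
    x: (T,) input signal; a: (T, K) per-sample denominator coefficients."""
    T, K = a.shape
    y = [x.new_zeros(()) for _ in range(K)]      # zero initial conditions
    out = []
    for n in range(T):
        yn = x[n] - sum(a[n, k] * y[-1 - k] for k in range(K))
        y.append(yn)
        out.append(yn)
    return torch.stack(out)

# Gradients flow through the recursion, but the graph grows with T:
x = torch.randn(64)
a = 0.1 * torch.randn(64, 2, requires_grad=True)
tv_allpole(x, a).pow(2).sum().backward()
```

The paper's contribution is to compute the same gradients without building this per-sample graph, so filters with poles become practical inside end-to-end trained audio systems.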
Amp-Space: A Large-Scale Dataset for Fine-Grained Timbre Transformation
We release Amp-Space, a large-scale dataset of paired audio samples: a source audio signal and an output signal, the result of a timbre transformation. The types of transformations we study come from black-box musical tools (amplifiers, stompboxes, studio effects) traditionally used to shape the sound of guitar, bass, or synthesizer. For each sample of transformed audio, the set of parameters used to create it is given. Samples are from both real and simulated devices, the latter allowing for orders of magnitude more data than found in comparable datasets. We demonstrate potential use cases of this data by (a) pre-training a conditional WaveNet model on synthetic data and showing that it reduces the number of samples necessary to digitally reproduce a real musical device, and (b) training a variational autoencoder to shape a continuous space of timbre transformations for creating new sounds through interpolation.
The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering
This article explains how to apply time–frequency scattering, a convolutional operator extracting modulations in the time–frequency domain at different rates and scales, to the re-synthesis and manipulation of audio textures. After implementing phase retrieval in the scattering network by gradient backpropagation, we introduce scale-rate DAFx, a class of audio transformations expressed in the domain of time–frequency scattering coefficients. One example of scale-rate DAFx is chirp rate inversion, which causes each sonic event to be locally reversed in time while leaving the arrow of time globally unchanged. Over the past two years, our work has led to the creation of four electroacoustic pieces: FAVN; Modulator (Scattering Transform); Experimental Palimpsest; Inspection (Maida Vale Project) and Inspection II; as well as XAllegroX (Hecker Scattering.m Sequence), a remix of Lorenzo Senni’s XAllegroX, released by Warp Records on a vinyl entitled The Shape of RemiXXXes to Come.
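The phase-retrieval-by-backpropagation idea generalises to any differentiable feature map: starting from noise, descend on the input signal until its features match those of the target. A full scattering network is beyond the scope of this sketch, so a spectrogram magnitude stands in for the time-frequency scattering operator `phi`.

```python
import torch

# Sketch of re-synthesis by gradient descent on the input signal, the
# strategy the article applies to time-frequency scattering. Here a
# spectrogram is the stand-in differentiable feature map.

def phi(x):
    return torch.stft(x, n_fft=512, hop_length=128,
                      window=torch.hann_window(512),
                      return_complex=True).abs()

target = phi(torch.randn(8192))              # features of the target texture
x = torch.randn(8192, requires_grad=True)    # start from noise
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = (phi(x) - target).pow(2).mean()
    loss.backward()                          # phase retrieval via backprop
    opt.step()
```

With scattering coefficients in place of the spectrogram, the same loop recovers signals from rate-and-scale modulation statistics, and editing those coefficients before descent yields scale-rate DAFx such as the chirp rate inversion described above.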