Download Grey-Box Modelling of Dynamic Range Compression This paper explores the digital emulation of analog dynamic range compressors, proposing a grey-box model that uses a combination of traditional signal processing techniques and machine learning. The main idea is to use the structure of a traditional digital compressor in a machine learning framework, so it can be trained end-to-end to create a virtual analog model of a compressor from data. The complexity of the model can be adjusted, allowing a trade-off between the model accuracy and computational cost. The proposed model has interpretable components, so its behaviour can be controlled more readily after training in comparison to a black-box model. The result is a model that achieves similar accuracy to a black-box baseline, whilst requiring less than 10% of the number of operations per sample at runtime.
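For readers unfamiliar with the "structure of a traditional digital compressor" that the abstract refers to, the following is a minimal sketch of a generic textbook feed-forward design: a static gain curve followed by one-pole attack/release smoothing. It is not the paper's trainable model. The function name `compress` and all parameter names and default values are assumptions chosen for illustration.

```python
import math

def compress(x, fs, threshold_db=-20.0, ratio=4.0, attack_ms=5.0, release_ms=50.0):
    """Generic feed-forward compressor (illustrative, not the paper's model)."""
    # One-pole smoothing coefficients derived from the time constants.
    a_att = math.exp(-1.0 / (fs * attack_ms * 1e-3))
    a_rel = math.exp(-1.0 / (fs * release_ms * 1e-3))
    smoothed_db = 0.0  # smoothed gain reduction in dB (always >= 0)
    out = []
    for sample in x:
        # Level detection on the instantaneous magnitude, in dB.
        level_db = 20.0 * math.log10(max(abs(sample), 1e-9))
        # Static curve: reduce anything above the threshold by the ratio.
        over_db = max(level_db - threshold_db, 0.0)
        target_db = over_db * (1.0 - 1.0 / ratio)
        # Attack when more reduction is needed, release otherwise.
        coeff = a_att if target_db > smoothed_db else a_rel
        smoothed_db = coeff * smoothed_db + (1.0 - coeff) * target_db
        out.append(sample * 10.0 ** (-smoothed_db / 20.0))
    return out
```

In a grey-box setting, quantities such as the threshold, ratio and time constants would presumably be the interpretable parameters learned from data, but the abstract does not specify which components are trained.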
Download Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
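The cubic Lagrange interpolation mentioned above can be sketched as follows. This shows only how a delay line is read at a fractional position using four neighbouring samples; how the paper integrates such a read into the recurrent state of an RNN is not described in the abstract and is not reproduced here. The helper names `lagrange3_coeffs` and `read_fractional` are hypothetical.

```python
import math

def lagrange3_coeffs(frac):
    """Weights for the samples at offsets -1, 0, +1, +2; frac in [0, 1)."""
    d = frac + 1.0  # position measured from the first of the four samples
    return [
        -(d - 1.0) * (d - 2.0) * (d - 3.0) / 6.0,
        d * (d - 2.0) * (d - 3.0) / 2.0,
        -d * (d - 1.0) * (d - 3.0) / 2.0,
        d * (d - 1.0) * (d - 2.0) / 6.0,
    ]

def read_fractional(signal, position):
    """Interpolate signal at a non-integer index (needs 1 <= floor(position) <= len - 3)."""
    i = math.floor(position)
    weights = lagrange3_coeffs(position - i)
    taps = signal[i - 1 : i + 3]
    return sum(w * s for w, s in zip(weights, taps))
```

Because the interpolator is third order, it reproduces any cubic polynomial exactly, which is a convenient sanity check for an implementation.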
Download Differentiable Active Acoustics - Optimizing Stability via Gradient Descent Active acoustics (AA) refers to an electroacoustic system that actively modifies the acoustics of a room. For common use cases, the number of transducers—loudspeakers and microphones—involved in the system is large, resulting in a large number of system parameters. To optimally blend the response of the system into the natural acoustics of the room, the parameters require careful tuning, which is a time-consuming process performed by an expert. In this paper, we present a differentiable AA framework, which allows multi-objective optimization without impairing architecture flexibility. The system is implemented in PyTorch to be easily translated into a machine-learning pipeline, thus automating the tuning process. The objective of the pipeline is to optimize the digital signal processor (DSP) component to evenly distribute the energy in the feedback loop across frequencies. We investigate the effectiveness of DSPs composed of finite impulse response filters, which are unconstrained during the optimization. We study the effect of multiple filter orders, number of transducers, and loss functions on the performance. Different loss functions behave similarly for systems with few transducers and low-order filters. Increasing the number of transducers and the order of the filters improves results and accentuates the difference in the performance of the loss functions.
Download Exposure Bias and State Matching in Recurrent Neural Network Virtual Analog Models Virtual analog (VA) modeling using neural networks (NNs) has great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due to their connection with discrete nodal analysis. Furthermore, VA models based on NNs can be trained efficiently by directly exposing them to the circuit states in a gray-box fashion. However, exposure to ground truth information during training can leave the models susceptible to error accumulation in a free-running mode, also known as “exposure bias” in machine learning literature. This paper presents a unified framework for treating the previously proposed state trajectory network (STN) and gated recurrent unit (GRU) networks as special cases of discrete nodal analysis. We propose a novel circuit state-matching mechanism for the GRU and experimentally compare the previously mentioned networks for their performance in state matching, during training, and in exposure bias, during inference. Experimental results from modeling a diode clipper show that all the tested models exhibit some exposure bias, which can be mitigated by truncated backpropagation through time. Furthermore, the proposed state matching mechanism improves the GRU modeling performance of an overdrive pedal and a phaser pedal, especially in the presence of external modulation, apparent in a phaser circuit.
Download Guitar Tone Stack Modeling with a Neural State-Space Filter In this work, we present a data-driven approach to modeling tone stack circuits in guitar amplifiers and distortion pedals. To this aim, the proposed modeling approach uses a feedforward fully connected neural network to predict the parameters of a coupled-form state-space filter, ensuring the numerical stability of the resulting time-varying system. The neural network is conditioned on the tone controls of the target tone stack and is optimized jointly with the coupled-form state-space filter to match the target frequency response. To assess the proposed approach, we model three popular tone stack schematics with both matched-order and over-parameterized filters and conduct an objective comparison with well-established approaches that use cascaded biquad filters. Results from the conducted experiments demonstrate improved accuracy of the proposed modeling approach, especially in the case of over-parameterized state-space filters, while guaranteeing numerical stability. Our method can be deployed, after training, in real-time audio processors.
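As a rough illustration of why a coupled-form state-space filter lends itself to stability guarantees, the sketch below implements a textbook second-order coupled form whose state matrix is a rotation by `angle` scaled by `radius`. Its poles lie at radius·e^(±j·angle), so any radius below 1 gives a stable section even when the parameters change over time. This is an assumed, generic parameterization; the abstract does not state the exact form used in the paper, and the function name and the `b`, `c`, `d` arguments are hypothetical.

```python
import math

def coupled_form_filter(x, radius, angle, b=(1.0, 0.0), c=(1.0, 0.0), d=0.0):
    """Second-order coupled-form state-space section (illustrative sketch)."""
    # State matrix A = radius * [[cos, -sin], [sin, cos]].
    a_re = radius * math.cos(angle)
    a_im = radius * math.sin(angle)
    s1 = s2 = 0.0
    y = []
    for u in x:
        # Output equation uses the current state and input.
        y.append(c[0] * s1 + c[1] * s2 + d * u)
        # State update: scaled rotation of the state plus the input term.
        s1, s2 = (a_re * s1 - a_im * s2 + b[0] * u,
                  a_im * s1 + a_re * s2 + b[1] * u)
    return y
```

With the default `b` and `c`, the impulse response for n ≥ 1 is radius^(n-1)·cos((n-1)·angle), a decaying sinusoid whenever radius < 1.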
Download Neural Modeling of Magnetic Tape Recorders The sound of magnetic recording media, such as open-reel and cassette tape recorders, is still sought after by today’s sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed system consists of three main components: the hysteretic nonlinearity and filtering jointly produced by the magnetic recording process as well as the record and playback amplifiers, the fluctuating delay originating from the tape transport, and the combined additive noise component from various electromagnetic origins. In our approach, the hysteretic nonlinear block is modeled using a recurrent neural network, while the delay trajectories and the noise component are generated using separate diffusion models, which employ U-net deep convolutional neural networks. According to the conducted objective evaluation, the proposed architecture faithfully captures the character of the magnetic tape recorder. The results of this study can be used to construct virtual replicas of vintage sound recording devices with applications in music production and audio antiquing tasks.
Download Granular analysis/synthesis of percussive drilling sounds This paper deals with the automatic and robust analysis, and the realistic and low-cost synthesis, of percussive drilling-like sounds. The two contributions are an unsupervised removal of quasi-stationary background noise based on Non-negative Matrix Factorization, and a granular method for the analysis/synthesis of these drilling sounds. Both are suited to the acoustical properties of percussive drilling sounds, and can be extended to other sounds with similar characteristics. The context of this work is the training of operators of working machines using simulators. Additionally, an implementation is explained.
Download Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this application, based on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show that the diffusion-based approach provides more stable results and is less sensitive to data availability, while the adversarial approach is superior at estimating more pronounced distortion effects. Our findings contribute to the robust unsupervised blind estimation of audio effects, demonstrating the potential of diffusion models for system identification in music technology.
Download One-to-Many Conversion for Percussive Samples A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hihat, a snare, and a tomtom. The proposed method uses a short pseudo-random velvet-noise filter and a low-shelf filter to produce timbral variations targeted at appropriate spectral regions, yielding potentially an endless number of new realistic versions of a single percussive sampled sound. The realism of the resulting processed sounds is studied in a listening test. The results show that the sound quality obtained with the proposed algorithm is at least as good as that of a previous method while using 77% fewer computational operations. The algorithm is widely applicable to computer-generated music and game audio.
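The low cost of a velvet-noise filter comes from its sparsity: it contains only a few unit impulses with random signs, so convolution reduces to a handful of signed, delayed copies of the input. The sketch below generates such a tap set and applies it. It is a generic illustration under assumed names (`velvet_taps`, `sparse_convolve`) and does not include the tap weighting or the low-shelf filter that the paper's method may use.

```python
import random

def velvet_taps(length, num_impulses, seed=None):
    """One +/-1 impulse at a random position in each equal segment of the filter."""
    rng = random.Random(seed)
    taps = []
    for m in range(num_impulses):
        # Integer segment boundaries; requires length >= num_impulses.
        start = (m * length) // num_impulses
        end = ((m + 1) * length) // num_impulses
        taps.append((rng.randrange(start, end), rng.choice((-1.0, 1.0))))
    return taps

def sparse_convolve(x, taps):
    """Convolve x with the sparse filter: additions and sign flips only."""
    y = [0.0] * (len(x) + max(pos for pos, _ in taps))
    for pos, sign in taps:
        for n, sample in enumerate(x):
            y[n + pos] += sign * sample
    return y
```

Calling `velvet_taps` with a different seed for each playback would give a different subtle variation of the same sample, which is the one-to-many idea in miniature.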
Download Differentiable Feedback Delay Network for Colorless Reverberation Artificial reverberation algorithms often suffer from spectral coloration, usually in the form of metallic ringing, which impairs the perceived quality of sound. This paper proposes a method to reduce the coloration in the feedback delay network (FDN), a popular artificial reverberation algorithm. An optimization framework is employed entailing a differentiable FDN to learn a set of parameters decreasing coloration. The optimization objective is to minimize the spectral loss to obtain a flat magnitude response, with an additional temporal loss term to control the sparseness of the impulse response. The objective evaluation of the method shows a favorable narrower distribution of modal excitation while retaining the impulse response density. The subjective evaluation demonstrates that the proposed method lowers perceptual coloration of late reverberation, and also shows that the suggested optimization improves sound quality for small FDN sizes. The method proposed in this work constitutes an improvement in the design of accurate and high-quality artificial reverberation, simultaneously offering computational savings.
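For orientation, a feedback delay network can be sketched in a few lines: several delay lines whose outputs are mixed by a feedback matrix and fed back to their inputs. The example below uses a fixed Householder matrix and a single scalar feedback gain, both assumptions made for illustration. The paper instead learns its parameters by gradient descent, and neither its parameterization nor its loss functions are reproduced here.

```python
def fdn_impulse_response(delays, gain, num_samples):
    """Impulse response of a small FDN with a Householder feedback matrix."""
    n = len(delays)
    buffers = [[0.0] * d for d in delays]  # one circular buffer per delay line
    idx = [0] * n
    out = []
    for t in range(num_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse input
        # The slot about to be overwritten holds the sample written d steps ago.
        taps = [buffers[i][idx[i]] for i in range(n)]
        out.append(sum(taps))
        # Householder mixing: A = I - (2/n) * ones, applied without a matrix.
        mean2 = 2.0 * sum(taps) / n
        for i in range(n):
            buffers[i][idx[i]] = x + gain * (taps[i] - mean2)
            idx[i] = (idx[i] + 1) % delays[i]
    return out
```

With a gain below 1 the response decays; the distribution of delay lengths and the choice of feedback matrix then determine how evenly the modes are excited, which is the kind of property the optimization described above targets.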