Download Real-Time Black-Box Modelling With Recurrent Neural Networks This paper proposes to use a recurrent neural network for black-box modelling of nonlinear audio systems, such as tube amplifiers and distortion pedals. As a recurrent unit structure, we test both Long Short-Term Memory and a Gated Recurrent Unit. We compare the proposed neural network with a WaveNet-style deep neural network, which has been suggested previously for tube amplifier modelling. The neural networks are trained with several minutes of guitar and bass recordings, which have been passed through the devices to be modelled. A real-time audio plugin implementing the proposed networks has been developed in the JUCE framework. It is shown that the recurrent neural networks achieve similar accuracy to the WaveNet model, while requiring significantly less processing power to run. The Long Short-Term Memory recurrent unit is also found to outperform the Gated Recurrent Unit overall. The proposed neural network is an important step forward in computationally efficient yet accurate emulation of tube amplifiers and distortion pedals.
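As a rough illustration of the kind of model this abstract describes, the sketch below runs a single LSTM layer one audio sample at a time and maps the hidden state to an output sample with a linear layer. It is only a minimal sketch: the class name `TinyLSTMModel`, the hidden size, the random (untrained) weights, and the linear output layer are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMModel:
    """Hypothetical single-layer LSTM with a linear output, mono in / mono out."""

    def __init__(self, hidden_size=8, seed=0):
        rng = np.random.default_rng(seed)
        H = hidden_size
        self.H = H
        # Random weights stand in for trained ones (assumption for the sketch).
        self.W = rng.normal(0.0, 0.3, 4 * H)        # input weights (one input sample)
        self.U = rng.normal(0.0, 0.3, (4 * H, H))   # recurrent weights
        self.b = np.zeros(4 * H)
        self.w_out = rng.normal(0.0, 0.3, H)
        self.b_out = 0.0
        self.reset()

    def reset(self):
        self.h = np.zeros(self.H)  # hidden state
        self.c = np.zeros(self.H)  # cell state

    def process(self, x):
        """Process a block of samples; state is carried over between calls."""
        H = self.H
        y = np.empty(len(x))
        for n, xn in enumerate(x):
            z = self.W * xn + self.U @ self.h + self.b
            i = sigmoid(z[:H])            # input gate
            f = sigmoid(z[H:2 * H])       # forget gate
            g = np.tanh(z[2 * H:3 * H])   # candidate cell values
            o = sigmoid(z[3 * H:])        # output gate
            self.c = f * self.c + i * g
            self.h = o * np.tanh(self.c)
            y[n] = self.w_out @ self.h + self.b_out
        return y
```

Because the state persists between calls to `process`, splitting a signal into audio-callback-sized blocks gives the same output as processing it in one pass, which is the property a real-time plugin relies on.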
Download Neural Net Tube Models for Wave Digital Filters Herein, we demonstrate the use of neural nets towards simulating multiport nonlinearities inside a wave digital filter. We introduce a resolved wave definition which allows us to extract features from a Kirchhoff domain dataset and train our neural networks directly in the wave domain. A hyperparameter search is performed to minimize error and runtime complexity. To illustrate the method, we model a tube amplifier circuit inspired by the preamplifier stage of the Fender Pro-Junior guitar amplifier. We analyze the performance of our neural net models by comparing their distortion characteristics and transconductances. Our results suggest that activation function selection has a significant effect on the distortion characteristic created by the neural net.
Download Neural Parametric Equalizer Matching Using Differentiable Biquads This paper proposes a neural network for carrying out parametric equalizer (EQ) matching. The novelty of this neural network solution is that it can be optimized directly in the frequency domain by means of differentiable biquads, rather than relying solely on a loss on parameter values which does not correlate directly with the system output. We compare the performance of the proposed neural network approach with that of a baseline algorithm based on a convex relaxation of the problem. It is observed that the neural network can provide better matching than the baseline approach because it directly attempts to solve the non-convex problem. Moreover, we show that the same network trained with only a parameter loss is insufficient for the task, despite the fact that it matches underlying EQ parameters better than one trained with a combination of spectral and parameter losses.
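To make the idea of a frequency-domain loss concrete, the sketch below evaluates the magnitude response of one peaking biquad in closed form and compares two parameter sets by their responses. It is a hedged illustration only: the peaking-filter formulas follow the widely used Audio EQ Cookbook design, the function names and the mean-squared dB loss are assumptions, and NumPy is used for readability, so gradients would require rewriting the same expressions in an automatic differentiation framework.

```python
import numpy as np

def peaking_biquad(f0, gain_db, q, fs):
    """Peaking EQ coefficients (Audio EQ Cookbook form), normalized so a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]

def magnitude_db(b, a, freqs, fs):
    """Magnitude response in dB, evaluated on the unit circle at the given frequencies."""
    z1 = np.exp(-1j * 2.0 * np.pi * np.asarray(freqs, dtype=float) / fs)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20.0 * np.log10(np.abs(num / den))

def spectral_loss(params_est, params_ref, freqs, fs):
    """Hypothetical spectral loss: mean squared difference of dB responses.

    Each params tuple is (f0, gain_db, q).
    """
    est = magnitude_db(*peaking_biquad(*params_est, fs), freqs, fs)
    ref = magnitude_db(*peaking_biquad(*params_ref, fs), freqs, fs)
    return float(np.mean((est - ref) ** 2))
```

Two parameter sets can differ noticeably in centre frequency and Q while producing similar curves, which is one way to see why a loss on parameter values alone need not track the system output.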
Download Neural Audio Processing on Android Phones This study investigates the potential of real-time inference of neural audio effects on Android smartphones, marking an initial step towards bridging the gap in neural audio processing for mobile devices. Focusing exclusively on processing rather than synthesis, we explore the performance of three open-source neural models across five Android phones released between 2014 and 2022, showcasing varied capabilities due to their generational differences. Through comparative analysis utilizing two C++ inference engines (ONNX Runtime and RTNeural), we aim to evaluate the computational efficiency and timing performance of these models, considering the varying computational loads and the hardware specifics of each device. Our work contributes insights into the feasibility of implementing neural audio processing in real-time on mobile platforms, highlighting challenges and opportunities for future advancements in this rapidly evolving field.
Download Speech Dereverberation Using Recurrent Neural Networks Advances in deep learning have led to novel, state-of-the-art techniques for blind source separation, particularly for the application of non-stationary noise removal from speech. In this paper, we show how a simple reformulation allows us to adapt blind source separation techniques to the problem of speech dereverberation and, accordingly, train a bidirectional recurrent neural network (BRNN) for this task. We compare the performance of the proposed neural network approach with that of a baseline dereverberation algorithm based on spectral subtraction. We find that our trained neural network quantitatively and qualitatively outperforms the baseline approach.
Download Wave Digital Modeling of Circuits with Multiple One-Port Nonlinearities Based on Lipschitz-Bounded Neural Networks Neural networks have found application within the Wave Digital Filters (WDFs) framework as data-driven input-output blocks for modeling single one-port or multi-port nonlinear devices in circuit systems. However, traditional neural networks lack predictable bounds for their output derivatives, which are essential to ensure convergence when simulating circuits with multiple nonlinear elements using fixed-point iterative methods, e.g., the Scattering Iterative Method (SIM). In this study, we address this issue by employing Lipschitz-bounded neural networks for regressing nonlinear WD scattering relations of one-port nonlinearities.
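The link between a Lipschitz bound and convergence of a fixed-point iteration can be sketched in a few lines. The example below is a generic construction and not necessarily the one used in the paper: it rescales the weight matrices of a small tanh network so that the product of their spectral norms stays below a chosen bound, then iterates x = f(x) + u, which is a contraction when the bound is below 1. All names, sizes, and the plain iteration (standing in for SIM) are assumptions.

```python
import numpy as np

def scale_to_lipschitz(weights, bound):
    """Rescale each layer so the product of spectral norms is at most `bound`."""
    per_layer = bound ** (1.0 / len(weights))
    scaled = []
    for W in weights:
        s = np.linalg.norm(W, 2)  # spectral norm (largest singular value)
        scaled.append(W * min(1.0, per_layer / s))
    return scaled

def mlp(x, weights):
    """Bias-free MLP with tanh (1-Lipschitz) hidden activations and linear output."""
    for W in weights[:-1]:
        x = np.tanh(W @ x)
    return weights[-1] @ x

def solve_fixed_point(weights, u, x0, n_iter=200):
    """Iterate x <- mlp(x) + u; converges when the network is a contraction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = mlp(x, weights) + u
    return x

rng = np.random.default_rng(0)
raw = [rng.normal(size=(8, 2)), rng.normal(size=(8, 8)), rng.normal(size=(2, 8))]
net = scale_to_lipschitz(raw, bound=0.5)
```

With the bound set to 0.5, the distance between any two iterates at least halves on every step, so the iteration reaches the same solution regardless of the starting point.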
Download Decoding Sound Source Location From EEG: Preliminary Comparisons of Spatial Rendering and Location Spatial auditory acuity is contingent on the quality of spatial cues presented during listening. Electroencephalography (EEG) shows promise for finding neural markers of such acuity present in recorded neural activity, potentially mitigating common challenges with behavioural assessment (e.g., sound source localisation tasks). This study presents findings from three preliminary experiments which investigated neural response variations to auditory stimuli under different spatial listening conditions: free-field (loudspeaker-based), individual Head-Related Transfer Functions (HRTFs), and non-individual HRTFs. Three participants, each participating in one experiment, were exposed to auditory stimuli from various spatial locations while neural activity was recorded via EEG. The resultant neural responses underwent a decoding protocol to assess how decoding accuracy varied between stimuli locations over time. Decoding accuracy was highest for free-field auditory stimuli, with significant but lower decoding accuracy between left and right hemisphere locations for individual and non-individual HRTF stimuli. A latency in significant decoding accuracy was observed between listening conditions for locations dominated by spectral cues. Furthermore, findings suggest that decoding accuracy between free-field and non-individual HRTF stimuli may reflect behavioural front-back confusion rates.
Download Fast Temporal Convolutions for Real-Time Audio Signal Processing This paper introduces the possibilities of optimizing neural network convolutional layers for modeling nonlinear audio systems and effects. Enhanced methods for real-time dilated convolutions are presented to achieve faster signal processing times than in previous work. Due to the improved implementation of convolutional layers, a significant decrease in computational requirements was observed and validated on different configurations of single layers with dilated convolutions and WaveNet-style feedforward neural network models. In most cases, equivalent signal processing times were achieved to those using recurrent neural networks with Long Short-Term Memory units and Gated Recurrent Units, which are considered state-of-the-art in the field of black-box virtual analog modeling.
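For context, a plain sample-by-sample dilated causal convolution can be written with a circular buffer as below. This is only a baseline sketch of the operation y[n] = sum_k w[k] x[n - k d]; it does not reproduce the optimized methods the paper presents, which the abstract does not detail. The class name and the single-channel setup are assumptions.

```python
import numpy as np

class StreamingDilatedConv:
    """Hypothetical single-channel causal dilated convolution for streaming audio."""

    def __init__(self, weights, dilation):
        self.w = np.asarray(weights, dtype=float)
        self.d = int(dilation)
        # The buffer must hold the oldest tap: (K - 1) * d samples back, plus the current one.
        self.size = (len(self.w) - 1) * self.d + 1
        self.buf = np.zeros(self.size)
        self.pos = 0
        self.taps = self.d * np.arange(len(self.w))  # delay of each tap in samples

    def process_sample(self, x):
        self.buf[self.pos] = x
        idx = (self.pos - self.taps) % self.size  # wrap around the circular buffer
        y = float(self.w @ self.buf[idx])
        self.pos = (self.pos + 1) % self.size
        return y

def offline_dilated_conv(x, weights, dilation):
    """Reference implementation over a whole signal, with zeros before the start."""
    y = np.zeros(len(x))
    for k, wk in enumerate(weights):
        shift = k * dilation
        y[shift:] += wk * x[:len(x) - shift]
    return y
```

The buffer length grows with the dilation, so in stacks of layers with exponentially increasing dilation, memory access patterns rather than arithmetic can become the dominant cost.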
Download Aliasing Reduction in Neural Amp Modeling by Smoothing Activations The increasing demand for high-quality digital emulations of analog audio hardware, such as vintage tube guitar amplifiers, led to numerous works on neural network-based black-box modeling, with deep learning architectures like WaveNet showing promising results. However, a key limitation in all of these models was the aliasing artifacts stemming from nonlinear activation functions in neural networks. In this paper, we investigated novel and modified activation functions aimed at mitigating aliasing within neural amplifier models. Supporting this, we introduced a novel metric, the Aliasing-to-Signal Ratio (ASR), which quantitatively assesses the level of aliasing with high accuracy. Measuring also the conventional Error-to-Signal Ratio (ESR), we conducted studies on a range of preexisting and modern activation functions with varying stretch factors. Our findings confirmed that activation functions with smoother curves tend to achieve lower ASR values, indicating a noticeable reduction in aliasing. Notably, this improvement in aliasing reduction was achievable without a substantial increase in ESR, demonstrating the potential for high modeling accuracy with reduced aliasing in neural amp models.
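The Error-to-Signal Ratio mentioned above is commonly computed as the energy of the prediction error divided by the energy of the target signal. The sketch below uses that common form; the function name and the small `eps` guard are assumptions, and some works apply a pre-emphasis filter to both signals first, which is omitted here. The ASR is not sketched because the abstract does not define it.

```python
import numpy as np

def error_to_signal_ratio(target, prediction, eps=1e-12):
    """ESR = sum((target - prediction)^2) / sum(target^2).

    `eps` (an assumption of this sketch) avoids division by zero for silent targets.
    """
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    error_energy = np.sum((target - prediction) ** 2)
    signal_energy = np.sum(target ** 2)
    return float(error_energy / (signal_energy + eps))
```

A perfect prediction gives an ESR of 0, an all-zero prediction gives approximately 1, and a prediction that is uniformly 10% too quiet gives approximately 0.01.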
Download Inference-Time Structured Pruning for Real-Time Neural Network Audio Effects Structured pruning is a technique for reducing the computational load and memory footprint of neural networks by removing structured subsets of parameters according to a predefined schedule or ranking criterion. This paper investigates the application of structured pruning to real-time neural network audio effects, focusing on both feedforward networks and recurrent architectures. We evaluate multiple pruning strategies at inference time, without retraining, and analyze their effects on model performance. To quantify the trade-off between parameter count and audio fidelity, we construct a theoretical model of the approximation error as a function of network architecture and pruning level. The resulting bounds establish a principled relationship between pruning-induced sparsity and functional error, enabling informed deployment of neural audio effects in constrained real-time environments.
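As a minimal sketch of structured pruning without retraining, the example below removes whole hidden units from a two-layer feedforward network, deleting the matching rows of the first layer and columns of the second so the remaining matrices are genuinely smaller. The ranking criterion used here (L2 norm of each unit's outgoing weights) and all function names are assumptions chosen for illustration; the abstract does not state which criteria the paper evaluates.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer network: y = W2 tanh(W1 x + b1) + b2."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def prune_hidden_units(W1, b1, W2, keep_ratio):
    """Keep the hidden units with the largest outgoing-weight norm.

    Returns smaller (W1, b1, W2); the output bias b2 is unaffected.
    """
    n_hidden = W1.shape[0]
    n_keep = max(1, int(round(keep_ratio * n_hidden)))
    scores = np.linalg.norm(W2, axis=0)           # one score per hidden unit
    keep = np.sort(np.argsort(scores)[-n_keep:])  # indices of the top-scoring units
    return W1[keep], b1[keep], W2[:, keep]
```

Units whose outgoing weights are exactly zero can be removed with no change in output; for other units the output error depends on how much the removed activations contributed, which is the kind of trade-off the paper's error bounds aim to quantify.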