Learning Nonlinear Dynamics in Physical Modelling Synthesis Using Neural Ordinary Differential Equations
Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are
possible in order to handle geometric nonlinearities. One such
case is the high-amplitude vibration of a string, where geometric nonlinearities lead to perceptually important effects including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent applied machine learning approaches (in particular neural ordinary differential equations) have been used to model lumped dynamic systems
such as electronic circuits automatically from data. In this work,
we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution
for the linear vibration of the system’s modes and employs a neural network to account for nonlinear dynamic behaviour. Physical parameters of the system remain easily accessible after training without
the need for a parameter encoder in the network architecture. As
an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to
reproduce the nonlinear dynamics of the system. Sound examples
are presented.
Anti-Aliasing of Neural Distortion Effects via Model Fine Tuning
Neural networks have become ubiquitous in guitar distortion
effects modelling in recent years. Despite their ability to yield
perceptually convincing models, they are susceptible to frequency
aliasing when driven by high frequency and high gain inputs.
Nonlinear activation functions create both the desired harmonic
distortion and unwanted aliasing distortion as the bandwidth of
the signal is expanded beyond the Nyquist frequency. Here, we
present a method for reducing aliasing in neural models via a
teacher-student fine tuning approach, where the teacher is a pretrained model with its weights frozen, and the student is a copy of
this with learnable parameters. The student is fine-tuned against
an aliasing-free dataset generated by passing sinusoids through
the original model and removing non-harmonic components from
the output spectra.
Our results show that this method significantly suppresses aliasing for both long short-term memory (LSTM) networks and temporal convolutional networks (TCNs). In the
majority of our case studies, the reduction in aliasing was greater
than that achieved by two times oversampling. One side-effect
of the proposed method is that harmonic distortion components
are also affected.
This adverse effect was found to be model-dependent, with the LSTM models giving the best balance between
anti-aliasing and preserving the perceived similarity to an analog
reference device.
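The dataset-cleaning step described above (passing a sinusoid through the model and removing non-harmonic components from the output spectrum) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: a `tanh` waveshaper stands in for the pretrained neural model, and the names `remove_nonharmonic`, `N`, `k0` and `gain` are hypothetical. The sinusoid is assumed to sit exactly on FFT bin `k0`, so its true harmonics fall on multiples of `k0` and every other bin can be treated as aliasing.

```python
import numpy as np

def remove_nonharmonic(y, k0):
    """Zero every FFT bin that is not a multiple of the fundamental bin k0."""
    spectrum = np.fft.rfft(y)
    keep = np.zeros(len(spectrum), dtype=bool)
    keep[::k0] = True  # DC and the harmonics that lie below Nyquist
    return np.fft.irfft(np.where(keep, spectrum, 0.0), n=len(y))

# Assumed test signal: one bin-aligned sinusoid driven hard into a nonlinearity.
N, k0, gain = 4096, 379, 10.0
n = np.arange(N)
x = np.sin(2 * np.pi * k0 * n / N)
y = np.tanh(gain * x)            # stand-in for the frozen teacher model
target = remove_nonharmonic(y, k0)

# The 7th harmonic (bin 7 * k0) exceeds Nyquist and folds back to this bin.
alias_bin = N - 7 * k0
```

In this sketch `target` plays the role of the aliasing-free training signal. Choosing an odd `k0` with a power-of-two `N` keeps folded harmonics from landing on genuine harmonic bins for any practically relevant harmonic order.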
Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing
In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
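As an illustration of the cubic Lagrange interpolation mentioned above, the following Python sketch computes the standard Lagrange fractional-delay coefficients and uses them to read a value from a delay line at a non-integer delay. How the interpolator is placed inside the RNN is not stated in the abstract, so the helper names `lagrange_coeffs` and `read_delay_line`, and the convention that `history[k]` holds the value from `k` samples ago, are assumptions made for this example.

```python
import numpy as np

def lagrange_coeffs(delay, order=3):
    """Lagrange FIR coefficients h[0..order] for a delay of `delay` samples."""
    h = np.ones(order + 1)
    for k in range(order + 1):
        for m in range(order + 1):
            if m != k:
                h[k] *= (delay - m) / (k - m)
    return h

def read_delay_line(history, delay):
    """Interpolate the value `delay` samples ago; history[k] is k samples old."""
    h = lagrange_coeffs(delay, order=len(history) - 1)
    return float(np.dot(h, history))

# Cubic interpolation reproduces cubic polynomials exactly.
p = lambda t: t**3 - 2 * t + 1
history = np.array([p(-k) for k in range(4)], dtype=float)
value = read_delay_line(history, 1.5)
```

For a cubic interpolator the approximation is generally most accurate when the delay lies between 1 and 2 samples, that is, in the middle of the four-point window.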
Differentiable All-Pole Filters for Time-Varying Audio Systems
Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and feed-forward compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin.
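For readers unfamiliar with the recursion involved, the sketch below shows only the forward pass of a time-varying all-pole filter in plain NumPy, assuming the usual sign convention y[n] = x[n] - Σ_k a_k[n] y[n-k]. It does not implement the gradient re-expression that the abstract describes; the function name `time_varying_allpole` and the coefficient layout `a[n, k-1]` are hypothetical choices for this example.

```python
import numpy as np

def time_varying_allpole(x, a):
    """Sample-by-sample recursion y[n] = x[n] - sum_k a[n, k-1] * y[n-k].

    x: input signal, shape (T,)
    a: per-sample feedback coefficients, shape (T, M)
    """
    num_samples, order = a.shape
    y = np.zeros(num_samples)
    for n in range(num_samples):
        acc = x[n]
        for k in range(1, order + 1):
            if n - k >= 0:
                acc -= a[n, k - 1] * y[n - k]
        y[n] = acc
    return y

# Constant one-pole example: the impulse response should be 0.5 ** n.
impulse = np.zeros(16)
impulse[0] = 1.0
coeffs = np.full((16, 1), -0.5)
response = time_varying_allpole(impulse, coeffs)
```

The explicit dependence of each output sample on earlier outputs is what makes such a loop slow to differentiate with generic automatic differentiation.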
Differentiable grey-box modelling of phaser effects using frame-based spectral processing
Machine learning approaches to modelling analog audio effects have seen intensive investigation in recent years, particularly in the context of non-linear time-invariant effects such as guitar amplifiers. For modulation effects such as phasers, however, new challenges emerge due to the presence of the low-frequency oscillator which controls the slowly time-varying nature of the effect. Existing approaches have either required foreknowledge of this control signal, or have been non-causal in implementation. This work presents a differentiable digital signal processing approach to modelling phaser effects in which the underlying control signal and time-varying spectral response of the effect are jointly learned. The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain, with a transfer function based on typical analog phaser circuit topology. We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters. The frame duration is an important hyper-parameter of the proposed model, so an investigation was carried out into its effect on model accuracy. The optimal frame length depends on both the rate and transient decay-time of the target effect, but the frame length can be altered at inference time without a significant change in accuracy.
Non-Iterative Schemes for the Simulation of Nonlinear Audio Circuits
In this work, a number of numerical schemes are presented in the
context of virtual-analog simulation. The schemes are linearly implicit in character, and hence directly solvable without iterative
methods. Schemes of increasing order of accuracy are constructed,
and convergence and stability conditions are proven formally. The
schemes are able to handle stiff problems very efficiently, because
of their fast update, and can be run at higher sample rates to reduce
aliasing. The cases of the diode clipper and ring modulator are
investigated in detail, including several numerical examples.
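To make the idea of a linearly implicit update concrete, here is a minimal Python sketch of a first-order linearly implicit (Rosenbrock-type) Euler step applied to a common first-order diode clipper model, dv/dt = (u - v)/(RC) - (2 I_s / C) sinh(v / V_t). This is a textbook scheme chosen for illustration and is not necessarily one of the schemes constructed in the paper; the circuit model and the component values `R`, `C`, `I_S` and `V_T` are assumptions.

```python
import numpy as np

# Illustrative component values (assumptions, not taken from the source).
R, C = 2.2e3, 10e-9          # resistance (ohm), capacitance (farad)
I_S, V_T = 2.52e-9, 45.3e-3  # diode saturation current (A), thermal voltage (V)

def f(v, u):
    """Right-hand side of the assumed clipper ODE dv/dt = f(v, u)."""
    return (u - v) / (R * C) - (2 * I_S / C) * np.sinh(v / V_T)

def dfdv(v):
    """Derivative of f with respect to v (always negative)."""
    return -1.0 / (R * C) - (2 * I_S / (C * V_T)) * np.cosh(v / V_T)

def simulate_clipper(u, fs):
    """Linearly implicit Euler: one division per sample, no iteration."""
    k = 1.0 / fs
    v = 0.0
    out = np.empty(len(u))
    for n, u_n in enumerate(u):
        # Solve (1 - k * J) * (v_next - v) = k * f(v, u_n) for v_next.
        v = v + k * f(v, u_n) / (1.0 - k * dfdv(v))
        out[n] = v
    return out

# A 1 V DC input should settle at the clipper's steady-state voltage.
v_dc = simulate_clipper(np.ones(2000), 48000.0)
```

Because the Jacobian is evaluated only at the current state, each step reduces to a single linear solve (here a scalar division) rather than a Newton iteration.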
Zero-Phase Sound via Giant FFT
Given the speedy computation of the FFT in current computer
hardware, there are new possibilities for examining transformations for very long sounds. A zero-phase version of any audio
signal can be obtained by zeroing the phase angle of its complex
spectrum and taking the inverse FFT. This paper recommends additional processing steps, including zero-padding, transient suppression at the signal’s start and end, and gain compensation, to
enhance the resulting sound quality. As a result, a sound with the
same spectral characteristics as the original one, but with different temporal events, is obtained. Repeating rhythm patterns are
retained, however. Zero-phase sounds are palindromic in the sense
that they are symmetric in time. A comparison of the zero-phase
conversion to the autocorrelation function helps to understand its
properties, such as why the rhythm of the original sound is emphasized. It is also argued that the zero-phase signal has the same
autocorrelation function as the original sound. One exciting variation is to apply the method separately to the real
and imaginary parts of the spectrum to produce a stereo effect. A
frame-based technique enables the use of the zero-phase conversion in real-time audio processing. The zero-phase conversion is
another member of the giant FFT toolset, allowing the modification of sampled sounds, such as drum loops or entire songs.
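The core of the zero-phase conversion described above (zero the phase of the spectrum, then take the inverse FFT) can be sketched in a few lines of NumPy. This is a minimal sketch that includes zero-padding but omits the transient suppression and gain compensation the abstract recommends; the function name `zero_phase` and the parameter `pad_factor` are hypothetical.

```python
import numpy as np

def zero_phase(x, pad_factor=2):
    """Return a zero-phase version of x, centred in a zero-padded buffer."""
    n_fft = pad_factor * len(x)
    # Keeping only the magnitude sets every phase angle to zero.
    magnitude = np.abs(np.fft.rfft(x, n=n_fft))
    y = np.fft.irfft(magnitude, n=n_fft)
    # The result is circularly symmetric about index 0; move that point
    # to the middle of the buffer so the symmetry is visible in time.
    return np.fft.fftshift(y)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = zero_phase(x)
```

Since the autocorrelation depends only on the squared magnitude spectrum, `y` and the zero-padded `x` share the same autocorrelation, which is consistent with the argument in the abstract.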
Real-Time Modal Synthesis of Nonlinearly Interconnected Networks
Modal methods are a long-established approach to physical modeling sound synthesis. Projecting the equation of motion of a linear, time-invariant system onto a basis of eigenfunctions yields a set of independent forced, lossy oscillators, which may be simulated efficiently and accurately by means of standard time-stepping methods. Extensions of modal techniques to nonlinear problems are possible, though they often require the solution of densely coupled nonlinear time-dependent equations. Here, recent results in numerical simulation design are applied, in which the nonlinear energy is first quadratised via a convenient auxiliary variable. The resulting equations may be updated in time explicitly, thus avoiding the need for expensive iterative solvers, dense linear system solutions, or matrix inversions. The case of a network of interconnected distributed elements is detailed, along with a real-time implementation as an audio plugin.
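The linear part of a modal model, a bank of independent lossy oscillators, can be illustrated with the short Python sketch below. It covers only the unforced linear case, using the exact two-step recursion for q'' + 2σq' + ω²q = 0, and does not attempt the quadratised nonlinear coupling described in the abstract. The function name `modal_free_response` and the example mode frequencies and loss values are assumptions.

```python
import numpy as np

def modal_free_response(omega, sigma, q0, fs, num_samples):
    """Exact recursion for damped modes q[n] = q0 * exp(-sigma*t) * cos(omega_d*t)."""
    k = 1.0 / fs
    omega_d = np.sqrt(omega**2 - sigma**2)  # damped angular frequency
    r = np.exp(-sigma * k)                  # per-sample decay factor
    b1 = 2.0 * r * np.cos(omega_d * k)
    b2 = r**2
    q = np.zeros((num_samples, len(omega)))
    q[0] = q0
    q[1] = q0 * r * np.cos(omega_d * k)
    for n in range(1, num_samples - 1):
        # Each mode is updated independently of the others.
        q[n + 1] = b1 * q[n] - b2 * q[n - 1]
    return q

# Three hypothetical modes with frequency-dependent loss.
fs = 44100.0
omega = 2 * np.pi * np.array([220.0, 440.0, 660.0])
sigma = np.array([1.0, 2.0, 3.0])
q = modal_free_response(omega, sigma, np.ones(3), fs, 1000)
output = q.sum(axis=1)  # simple mix of all modes
```

Because the modes are uncoupled in the linear case, the cost per sample grows only linearly with the number of modes.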