Real-Time Black-Box Modelling With Recurrent Neural Networks
This paper proposes to use a recurrent neural network for black-box modelling of nonlinear audio systems, such as tube amplifiers and distortion pedals. As a recurrent unit structure, we test both Long Short-Term Memory and a Gated Recurrent Unit. We compare the proposed neural network with a WaveNet-style deep neural network, which has been suggested previously for tube amplifier modelling. The neural networks are trained with several minutes of guitar and bass recordings, which have been passed through the devices to be modelled. A real-time audio plugin implementing the proposed networks has been developed in the JUCE framework. It is shown that the recurrent neural networks achieve similar accuracy to the WaveNet model, while requiring significantly less processing power to run. The Long Short-Term Memory recurrent unit is also found to outperform the Gated Recurrent Unit overall. The proposed neural network is an important step forward in computationally efficient yet accurate emulation of tube amplifiers and distortion pedals.
Improved Reverberation Time Control for Feedback Delay Networks
Artificial reverberation algorithms generally imitate the frequency-dependent decay of sound in a room quite inaccurately. Previous research suggests that a 5% error in the reverberation time (T60) can be audible. In this work, we propose to use an accurate graphic equalizer as the attenuation filter in a Feedback Delay Network reverberator. We use a modified octave graphic equalizer with a cascade structure and insert a high-shelf filter to control the gain at the high end of the audio range. One such equalizer is placed at the end of each delay line of the Feedback Delay Network. The gains of the equalizer are optimized using a new weighting function that acknowledges nonlinear error propagation from filter magnitude response to reverberation time values. Our experiments show that in real-world cases, the target T60 curve can be reproduced in a perceptually accurate manner at standard octave center frequencies. However, for an extreme test case in which the T60 varies dramatically between neighboring octave bands, the error still exceeds the limit of the just noticeable difference but is smaller than that obtained with previous methods. This work leads to more realistic artificial reverberation.
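The link between attenuation-filter gain and reverberation time mentioned above can be illustrated with the standard relation for a delay line of m samples: to decay by 60 dB in T60 seconds, the gain per pass is -60 m / (fs T60) dB. The sketch below is not taken from the paper; the function names, the 48 kHz sample rate, and the 1000-sample delay are assumptions chosen for illustration.

```python
def t60_to_gain_db(t60_s, delay_samples, fs=48000.0):
    """Gain in dB per pass through a delay line that yields the target T60."""
    return -60.0 * delay_samples / (fs * t60_s)


def gain_db_to_t60(gain_db, delay_samples, fs=48000.0):
    """Inverse mapping; gain_db must be negative for a decaying loop."""
    return -60.0 * delay_samples / (fs * gain_db)


# Hypothetical example: a 1000-sample delay line at 48 kHz.
m = 1000
for target in (0.5, 2.0):
    g = t60_to_gain_db(target, m)
    # Apply the same +0.1 dB gain error (slightly too little attenuation)
    # and observe how far the resulting T60 moves from the target.
    shifted = gain_db_to_t60(g + 0.1, m)
    print(f"target {target:.1f} s: gain {g:.3f} dB, with error: {shifted:.3f} s")
```

Because T60 is inversely proportional to the gain in dB, the same 0.1 dB error moves the 0.5 s target by roughly 4% but the 2.0 s target by roughly 19%, which is one way to see why the error propagation is nonlinear.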
Neural Third-Octave Graphic Equalizer
This paper proposes to speed up the design of a third-octave graphic equalizer by training a neural network to imitate its gain optimization. Instead of using the neural network to learn to design the graphic equalizer by optimizing its magnitude response, we present the network only with example command gains and the corresponding optimized gains, which are obtained with a previously proposed least-squares-based method. We presented this idea recently for the octave graphic equalizer with 10 band filters and extend it here to the third-octave case. Instead of a network with a single hidden layer, which we previously used, this task appears to require two hidden layers. This paper shows that good results can be reached with a neural network having 62 and 31 units in the first and the second hidden layer, respectively. After the training, the resulting network can quickly and accurately design a third-octave graphic equalizer with a maximum error of 1.2 dB. The computing of the filter gains is over 350 times faster with the neural network than with the original optimization method. The method is easy to apply, and may thus lead to widespread use of accurate digital graphic equalizers.
Flexible Real-Time Reverberation Synthesis With Accurate Parameter Control
Reverberation is one of the most important effects used in audio
production. Although nowadays numerous real-time implementations of artificial reverberation algorithms are available, many of
them depend on a database of recorded or pre-synthesized room
impulse responses, which are convolved with the input signal. Implementations that use an algorithmic approach are more flexible
but do not let the users have full control over the produced sound,
allowing only a few selected parameters to be altered. The real-time implementation of an artificial reverberation synthesizer presented in this study introduces an audio plugin based on a feedback delay network (FDN), which lets the user have full and detailed insight into the produced reverb. It allows control of the
reverberation time in ten octave bands while also letting the user
adjust the feedback matrix type and delay-line lengths. The
proposed plugin explores various FDN setups, showing that the
lowest useful order for high-quality sound is 16, and that in the
case of a Householder matrix the implementation strongly affects
the resulting reverberation. Experimenting with delay lengths and
distribution demonstrates that choosing too wide or too narrow a
length range is disadvantageous to the synthesized sound quality.
The study also discusses CPU usage for different FDN orders and
plugin states.
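As a rough illustration of the FDN structure described above, the following sketch builds a Householder feedback matrix in one common form, A = I - (2/N) 1 1^T, and runs a minimal FDN on an impulse. It is not the plugin's implementation: the function names, the order of 4 (the abstract reports 16 as the lowest useful order), the delay lengths, and the single broadband loop gain are all assumptions made to keep the example short.

```python
def householder(n):
    """Householder feedback matrix A = I - (2/n) * ones(n, n); it is orthogonal."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]


def fdn_impulse_response(delays, gain, num_samples):
    """Impulse response of a basic FDN with unit input and output gains.

    delays: delay-line lengths in samples (hypothetical values)
    gain:   broadband gain applied at each delay-line output (< 1 to decay)
    """
    n = len(delays)
    a = householder(n)
    buffers = [[0.0] * d for d in delays]  # one circular buffer per line
    idx = [0] * n                          # read/write position per line
    out = []
    for t in range(num_samples):
        x = 1.0 if t == 0 else 0.0         # unit impulse input
        # Read the oldest sample of each line and attenuate it.
        taps = [gain * buffers[i][idx[i]] for i in range(n)]
        out.append(sum(taps))
        # Mix the taps through the feedback matrix and write back with input.
        for i in range(n):
            fb = sum(a[i][j] * taps[j] for j in range(n))
            buffers[i][idx[i]] = x + fb
            idx[i] = (idx[i] + 1) % delays[i]
    return out


h = fdn_impulse_response([7, 11, 13, 17], 0.9, 200)
```

In a frequency-dependent design, the scalar `gain` would be replaced by an attenuation filter per delay line, for example one band gain per octave.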
Virtual Bass System With Fuzzy Separation of Tones and Transients
A virtual bass system creates an impression of bass perception
in sound systems with weak low-frequency reproduction, which
is typical of small loudspeakers. Virtual bass systems extend the
bandwidth of the low-frequency audio content using either a nonlinear function or a phase vocoder, and add the processed signal
to the reproduced sound. Hybrid systems separate transients and
steady-state sounds, which are processed separately. It is still challenging to reach a good sound quality using a virtual bass system.
This paper proposes a novel method, which separates the tonal,
transient, and noisy parts of the audio signal in a fuzzy way, and
then processes only the transients and tones. Upper harmonics that can be detected above the cutoff frequency are boosted
using timbre-matched weights, whereas missing upper harmonics are
generated to support the missing-fundamental phenomenon. Listening test results show that the proposed algorithm outperforms selected previous methods in terms of perceived bass sound quality.
The proposed method can enhance the bass sound perception of
small loudspeakers, such as those used in laptop computers and
mobile devices.
Velvet-Noise Feedback Delay Network
Artificial reverberation is an audio effect used to simulate the acoustics of a space while controlling its aesthetics, particularly on sounds
recorded in a dry studio environment. Delay-based methods are
a family of artificial reverberators using recirculating delay lines
to create this effect. The feedback delay network is a popular
delay-based reverberator providing a comprehensive framework
for parametric reverberation by formalizing the recirculation of
a set of interconnected delay lines. However, one known limitation of this algorithm is the initial slow build-up of echoes, which
can sound unrealistic, and overcoming this problem often requires
adding more delay lines to the network. In this paper, we study the
effect of adding velvet-noise filters, which have random sparse coefficients, at the input and output branches of the reverberator. The
goal is to increase the echo density while minimizing the spectral coloration. We compare different variations of velvet-noise
filtering and show their benefits. We demonstrate that with velvet
noise, the echo density of a conventional feedback delay network
can be exceeded using half the number of delay lines and saving
over 50% of computing operations in a practical configuration using low-order attenuation filters.
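The "random sparse coefficients" of a velvet-noise filter can be sketched as follows, using a commonly cited construction in which each grid interval of Td = fs/density samples holds exactly one pulse of random sign at a random position. This is a generic illustration rather than the paper's exact filter design; the function name, parameters, and example values are assumptions.

```python
import random


def velvet_noise(length, density, fs=44100, seed=0):
    """Sparse sequence of +1/-1 pulses, one per grid interval of fs/density samples."""
    rng = random.Random(seed)
    td = fs / density                  # average pulse spacing in samples
    num_pulses = int(length // td)     # whole grid intervals that fit
    seq = [0.0] * length
    for m in range(num_pulses):
        # Random location inside the m-th grid interval.
        k = int(round(m * td + rng.random() * (td - 1)))
        # Random sign with equal probability.
        seq[k] = 1.0 if rng.random() < 0.5 else -1.0
    return seq


# Hypothetical example: 2205 pulses per second at 44.1 kHz (Td = 20 samples).
vn = velvet_noise(2000, 2205)
```

Because most coefficients are zero and the rest are +1 or -1, convolving with such a sequence needs only additions and subtractions at the pulse locations, which is what makes these filters inexpensive.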
Neural Modelling of Time-Varying Effects
This paper proposes a grey-box neural network based approach
to modelling LFO modulated time-varying effects. The neural
network model receives both the unprocessed audio, as well as
the LFO signal, as input. This allows complete control over the
model’s LFO frequency and shape. The neural networks are trained
using guitar audio, which has to be processed by the target effect
and also annotated with the predicted LFO signal before training.
A measurement signal based on regularly spaced chirps was used
to accurately predict the LFO signal. The model architecture has
been previously shown to be capable of running in real-time on a
modern desktop computer, whilst using relatively little processing
power. We validate our approach by creating models of both a phaser
and a flanger effects pedal; in principle, it can be applied to
any LFO-modulated time-varying effect. In the best case, an error-to-signal ratio of 1.3% is achieved when modelling a flanger pedal,
and previous work has shown that this corresponds to the model
being nearly indistinguishable from the target device.
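For reference, the error-to-signal ratio is commonly defined as the energy of the difference between target and model output divided by the energy of the target. The sketch below assumes that plain definition; the paper may apply additional steps, such as pre-emphasis filtering, that are not shown here, and the function name is hypothetical.

```python
def error_to_signal_ratio(target, prediction):
    """ESR = sum((y - y_hat)^2) / sum(y^2); 0 means a perfect match."""
    if len(target) != len(prediction):
        raise ValueError("signals must have equal length")
    num = sum((y - p) ** 2 for y, p in zip(target, prediction))
    den = sum(y ** 2 for y in target)
    if den == 0.0:
        raise ValueError("target signal has zero energy")
    return num / den


# Hypothetical signals: the prediction is the target scaled by 0.9.
y = [1.0, -1.0, 1.0, -1.0]
y_hat = [0.9, -0.9, 0.9, -0.9]
esr_percent = 100.0 * error_to_signal_ratio(y, y_hat)
```

A uniform 10% amplitude error, as in this toy example, gives an ESR of about 1%, since the ratio is computed on energies rather than amplitudes.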
SiTraNo: A MATLAB App for Sines-Transients-Noise Decomposition of Audio Signals
Decomposition of sounds into their sinusoidal, transient, and noise
components is an active research topic and a widely-used tool in
audio processing. Multiple solutions have been proposed in recent
years, using time–frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the
spectrogram of the sound. In this paper, we present SiTraNo: an
easy-to-use MATLAB application with a graphic user interface for
audio decomposition that enables visualization and access to the
sinusoidal, transient, and noise classes, individually. This application allows the user to choose between different well-known separation methods to analyze an input sound file, to instantaneously
control and remix its spectral components, and to visually check
the quality of the separation, before producing the desired output
file. The visualization of common artifacts, such as birdies and
dropouts, is demonstrated. This application promotes experimenting with the sound decomposition process by observing the effect
of variations for each spectral component on the original sound
and by comparing different methods against each other, evaluating
the separation quality both audibly and visually. SiTraNo and its
source code are available on a companion website and repository.
One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in
sampled sounds is proposed. Using only one recording for impact
sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral
variations in repeated knocking sounds and in three drum sounds:
a hihat, a snare, and a tomtom. The proposed method uses a short
pseudo-random velvet-noise filter and a low-shelf filter to produce
timbral variations targeted at appropriate spectral regions, potentially yielding an endless number of new realistic versions of a
single percussive sampled sound. The realism of the resulting
processed sounds is studied in a listening test. The results show
that the sound quality obtained with the proposed algorithm is at
least as good as that of a previous method while using 77% fewer
computational operations. The algorithm is widely applicable to
computer-generated music and game audio.
Exposure Bias and State Matching in Recurrent Neural Network Virtual Analog Models
Virtual analog (VA) modeling using neural networks (NNs) has
great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due
to their connection with discrete nodal analysis. Furthermore, VA
models based on NNs can be trained efficiently by directly exposing them to the circuit states in a gray-box fashion. However,
exposure to ground truth information during training can leave the
models susceptible to error accumulation in a free-running mode,
also known as “exposure bias” in machine learning literature. This
paper presents a unified framework for treating the previously
proposed state trajectory network (STN) and gated recurrent unit
(GRU) networks as special cases of discrete nodal analysis. We
propose a novel circuit state-matching mechanism for the GRU
and experimentally compare the previously mentioned networks
in terms of state matching during training and exposure bias during inference. Experimental results from modeling
a diode clipper show that all the tested models exhibit some exposure bias, which can be mitigated by truncated backpropagation
through time. Furthermore, the proposed state-matching mechanism improves the GRU modeling performance for an overdrive
pedal and a phaser pedal, especially in the presence of external
modulation, as found in a phaser circuit.