Real-Time Black-Box Modelling With Recurrent Neural Networks
This paper proposes to use a recurrent neural network for black-box modelling of nonlinear audio systems, such as tube amplifiers and distortion pedals. As a recurrent unit structure, we test both Long Short-Term Memory and a Gated Recurrent Unit. We compare the proposed neural network with a WaveNet-style deep neural network, which has been suggested previously for tube amplifier modelling. The neural networks are trained with several minutes of guitar and bass recordings, which have been passed through the devices to be modelled. A real-time audio plugin implementing the proposed networks has been developed in the JUCE framework. It is shown that the recurrent neural networks achieve similar accuracy to the WaveNet model, while requiring significantly less processing power to run. The Long Short-Term Memory recurrent unit is also found to outperform the Gated Recurrent Unit overall. The proposed neural network is an important step forward in computationally efficient yet accurate emulation of tube amplifiers and distortion pedals.
Improved Reverberation Time Control for Feedback Delay Networks
Artificial reverberation algorithms generally imitate the frequency-dependent decay of sound in a room quite inaccurately. Previous research suggests that a 5% error in the reverberation time (T60) can be audible. In this work, we propose to use an accurate graphic equalizer as the attenuation filter in a Feedback Delay Network reverberator. We use a modified octave graphic equalizer with a cascade structure and insert a high-shelf filter to control the gain at the high end of the audio range. One such equalizer is placed at the end of each delay line of the Feedback Delay Network. The gains of the equalizer are optimized using a new weighting function that acknowledges nonlinear error propagation from filter magnitude response to reverberation time values. Our experiments show that in real-world cases, the target T60 curve can be reproduced in a perceptually accurate manner at standard octave center frequencies. However, for an extreme test case in which the T60 varies dramatically between neighboring octave bands, the error still exceeds the limit of the just noticeable difference but is smaller than that obtained with previous methods. This work leads to more realistic artificial reverberation.
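The link between attenuation-filter gain and reverberation time mentioned above can be illustrated with the standard feedback-delay-network relation: for the loop to decay by 60 dB in T60 seconds, a delay line of m samples needs a gain of -60·m/(fs·T60) dB per pass. The sketch below is not the paper's equalizer design or its weighting function; the function names and example values are assumptions chosen for illustration.

```python
def delay_line_gain_db(delay_samples, t60_seconds, sample_rate):
    """Per-pass gain in dB that yields the requested T60 for one delay line."""
    if t60_seconds <= 0 or sample_rate <= 0:
        raise ValueError("t60_seconds and sample_rate must be positive")
    return -60.0 * delay_samples / (sample_rate * t60_seconds)


def t60_from_gain_db(gain_db, delay_samples, sample_rate):
    """Inverse relation: the T60 produced by a given per-pass gain in dB."""
    if gain_db >= 0:
        raise ValueError("gain_db must be negative for a decaying loop")
    return -60.0 * delay_samples / (sample_rate * gain_db)


if __name__ == "__main__":
    fs = 48000.0   # assumed sample rate in Hz
    delay = 4800   # assumed delay-line length in samples (0.1 s)
    for target_t60 in (1.0, 2.0):
        gain = delay_line_gain_db(delay, target_t60, fs)
        # The same 0.3 dB gain error is applied to both targets.
        actual_t60 = t60_from_gain_db(gain + 0.3, delay, fs)
        error_percent = 100.0 * (actual_t60 - target_t60) / target_t60
        print(f"target {target_t60} s, gain {gain:.2f} dB, "
              f"T60 error {error_percent:.1f} %")
```

Because T60 is inversely proportional to the gain in dB, the same magnitude error costs more at long reverberation times, which is the kind of nonlinear error propagation the abstract refers to.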
Neural Third-Octave Graphic Equalizer
This paper proposes to speed up the design of a third-octave graphic equalizer by training a neural network to imitate its gain optimization. Instead of using the neural network to learn to design the graphic equalizer by optimizing its magnitude response, we present the network only with example command gains and the corresponding optimized gains, which are obtained with a previously proposed least-squares-based method. We presented this idea recently for the octave graphic equalizer with 10 band filters and extend it here to the third-octave case. Instead of a network with a single hidden layer, which we previously used, this task appears to require two hidden layers. This paper shows that good results can be reached with a neural network having 62 and 31 units in the first and the second hidden layer, respectively. After the training, the resulting network can quickly and accurately design a third-octave graphic equalizer with a maximum error of 1.2 dB. Computing the filter gains is over 350 times faster with the neural network than with the original optimization method. The method is easy to apply, and may thus lead to widespread use of accurate digital graphic equalizers.
Flexible Real-Time Reverberation Synthesis With Accurate Parameter Control
Reverberation is one of the most important effects used in audio production. Although numerous real-time implementations of artificial reverberation algorithms are available nowadays, many of them depend on a database of recorded or pre-synthesized room impulse responses, which are convolved with the input signal. Implementations that use an algorithmic approach are more flexible but do not give users full control over the produced sound, allowing only a few selected parameters to be altered. This study presents a real-time artificial reverberation synthesizer in the form of an audio plugin based on a feedback delay network (FDN), which gives the user full and detailed insight into the produced reverb. It allows control of the reverberation time in ten octave bands, while also allowing adjustment of the feedback matrix type and delay-line lengths. The proposed plugin explores various FDN setups, showing that the lowest useful order for high-quality sound is 16, and that in the case of a Householder matrix the implementation strongly affects the resulting reverberation. Experimenting with delay lengths and distribution demonstrates that choosing too wide or too narrow a length range is disadvantageous to the synthesized sound quality. The study also discusses CPU usage for different FDN orders and plugin states.
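For readers unfamiliar with the Householder feedback matrix mentioned above, a minimal sketch follows. It assumes the common form A = I - (2/N)·1·1ᵀ built from the all-ones vector, which may differ from the variants tested in the plugin; the function names are hypothetical. The point is that the matrix-vector product needs only one sum and N subtractions rather than N² multiplications.

```python
def householder_feedback(x):
    """Apply A = I - (2/N) * ones * ones^T to x without forming the matrix."""
    n = len(x)
    if n == 0:
        raise ValueError("x must not be empty")
    shared = 2.0 * sum(x) / n  # the same value is subtracted from every element
    return [value - shared for value in x]


def householder_matrix(n):
    """Explicit N x N matrix, used here only to check the fast version."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    fast = householder_feedback(x)
    slow = [sum(a * v for a, v in zip(row, x)) for row in householder_matrix(4)]
    print(fast, slow)
    # The matrix is orthogonal, so signal energy is preserved in the loop.
    print(sum(v * v for v in x), sum(v * v for v in fast))
```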
Virtual Bass System With Fuzzy Separation of Tones and Transients
A virtual bass system creates an impression of bass perception in sound systems with weak low-frequency reproduction, which is typical of small loudspeakers. Virtual bass systems extend the bandwidth of the low-frequency audio content using either a nonlinear function or a phase vocoder, and add the processed signal to the reproduced sound. Hybrid systems separate transients and steady-state sounds, which are processed separately. It is still challenging to reach a good sound quality using a virtual bass system. This paper proposes a novel method, which separates the tonal, transient, and noisy parts of the audio signal in a fuzzy way, and then processes only the transients and tones. Upper harmonics that can be detected above the cutoff frequency are boosted using timbre-matched weights, whereas missing upper harmonics are generated to support the missing-fundamental phenomenon. Listening test results show that the proposed algorithm outperforms selected previous methods in terms of perceived bass sound quality. The proposed method can enhance the bass sound perception of small loudspeakers, such as those used in laptop computers and mobile devices.
Velvet-Noise Feedback Delay Network
Artificial reverberation is an audio effect used to simulate the acoustics of a space while controlling its aesthetics, particularly on sounds recorded in a dry studio environment. Delay-based methods are a family of artificial reverberators using recirculating delay lines to create this effect. The feedback delay network is a popular delay-based reverberator providing a comprehensive framework for parametric reverberation by formalizing the recirculation of a set of interconnected delay lines. However, one known limitation of this algorithm is the initial slow build-up of echoes, which can sound unrealistic, and overcoming this problem often requires adding more delay lines to the network. In this paper, we study the effect of adding velvet-noise filters, which have random sparse coefficients, at the input and output branches of the reverberator. The goal is to increase the echo density while minimizing the spectral coloration. We compare different variations of velvet-noise filtering and show their benefits. We demonstrate that with velvet noise, the echo density of a conventional feedback delay network can be exceeded using half the number of delay lines and saving over 50% of computing operations in a practical configuration using low-order attenuation filters.
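To make the phrase "random sparse coefficients" concrete, the following sketch generates a velvet-noise sequence in its usual form, with one pulse of value +1 or -1 at a random position inside each grid segment, and applies it as a sparse FIR filter. It is only an illustration under assumed parameters; the function names, grid size, and seed are hypothetical, and the filter variations compared in the paper are not reproduced here.

```python
import random


def velvet_pulses(length, grid_size, seed=0):
    """Return (position, sign) pairs for a velvet-noise sequence.

    One pulse of value +1 or -1 is placed at a random position inside
    each consecutive segment of `grid_size` samples.
    """
    if grid_size < 1 or length < grid_size:
        raise ValueError("need 1 <= grid_size <= length")
    rng = random.Random(seed)  # seeded so the sequence is repeatable
    pulses = []
    for m in range(length // grid_size):
        position = m * grid_size + round(rng.random() * (grid_size - 1))
        sign = 1.0 if rng.random() < 0.5 else -1.0
        pulses.append((position, sign))
    return pulses


def apply_velvet_filter(x, pulses):
    """Sparse FIR filtering: each pulse adds a signed, delayed copy of x."""
    last = max(position for position, _ in pulses)
    y = [0.0] * (len(x) + last)
    for position, sign in pulses:
        for n, sample in enumerate(x):
            y[n + position] += sign * sample
    return y


if __name__ == "__main__":
    pulses = velvet_pulses(length=1000, grid_size=50, seed=1)
    impulse_response = apply_velvet_filter([1.0], pulses)
    nonzero = sum(1 for v in impulse_response if v != 0.0)
    print(f"{nonzero} nonzero taps out of {len(impulse_response)}")
```

Since every coefficient is 0, +1, or -1, the filter costs one addition or subtraction per pulse and input sample, which is why such filters are attractive for raising echo density cheaply.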
Neural Modelling of Time-Varying Effects
This paper proposes a grey-box neural network based approach to modelling LFO-modulated time-varying effects. The neural network model receives both the unprocessed audio and the LFO signal as input. This allows complete control over the model’s LFO frequency and shape. The neural networks are trained using guitar audio, which has to be processed by the target effect and also annotated with the predicted LFO signal before training. A measurement signal based on regularly spaced chirps was used to accurately predict the LFO signal. The model architecture has been previously shown to be capable of running in real time on a modern desktop computer, whilst using relatively little processing power. We validate our approach by creating models of both a phaser and a flanger effects pedal, and theoretically it can be applied to any LFO-modulated time-varying effect. In the best case, an error-to-signal ratio of 1.3% is achieved when modelling a flanger pedal, and previous work has shown that this corresponds to the model being nearly indistinguishable from the target device.
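The error-to-signal ratio quoted above is commonly defined as the energy of the prediction error divided by the energy of the target signal. The sketch below assumes this plain definition without any pre-emphasis filtering, which may differ from the exact metric used in the paper; the function name and test signal are hypothetical.

```python
def error_to_signal_ratio(target, prediction):
    """Energy of (target - prediction) divided by energy of target."""
    if len(target) != len(prediction):
        raise ValueError("signals must have the same length")
    target_energy = sum(t * t for t in target)
    if target_energy == 0.0:
        raise ValueError("target signal has zero energy")
    error_energy = sum((t - p) ** 2 for t, p in zip(target, prediction))
    return error_energy / target_energy


if __name__ == "__main__":
    target = [1.0, -1.0, 1.0, -1.0]
    prediction = [0.9 * t for t in target]  # model output that is 10 % too quiet
    esr = error_to_signal_ratio(target, prediction)
    print(f"ESR = {100.0 * esr:.2f} %")
```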
SiTraNo: A MATLAB App for Sines-Transients-Noise Decomposition of Audio Signals
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely used tool in audio processing. Multiple solutions have been proposed in recent years, using time–frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of the sound. In this paper, we present SiTraNo: an easy-to-use MATLAB application with a graphical user interface for audio decomposition that enables visualization of and individual access to the sinusoidal, transient, and noise classes. This application allows the user to choose between different well-known separation methods to analyze an input sound file, to instantaneously control and remix its spectral components, and to visually check the quality of the separation, before producing the desired output file. The visualization of common artifacts, such as birdies and dropouts, is demonstrated. This application promotes experimenting with the sound decomposition process by observing the effect of variations for each spectral component on the original sound and by comparing different methods against each other, evaluating the separation quality both audibly and visually. SiTraNo and its source code are available on a companion website and repository.
One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hi-hat, a snare, and a tom-tom. The proposed method uses a short pseudo-random velvet-noise filter and a low-shelf filter to produce timbral variations targeted at appropriate spectral regions, potentially yielding an endless number of new realistic versions of a single percussive sampled sound. The realism of the resulting processed sounds is studied in a listening test. The results show that the sound quality obtained with the proposed algorithm is at least as good as that of a previous method while using 77% fewer computational operations. The algorithm is widely applicable to computer-generated music and game audio.
Exposure Bias and State Matching in Recurrent Neural Network Virtual Analog Models
Virtual analog (VA) modeling using neural networks (NNs) has great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due to their connection with discrete nodal analysis. Furthermore, VA models based on NNs can be trained efficiently by directly exposing them to the circuit states in a gray-box fashion. However, exposure to ground truth information during training can leave the models susceptible to error accumulation in a free-running mode, also known as “exposure bias” in machine learning literature. This paper presents a unified framework for treating the previously proposed state trajectory network (STN) and gated recurrent unit (GRU) networks as special cases of discrete nodal analysis. We propose a novel circuit state-matching mechanism for the GRU and experimentally compare the previously mentioned networks for their performance in state matching, during training, and in exposure bias, during inference. Experimental results from modeling a diode clipper show that all the tested models exhibit some exposure bias, which can be mitigated by truncated backpropagation through time. Furthermore, the proposed state matching mechanism improves the GRU modeling performance of an overdrive pedal and a phaser pedal, especially in the presence of external modulation, apparent in a phaser circuit.