Download Neural Third-Octave Graphic Equalizer
This paper proposes to speed up the design of a third-order graphic equalizer by training a neural network to imitate its gain optimization. Instead of using the neural network to learn to design the graphic equalizer by optimizing its magnitude response, we present the network only with example command gains and the corresponding optimized gains, which are obtained with a previously proposed least-squares-based method. We presented this idea recently for the octave graphic equalizer with 10 band filters and extend it here to the third-octave case. Instead of a network with a single hidden layer, which we previously used, this task appears to require two hidden layers. This paper shows that good results can be reached with a neural network having 62 and 31 units in the first and the second hidden layer, respectively. After the training, the resulting network can quickly and accurately design a third-order graphic equalizer with a maximum error of 1.2 dB. The computing of the filter gains is over 350 times faster with the neural network than with the original optimization method. The method is easy to apply, and may thus lead to widespread use of accurate digital graphic equalizers.
Download Neural Music Instrument Cloning From Few Samples
Neural music instrument cloning is an application of deep neural networks for imitating the timbre of a particular music instrument recording with a trained neural network. One can create such clones using an approach such as DDSP [1], which has been shown to achieve good synthesis quality for several instrument types [2]. However, this approach needs about ten minutes of audio data from the instrument of interest (target recording audio). In this work, we modify the DDSP architecture and apply transfer learning techniques used in speech voice cloning [3] to significantly reduce the amount of target recording audio required. We compare various cloning approaches and architectures across durations of target recording audio, ranging from four to 256 seconds. We demonstrate editing of loudness and pitch as well as timbre transfer from only 16 seconds of target recording audio. Our code is available online1 as well as many audio examples.2
Download Neural Modelling of Time-Varying Effects
This paper proposes a grey-box neural network based approach to modelling LFO modulated time-varying effects. The neural network model receives both the unprocessed audio, as well as the LFO signal, as input. This allows complete control over the model’s LFO frequency and shape. The neural networks are trained using guitar audio, which has to be processed by the target effect and also annotated with the predicted LFO signal before training. A measurement signal based on regularly spaced chirps was used to accurately predict the LFO signal. The model architecture has been previously shown to be capable of running in real-time on a modern desktop computer, whilst using relatively little processing power. We validate our approach creating models of both a phaser and a flanger effects pedal, and theoretically it can be applied to any LFO modulated time-varying effect. In the best case, an errorto-signal ratio of 1.3% is achieved when modelling a flanger pedal, and previous work has shown that this corresponds to the model being nearly indistinguishable from the target device.
Download Neural Modeling of Magnetic Tape Recorders
The sound of magnetic recording media, such as open-reel and cassette tape recorders, is still sought after by today’s sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed system consists of three main components: the hysteretic nonlinearity and filtering jointly produced by the magnetic recording process as well as the record and playback amplifiers, the fluctuating delay originating from the tape transport, and the combined additive noise component from various electromagnetic origins. In our approach, the hysteretic nonlinear block is modeled using a recurrent neural network, while the delay trajectories and the noise component are generated using separate diffusion models, which employ U-net deep convolutional neural networks. According to the conducted objective evaluation, the proposed architecture faithfully captures the character of the magnetic tape recorder. The results of this study can be used to construct virtual replicas of vintage sound recording devices with applications in music production and audio antiquing tasks.
Download Training Neural Models of Nonlinear Multi-Port Elements Within Wave Digital Structures Through Discrete-Time Simulation
Neural networks have been applied within the Wave Digital Filter (WDF) framework as data-driven models for nonlinear multi-port circuit elements. Conventionally, these models are trained on wave variables obtained by sampling the current-voltage characteristic of the considered nonlinear element before being incorporated into the circuit WDF implementation. However, isolating multi-port elements for this process can be challenging, as their nonlinear behavior often depends on dynamic effects that emerge from interactions with the surrounding circuit. In this paper, we propose a novel approach for training neural models of nonlinear multi-port elements directly within a circuit’s Wave Digital (WD) discretetime implementation, relying solely on circuit input-output voltage measurements. Exploiting the differentiability of WD simulations, we embed the neural network into the simulation process and optimize its parameters using gradient-based methods by minimizing a loss function defined over the circuit output voltage. Experimental results demonstrate the effectiveness of the proposed approach in accurately capturing the nonlinear circuit behavior, while preserving the interpretability and modularity of WDFs.
Download Antiderivative Antialiasing for Recurrent Neural Networks
Neural networks have become invaluable for general audio processing tasks, such as virtual analog modeling of nonlinear audio equipment. For sequence modeling tasks in particular, recurrent neural networks (RNNs) have gained widespread adoption in recent years. Their general applicability and effectiveness stems partly from their inherent nonlinearity, which makes them prone to aliasing. Recent work has explored mitigating aliasing by oversampling the network—an approach whose effectiveness is directly linked with the incurred computational costs. This work explores an alternative route by extending the antiderivative antialiasing technique to explicit, computable RNNs. Detailed applications to the Gated Recurrent Unit and Long Short-Term Memory cell are shown as case studies. The proposed technique is evaluated on multiple pre-trained guitar amplifier models, assessing its impact on the amount of aliasing and model tonality. The method is shown to reduce the models’ tendency to alias considerably across all considered sample rates while only affecting their tonality moderately, without requiring high oversampling factors. The results of this study can be used to improve sound quality in neural audio processing tasks that employ a suitable class of RNNs. Additional materials are provided in the accompanying webpage.
Download Neural Sample-Based Piano Synthesis
Piano sound emulation has been an active topic of research and development for several decades. Although comprehensive physicsbased piano models have been proposed, sample-based piano emulation is still widely utilized for its computational efficiency and relative accuracy despite presenting significant memory storage requirements. This paper proposes a novel hybrid approach to sample-based piano synthesis aimed at improving the fidelity of sound emulation while reducing memory requirements for storing samples. A neural network-based model processes the sound recorded from a single example of piano key at a given velocity. The network is trained to learn the nonlinear relationship between the various velocities at which a piano key is pressed and the corresponding sound alterations. Results show that the method achieves high accuracy using a specific neural architecture that is computationally efficient, presenting few trainable parameters, and it requires memory only for one sample for each piano key.
Download Perceptual Evaluation and Genre-specific Training of Deep Neural Network Models of a High-gain Guitar Amplifier
Modelling of analogue devices via deep neural networks (DNNs) has gained popularity recently, but their performance is usually measured using accuracy measures alone. This paper aims to assess the performance of DNN models of a high-gain vacuum-tube guitar amplifier using additional subjective measures, including preference and realism. Furthermore, the paper explores how the performance changes when genre-specific training data is used. In five listening tests, subjects rated models of a popular high-gain guitar amplifier, the Peavey 6505, in terms of preference, realism and perceptual accuracy. Two DNN models were used: a long short-term memory recurrent neural network (LSTM-RNN) and a WaveNet-based convolutional neural network (CNN). The LSTMRNN model was shown to be more accurate when trained with genre-specific data, to the extent that it could not be distinguished from the real amplifier in ABX tests. Despite minor perceptual inaccuracies, subjects found all models to be as realistic as the target in MUSHRA-like experiments, and there was no evidence to suggest that the real amplifier was preferred to any of the models in a mix. Finally, it was observed that a low-gain excerpt was more difficult to emulate, and was therefore useful to reveal differences between the models.
Download Learning Nonlinear Dynamics in Physical Modelling Synthesis Using Neural Ordinary Differential Equations
Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are possible in order to handle geometric nonlinearities. One such case is the high-amplitude vibration of a string, where geometric nonlinear effects lead to perceptually important effects including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent work in applied machine learning approaches (in particular neural ordinary differential equations) has been used to model lumped dynamic systems such as electronic circuits automatically from data. In this work, we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution for linear vibration of system’s modes and employs a neural network to account for nonlinear dynamic behaviour. Physical parameters of a system remain easily accessible after the training without the need for a parameter encoder in the network architecture. As an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
Download A Cosine-Distance Based Neural Network for Music Artist Recognition Using Raw I-Vector Feature
Recently, i-vector features have entered the field of Music Information Retrieval (MIR), exhibiting highly promising performance in important tasks such as music artist recognition or music similarity estimation. The i-vector modelling approach relies on a complex processing chain that limits by the use of engineered features such as MFCCs. The goal of the present paper is to make an important step towards a truly end-to-end modelling system inspired by the i-vector pipeline, to exploit the power of Deep Neural Networks1 (DNNs) to learn optimized feature spaces and transformations. Several authors have already tried to combine the power of DNNs with i-vector features, where DNNs were used for feature extraction, scoring or classification. In this paper, we try to use neural networks for the important step of i-vector post-processing and classification for the task of music artist recognition. Specifically, we propose a novel neural network for i-vector features with a cosine-distance loss function, optimized with stochastic gradient decent (SGD). We first show that current networks do not perform well with unprocessed i-vector features, and that post-processing methods such as Within-Class Covariance Normalization (WCCN) and Linear Discriminant Analysis (LDA) are crucially important to improve the i-vector representation. We further demonstrate that these linear projections (WCCN and LDA) can not be learned using general objective functions usually used in neural networks. We examine our network on a 50-class music artist recognition dataset using i-vectors extracted from frame-level timbre features. Our experiments suggest that using our network with fully unprocessed i-vectors, we can achieve the performance of the i-vector pipeline which uses i-vector post processing methods such as LDA and WCCN.