Automatic Recognition of Cascaded Guitar Effects
This paper reports on a new multi-label classification task for guitar effect recognition that is closer to the actual use case of guitar effect pedals. To generate the dataset, we used multiple clean guitar audio datasets and applied various combinations of 13 commonly used guitar effects. We compared four neural network structures: a simple Multi-Layer Perceptron as a baseline, ResNet models, a CRNN model, and a sample-level CNN model. The ResNet models achieved the best performance in terms of accuracy and robustness under various setups (with or without clean audio, seen or unseen dataset), with a micro-F1 of 0.876 and a macro-F1 of 0.906 in the hardest setup. An ablation study on the ResNet models further indicates the model complexity the task requires.
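The two reported scores aggregate per-effect results differently. The following is a minimal sketch of how micro-F1 and macro-F1 are commonly computed for multi-label predictions; the function names and the toy labels are illustrative assumptions and do not come from the paper.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: lists of equal-length 0/1 rows, one column per effect."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for k, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[k] += 1
            elif p and not t:
                fp[k] += 1
            elif t and not p:
                fn[k] += 1
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1_from_counts(sum(tp), sum(fp), sum(fn))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_labels
    return micro, macro


# Hypothetical toy data: 3 clips, 3 effects.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
```

Micro-F1 weights every label decision equally, whereas macro-F1 weights every effect class equally, so the two can differ noticeably when some effects are rarer or harder than others.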
Nonlinear Strings based on Masses and Springs
Due to advances in computational power, physical modelling for sound synthesis has gained popularity over the past decades. Although much work has been done to accurately simulate existing physical systems, much less work exists on the use of physical modelling purely to create sonically interesting sounds. This work presents a mass-spring network inspired by existing models of the physical string. Masses have two translational degrees of freedom (DoF), and the springs have an additional equilibrium separation term; together these result in highly nonlinear effects. The main aim of this work is to create sonically interesting sounds while retaining some of the natural qualities of the physical string, as opposed to accurately simulating it. Although the implementation exhibits chaotic behaviour for certain choices of parameters, the presented system can create sonically interesting timbres, including nonlinear pitch glides and ‘wobbles’.
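To illustrate why two degrees of freedom combined with an equilibrium separation yield nonlinear behaviour, the sketch below evaluates a Hooke-type spring force between two masses in the plane. It is a generic textbook formulation under assumed names (`spring_force`, `rest_length`), not the paper's exact model.

```python
import math


def spring_force(pa, pb, k, rest_length):
    """Force on the mass at pa from a spring connecting it to the mass at pb.

    pa, pb: (x, y) positions. The spring pulls when stretched beyond
    rest_length (the assumed equilibrium separation) and pushes when compressed.
    """
    dx = pb[0] - pa[0]
    dy = pb[1] - pa[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)  # direction undefined; treat as no force
    magnitude = k * (dist - rest_length)
    return (magnitude * dx / dist, magnitude * dy / dist)


# Transverse force for small and doubled transverse offsets (unit spring).
fy_small = spring_force((0.0, 0.0), (1.0, 0.1), 1.0, 1.0)[1]
fy_double = spring_force((0.0, 0.0), (1.0, 0.2), 1.0, 1.0)[1]
```

When the horizontal spacing equals the rest length, doubling the transverse offset multiplies the restoring force by roughly eight rather than two, since the force grows approximately with the cube of the offset in this configuration.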
Differentiable All-Pass Filters for Phase Response Estimation and Automatic Signal Alignment
Virtual analog (VA) audio effects are increasingly based on neural networks and deep learning frameworks. Due to the underlying black-box methodology, a successful model will learn to approximate the data it is presented with, including potential errors such as latency and audio dropouts as well as non-linear characteristics and frequency-dependent phase shifts produced by the hardware. The latter is of particular interest, as the learned phase response might cause unwanted audible artifacts when the effect is used for creative processing techniques such as dry-wet mixing or parallel compression. To overcome these artifacts, we propose differentiable signal processing tools and deep optimization structures for automatically tuning all-pass filters to predict the phase response of different VA simulations and to align processed signals that are out of phase. The approaches are assessed using objective metrics, while listening tests evaluate their ability to enhance the quality of parallel path processing techniques. Ultimately, an overparameterized, BiasNet-based all-pass model is proposed for the optimization problem under consideration, resulting in models that can estimate all-pass filter coefficients to align a dry signal with its affected (wet) equivalent.
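The property that makes all-pass filters suitable for phase alignment is that they change phase without changing magnitude. A minimal sketch with a standard first-order all-pass section follows; the coefficient name `a` and both function names are assumptions for illustration, and the paper's differentiable, learned filters are not reproduced here.

```python
import cmath
import math


def allpass_response(a: float, omega: float) -> complex:
    """Frequency response of H(z) = (a + z^-1) / (1 + a*z^-1) at omega rad/sample."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1.0 + a * z_inv)


def allpass_filter(a: float, x):
    """Time-domain form: y[n] = a*x[n] + x[n-1] - a*y[n-1] (stable for |a| < 1)."""
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for xn in x:
        yn = a * xn + x_prev - a * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


# Unit magnitude at every frequency; only the phase depends on `a`.
magnitudes = [abs(allpass_response(0.5, w)) for w in (0.1, 1.0, 2.0, 3.0)]
phase_mid = cmath.phase(allpass_response(0.5, math.pi / 2))

# The impulse response therefore carries unit energy.
impulse = [1.0] + [0.0] * 199
energy = sum(v * v for v in allpass_filter(0.5, impulse))
```

Tuning `a` (or the coefficients of a cascade of such sections) shifts the phase curve, which is the quantity an optimizer would adjust to bring a dry signal in line with its processed counterpart.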
Power-Balanced Dynamic Modeling of Vactrols: Application to a VTL5C3/2
Vactrols, which consist of a photoresistor and a light-emitting element that are optically coupled, are key components in optical dynamic compressors. Indeed, the photoresistor’s program-dependent dynamic characteristics make it advantageous for automatic gain control in audio applications. Vactrols are becoming increasingly difficult to find, while interest in optical compression in the audio community is not diminishing. They are thus good candidates for virtual analog modeling. In this paper, a model of vactrols that is entirely physical, passive, and has program-dependent dynamic behavior is proposed. The model is based on first principles that govern semiconductors, as well as the port-Hamiltonian systems formalism, which allows the modeling of nonlinear, multiphysical behaviors. The proposed model is identified with a real vactrol, then connected to other components in order to simulate a simple optical compressor.
Antialiased State Trajectory Neural Networks for Virtual Analog Modeling
In recent years, virtual analog modeling with neural networks has experienced an increase in interest and popularity. Many different modeling approaches have been developed and successfully applied. In this paper we do not propose a novel model architecture, but rather address the problem of aliasing distortion introduced by nonlinearities of the modeled analog circuit. In particular, we propose to apply the general idea of antiderivative antialiasing to a state-trajectory network (STN). Applying antiderivative antialiasing to a stateful system in general leads to an integral of a multivariate function that can only be solved numerically, which is too costly for real-time application. However, an adapted STN can be trained to approximate the solution while being computationally efficient. It is shown that this approach can decrease aliasing distortion in the audio band significantly while only moderately oversampling the network in training and inference.
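For context, the general idea of antiderivative antialiasing is easiest to see in the memoryless case, where the integral has a closed form. The sketch below applies the first-order scheme to a `tanh` nonlinearity; it is not the stateful STN variant the abstract describes, and the function names and the `eps` threshold are assumptions.

```python
import math


def log_cosh(x: float) -> float:
    """Antiderivative of tanh, written to avoid overflow for large |x|."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def tanh_adaa1(x, eps: float = 1e-6):
    """First-order antiderivative antialiasing of y = tanh(x).

    Each output is the average of tanh over the straight line between
    consecutive input samples: (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1]).
    """
    y = []
    x_prev = 0.0
    f_prev = log_cosh(x_prev)
    for xn in x:
        f_now = log_cosh(xn)
        dx = xn - x_prev
        if abs(dx) < eps:
            # Ill-conditioned division: evaluate tanh at the midpoint instead.
            y.append(math.tanh(0.5 * (xn + x_prev)))
        else:
            y.append((f_now - f_prev) / dx)
        x_prev, f_prev = xn, f_now
    return y


out = tanh_adaa1([0.0, 1.0, 1.0])
```

With several state variables the corresponding average becomes a multivariate integral with no such closed form, which is the cost the abstract proposes to sidestep by training a network to approximate it.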
A Virtual Instrument for IFFT-Based Additive Synthesis in the Ambisonics Domain
Spatial additive synthesis can be efficiently implemented by applying the inverse Fourier transform to create the individual channels of Ambisonics signals. In the presented work, this approach has been implemented as an audio plugin, allowing the generation and control of basic waveforms and their spatial attributes in a typical DAW-based music production context. Triggered envelopes and low frequency oscillators can be mapped to the spectral shape, source position and source width of the resulting sounds. A technical evaluation shows the computational advantages of the proposed method for additive sounds with high numbers of partials and different Ambisonics orders. The results of a user study indicate the potential of the developed plugin for manipulating the perceived position, source width and timbre coloration.
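As a rough illustration of the approach, the sketch below writes each partial into one spectrum per Ambisonics channel, scaled by that channel's encoding gain, and then applies one inverse transform per channel. It assumes first-order Ambisonics in ACN channel order with SN3D normalization, bin-centred partials, and a naive inverse DFT; all names are hypothetical, and a practical implementation would use an FFT library with windowing and overlap-add.

```python
import cmath
import math


def foa_gains(azimuth: float, elevation: float = 0.0):
    """First-order encoding gains in ACN order (W, Y, Z, X), SN3D; angles in radians."""
    return (
        1.0,
        math.sin(azimuth) * math.cos(elevation),
        math.sin(elevation),
        math.cos(azimuth) * math.cos(elevation),
    )


def inverse_real_dft(half_spec, n: int):
    """Naive inverse DFT of a real signal from bins 0..n/2 (n even)."""
    out = []
    for t in range(n):
        acc = half_spec[0].real + half_spec[n // 2].real * (-1) ** t
        for k in range(1, n // 2):
            acc += 2.0 * (half_spec[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(acc / n)
    return out


def synth_frame(partials, n: int = 64):
    """partials: list of (bin_index, amplitude, azimuth) with 0 < bin_index < n/2."""
    spectra = [[0j] * (n // 2 + 1) for _ in range(4)]
    for k, amp, azimuth in partials:
        for ch, gain in enumerate(foa_gains(azimuth)):
            # A bin value of amp*n/2 yields a cosine of amplitude `amp`.
            spectra[ch][k] += gain * amp * n / 2
    return [inverse_real_dft(spec, n) for spec in spectra]


# One partial at bin 4, placed at 90 degrees azimuth (to the left).
frame = synth_frame([(4, 1.0, math.pi / 2)])
```

The cost of the transform is independent of the number of partials, so only the per-partial spectrum writes grow with the partial count and channel count.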
A Coupled Resonant Filter Bank for the Sound Synthesis of Nonlinear Sources
This paper is concerned with the design of efficient and controllable filters for sound synthesis, in the context of generating sounds radiated by nonlinear sources. These filters are coupled and generate tonal components in an interdependent way, and are intended to efficiently emulate realistic, perceptually salient effects in musical instruments. Energy transfer between the filters is controlled by a matrix containing the coupling terms. The generation of prototypical sounds corresponding to nonlinear sources with the filter bank is presented. In particular, examples are given of sounds corresponding to impacts on thin structures and to the perturbation of an object's vibration when it collides with another object. The sound examples presented in the paper, which are available for listening on the accompanying site, suggest that simple control of the input parameters can generate sounds whose evocation is coherent, and that adding random processes significantly improves the realism of the generated sounds.
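The role of a coupling matrix can be sketched with a small bank of two-pole resonators in which each resonator is additionally driven by the previous outputs of the others. The structure, the one-sample feedback delay, and all parameter names below are assumptions chosen for illustration; the paper's actual filter design may differ.

```python
import math


def coupled_resonators(freqs, radii, coupling, excitation, sr=44100.0):
    """Run a bank of two-pole resonators with matrix coupling.

    freqs: resonance frequencies in Hz; radii: pole radii (< 1 for decay).
    coupling[i][j]: gain from the previous output of resonator j into resonator i.
    excitation: list of frames, each holding one input sample per resonator.
    """
    n_res = len(freqs)
    a1 = [2.0 * r * math.cos(2.0 * math.pi * f / sr) for f, r in zip(freqs, radii)]
    a2 = [-r * r for r in radii]
    y1 = [0.0] * n_res  # outputs at n-1
    y2 = [0.0] * n_res  # outputs at n-2
    out = []
    for frame in excitation:
        y0 = []
        for i in range(n_res):
            fed = sum(coupling[i][j] * y1[j] for j in range(n_res))
            y0.append(a1[i] * y1[i] + a2[i] * y2[i] + frame[i] + fed)
        y2, y1 = y1, y0
        out.append(y0)
    return out


# Strike only the first resonator, with and without coupling into the second.
hit = [[1.0, 0.0]] + [[0.0, 0.0]] * 199
uncoupled = coupled_resonators([440.0, 660.0], [0.999, 0.999], [[0.0, 0.0], [0.0, 0.0]], hit)
coupled = coupled_resonators([440.0, 660.0], [0.999, 0.999], [[0.0, 0.0], [0.01, 0.0]], hit)
```

In the uncoupled run the second resonator stays silent, while a single nonzero matrix entry is enough to transfer energy into it. Large coupling gains can make such a feedback structure unstable, so the values here are deliberately small.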
Dynamic Pitch Warping for Expressive Vocal Retuning
This work introduces the use of the Dynamic Pitch Warping (DPW) method for automatic pitch correction of singing voice audio signals. DPW is designed to dynamically tune any pitch trajectory to a predefined scale while preserving its expressive ornamentation. DPW has three degrees of freedom to modify the fundamental frequency (f0) signal: detection interval, critical time, and transition time. Together, these parameters allow us to define a pitch velocity condition that triggers an adaptive correction of the pitch trajectory (pitch warping). We compared our approach to Antares Autotune (the most commonly used software brand, abbreviated as ATA in this article). The pitch correction in ATA has two degrees of freedom: a triggering threshold (flextune) and the transition time (retune speed). The pitch trajectories that we compare were extracted from audio signals autotuned in ATA, and the DPW algorithm was applied to the f0 of the input audio tracks. We specifically studied pitch correction for three typical f0 curve situations: staircase, vibrato, and free-path. We measured the proximity of the corrected pitch trajectories to the original ones in each case and found that the DPW pitch correction method better preserves vibrato while keeping the f0 free path. In contrast, ATA is more effective at generating staircase curves, but fails for non-small vibratos and free-path curves. We have also implemented an offline automatic pitch tuner using DPW.
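For reference, the simplest form of tuning to a predefined scale snaps every f0 value to the nearest scale note, which produces the staircase-type curve mentioned above. The sketch below shows only that baseline, not the DPW algorithm; the function name, the C major default, and the A4 = 440 Hz reference are assumptions.

```python
import math


def snap_to_scale(f0_hz, pitch_classes=(0, 2, 4, 5, 7, 9, 11), a4_hz=440.0):
    """Return the frequency of the scale note nearest to f0_hz.

    pitch_classes: allowed MIDI note numbers modulo 12 (default: C major).
    Non-positive f0 values (e.g. unvoiced frames) are returned unchanged.
    """
    if f0_hz <= 0.0:
        return f0_hz
    midi = 69.0 + 12.0 * math.log2(f0_hz / a4_hz)
    center = round(midi)
    # Any non-empty scale has a member within six semitones of `center`.
    candidates = [m for m in range(center - 6, center + 7) if m % 12 in pitch_classes]
    target = min(candidates, key=lambda m: abs(m - midi))
    return a4_hz * 2.0 ** ((target - 69) / 12.0)


corrected = [snap_to_scale(f) for f in (445.0, 470.0, 0.0)]
```

Because every frame is quantized independently, vibrato and glides are flattened; a method that first tests a pitch velocity condition over a detection interval, as the abstract describes for DPW, is aimed at avoiding exactly that.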
Tunable Collisions: Hammer-String Simulation with Time-Variant Parameters
In physical modelling synthesis, articulation and tuning are effected via time-variation in one or more parameters. Adopting hammered strings as a test case, this paper develops extended forms of such control, proposing a numerical formulation that affords online adjustment of each of its scaled-form parameters, including those featuring in the one-sided power law for modelling hammer-string collisions. Starting from a modally-expanded representation of the string, an explicit scheme is constructed based on quadratising the contact energy. Compared to the case of time-invariant contact parameters, updating the scheme’s state variables relies on the evaluation of two additional analytic partial derivatives of the auxiliary variable. A numerical energy balance is derived and the numerical contact force is shown to be strictly non-adhesive. Example results with time-variant tension and time-variant contact stiffness are detailed, and real-time viability is demonstrated.
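A one-sided power law is commonly written as F = K [η]₊^α, where η is the hammer compression and [·]₊ clips negative values to zero. The sketch below evaluates that force law with a per-sample stiffness; it does not implement the quadratised explicit scheme, and the symbol and function names are assumptions.

```python
def contact_force(compression: float, stiffness: float, exponent: float) -> float:
    """One-sided power law: stiffness * max(compression, 0) ** exponent."""
    return stiffness * max(compression, 0.0) ** exponent


def force_trajectory(compressions, stiffnesses, exponent: float):
    """Evaluate the force sample by sample with a time-variant stiffness."""
    return [contact_force(c, k, exponent) for c, k in zip(compressions, stiffnesses)]


# Hypothetical values: the hammer approaches, compresses, and releases,
# while the contact stiffness is ramped up over the same samples.
forces = force_trajectory([-0.1, 0.0, 0.5, 0.5, -0.2], [4.0, 4.0, 4.0, 8.0, 8.0], 2.0)
```

The force is zero whenever the hammer and string are apart and never negative, which is the non-adhesive property the abstract refers to.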
Real-time Gong Synthesis
Physical modeling sound synthesis is notoriously computationally intensive, but recent advances in algorithm efficiency, accompanied by increases in available computing power, have brought real-time performance within range for a variety of complex physical models. In this paper, the case of nonlinear plate vibration, used as a simple model for the synthesis of sounds from gongs, is considered. Such a model, derived from that of Föppl and von Kármán, includes a strong geometric nonlinearity, leading to a variety of perceptually salient effects, including pitch glides and crashes. Also discussed here are input excitation and scanned multichannel output. A numerical scheme is presented that mirrors the energetic and dissipative properties of a continuous model, allowing for control over numerical stability. Furthermore, the nonlinearity in the scheme can be solved explicitly, allowing for an efficient solution in real time. The solution relies on a quadratised expression for numerical energy, and is in line with recent work on invariant energy quadratisation and scalar auxiliary variable approaches to simulation. Implementation details, including appropriate perceptually relevant choices for parameter settings, are discussed. Numerical examples are presented, alongside timing results illustrating real-time performance on a typical CPU.