A Generative Model for Raw Audio Using Transformer Architectures
This paper proposes a novel approach to audio synthesis at the waveform level using Transformer architectures. We propose a deep neural network for generating waveforms, similar to WaveNet. The model is fully probabilistic, autoregressive, and causal, i.e., each generated sample depends only on previously observed samples. Our approach outperforms a widely used WaveNet architecture by up to 9% on a similar dataset for next-step prediction. Using the attention mechanism, we enable the architecture to learn which audio samples are important for predicting the future sample. We show how causal Transformer generative models can be used for raw waveform synthesis. We also show that this performance can be improved by another 2% by conditioning samples over a wider context. The flexibility of the current model to synthesize audio from latent representations suggests a large number of potential applications. The novel approach of using generative Transformer architectures for raw audio synthesis is, however, still far from generating any meaningful music, similarly to WaveNet, without using latent codes or metadata to aid the generation process.
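The causal constraint described in this abstract can be illustrated with a masked softmax, the step in self-attention that decides how much each earlier sample contributes to a prediction. This is a minimal sketch in plain Python, not the paper's implementation; the function name `causal_softmax` and the toy score matrix are assumptions made for illustration.

```python
import math


def causal_softmax(scores):
    """Row-wise softmax over a square score matrix with future positions masked."""
    n = len(scores)
    weights = []
    for t in range(n):
        # Position t may attend only to positions 0..t (itself and the past).
        visible = scores[t][: t + 1]
        peak = max(visible)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future positions receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - t - 1))
    return weights


# Toy example: equal scores give uniform weights over the visible past.
weights = causal_softmax([[0.0, 0.0, 0.0] for _ in range(3)])
```

Each row of `weights` sums to one and is zero to the right of the diagonal, so no prediction can draw on a sample that has not yet been observed.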
Non-Iterative Schemes for the Simulation of Nonlinear Audio Circuits
In this work, a number of numerical schemes are presented in the context of virtual-analog simulation. The schemes are linearly implicit in character, and hence directly solvable without iterative methods. Schemes of increasing order of accuracy are constructed, and convergence and stability conditions are proven formally. The schemes are able to handle stiff problems very efficiently, because of their fast update, and can be run at higher sample rates to reduce aliasing. The cases of the diode clipper and ring modulator are investigated in detail, including several numerical examples.
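To make "linearly implicit" concrete, the sketch below applies a first-order linearly implicit Euler update to a commonly used single-capacitor diode-clipper model. It is not one of the schemes from the paper: the update is simply backward Euler with the nonlinearity linearized around the current state, so each sample costs one division instead of an iterative solve. The circuit equation, the component values, and the function names are all assumptions chosen for illustration.

```python
import math

# Illustrative component values (assumptions, not taken from the paper).
R = 1.0e3        # series resistance in ohms
C = 33.0e-9      # capacitance in farads
I_S = 2.52e-9    # diode saturation current in amperes
V_T = 25.83e-3   # thermal voltage in volts


def f(x, u):
    """Right-hand side dx/dt of the assumed diode-clipper ODE."""
    return (u - x) / (R * C) - (2.0 * I_S / C) * math.sinh(x / V_T)


def df_dx(x):
    """Partial derivative of f with respect to the state x."""
    return -1.0 / (R * C) - (2.0 * I_S / (C * V_T)) * math.cosh(x / V_T)


def clip(u_samples, fs):
    """Run the linearly implicit Euler update over the input samples."""
    k = 1.0 / fs  # time step in seconds
    x = 0.0
    out = []
    for u in u_samples:
        # One scalar linear solve per sample:
        # (1 - k * f'(x_n)) * (x_{n+1} - x_n) = k * f(x_n, u)
        x += k * f(x, u) / (1.0 - k * df_dx(x))
        out.append(x)
    return out


fs = 44100
u = [2.0 * math.sin(2.0 * math.pi * 220.0 * n / fs) for n in range(fs // 10)]
y = clip(u, fs)
```

Because `df_dx` is always negative, the denominator stays above one, which keeps the update well defined even when the diode term makes the problem stiff.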
Real-Time Implementation of a Friction Drum Inspired Instrument Using Finite Difference Schemes
Physical modelling sound synthesis is a powerful method for constructing virtual instruments aiming to mimic the sound of real-world counterparts, while allowing for the possibility of engaging with these instruments in ways which may be impossible in person. Such a case is explored in this paper: particularly the simulation of a friction drum inspired instrument. It is an instrument played by causing the membrane of a drum head to vibrate via friction. This involves rubbing the membrane via a stick or a cord attached to its center, with the induced vibrations being transferred to the air inside a sound box. This paper describes the development of a real-time audio application which models such an instrument as a bowed membrane connected to an acoustic tube. This is done by means of a numerical simulation using finite-difference time-domain (FDTD) methods in which the excitation, whose position is free to change in real-time, is modelled by a highly non-linear elasto-plastic friction model. Additionally, the virtual instrument allows for dynamically modifying physical parameters of the model, thereby allowing the user to generate new and interesting sounds that go beyond a real-world friction drum.
Applications of Port Hamiltonian Methods to Non-Iterative Stable Simulations of the Korg35 and Moog 4-Pole VCF
This paper presents an application of the port Hamiltonian formalism to the nonlinear simulation of the OTA-based Korg35 filter circuit and the Moog 4-pole ladder filter circuit. Lyapunov analysis is used with their state-space representations to guarantee zero-input stability over the range of parameters consistent with the actual circuits. A zero-input stable non-iterative discrete-time scheme based on a discrete gradient and a change of state variables is shown along with numerical simulations. Simulations show behavior consistent with the actual operation of the circuits, e.g., self-oscillation, and are found to be stable and have lower computational cost compared to iterative methods.
Adaptive Pitch-Shifting With Applications to Intonation Adjustment in a Cappella Recordings
A central challenge for a cappella singers is to adjust their intonation and to stay in tune relative to their fellow singers. During editing of a cappella recordings, one may want to adjust local intonation of individual singers or account for global intonation drifts over time. This requires applying a time-varying pitch-shift to the audio recording, which we refer to as adaptive pitch-shifting. In this context, existing (semi-)automatic approaches are either labor-intensive or face technical and musical limitations. In this work, we present automatic methods and tools for adaptive pitch-shifting with applications to intonation adjustment in a cappella recordings. To this end, we show how to incorporate time-varying information into existing pitch-shifting algorithms that are based on resampling and time-scale modification (TSM). Furthermore, we release an open-source Python toolbox, which includes a variety of TSM algorithms and an implementation of our method. Finally, we show the potential of our tools by two case studies on global and local intonation adjustment in a cappella recordings using a publicly available multitrack dataset of amateur choral singing.
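The resampling half of such a pipeline can be sketched as a read pointer that advances by a time-varying ratio, where a shift of c cents corresponds to the ratio 2^(c/1200). The code below is a hedged illustration in plain Python and does not reproduce the released toolbox; the function name `variable_resample` and its arguments are hypothetical. Resampling alone also changes the duration, which a TSM stage (not shown) would have to compensate.

```python
def variable_resample(x, cents):
    """Resample x with a pitch shift (in cents) given per output sample."""
    out = []
    pos = 0.0  # fractional read position in the input
    n = 0      # output sample index
    while pos <= len(x) - 1:
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        # Linear interpolation between neighbouring input samples.
        out.append((1.0 - frac) * x[i] + frac * nxt)
        # Hold the last shift value if the curve is shorter than the output.
        c = cents[min(n, len(cents) - 1)]
        pos += 2.0 ** (c / 1200.0)  # cents to resampling ratio
        n += 1
    return out


ramp = [float(v) for v in range(8)]
unchanged = variable_resample(ramp, [0.0])      # 0 cents: ratio 1
octave_up = variable_resample(ramp, [1200.0])   # +1200 cents: ratio 2
```

With a constant shift of one octave the read pointer skips every other sample, so the output is half as long and, played at the original rate, sounds an octave higher.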
One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hihat, a snare, and a tomtom. The proposed method uses a short pseudo-random velvet-noise filter and a low-shelf filter to produce timbral variations targeted at appropriate spectral regions, potentially yielding an endless number of new realistic versions of a single percussive sampled sound. The realism of the resulting processed sounds is studied in a listening test. The results show that the sound quality obtained with the proposed algorithm is at least as good as that of a previous method while using 77% fewer computational operations. The algorithm is widely applicable to computer-generated music and game audio.
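Velvet noise is a sparse sequence of +1 and -1 pulses, which is why filtering with it is cheap: the convolution reduces to a handful of signed additions. The sketch below generates such a sequence with one pulse per grid period and applies it as a sparse filter. It illustrates the general idea only; the paper's pulse weighting and low-shelf stage are not reproduced, and the function names and parameter values are assumptions.

```python
import random


def velvet_noise(num_pulses, grid, seed=None):
    """Return (position, sign) pairs: one +/-1 pulse per period of `grid` samples."""
    rng = random.Random(seed)
    pulses = []
    for m in range(num_pulses):
        position = m * grid + rng.randrange(grid)  # random offset inside period m
        sign = rng.choice((-1.0, 1.0))
        pulses.append((position, sign))
    return pulses


def sparse_convolve(x, pulses):
    """Convolve x with the sparse sequence using only signed additions."""
    length = len(x) + max(position for position, _ in pulses)
    y = [0.0] * length
    for position, sign in pulses:
        for n, sample in enumerate(x):
            y[n + position] += sign * sample
    return y


pulses = velvet_noise(num_pulses=8, grid=16, seed=1)
impulse_response = sparse_convolve([1.0], pulses)
```

Drawing a new seed for every playback gives a different sparse filter each time, which is the source of variation between repetitions in this sketch.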
Spherical Decomposition of Arbitrary Scattering Geometries for Virtual Acoustic Environments
A method is proposed to encode the acoustic scattering of objects for virtual acoustic applications through a multiple-input and multiple-output framework. The scattering is encoded as a matrix in the spherical harmonic domain, and can be re-used and manipulated (rotated, scaled and translated) to synthesize various sound scenes. The proposed method is applied and validated using Boundary Element Method simulations, which show close agreement between the reference and synthesized results. The method is compatible with existing frameworks such as Ambisonics and image source methods.
Combining Zeroth and First-Order Analysis With Lagrange Polynomials to Reduce Artefacts in Live Concatenative Granulation
This paper presents a technique addressing signal discontinuity and concatenation artefacts in real-time granular processing with rectangular windowing. By combining zero-crossing synchronicity, first-order derivative analysis, and Lagrange polynomials, we can generate streams of uncorrelated and non-overlapping sonic fragments with minimal discontinuities in the low-order derivatives. The resulting open-source algorithm, implemented in the Faust language, provides versatile real-time software for dynamical looping, wavetable oscillation, and granulation with reduced artefacts due to rectangular windowing and no artefacts from overlap-add-to-one techniques commonly deployed in granular processing.
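For readers unfamiliar with Lagrange polynomials, the sketch below evaluates the unique polynomial passing through a set of sample points, which is the basic tool for drawing a smooth curve across a junction between fragments. How the paper combines this with zero-crossing and derivative analysis is not shown here, and the Faust implementation is not reproduced; the function name and the example points are assumptions.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (x_i, y_i) in enumerate(zip(xs, ys)):
        # Basis polynomial: equals 1 at x_i and 0 at every other node.
        basis = 1.0
        for j, x_j in enumerate(xs):
            if j != i:
                basis *= (x - x_j) / (x_i - x_j)
        total += y_i * basis
    return total


# Three points taken from y = x^2; a quadratic is recovered exactly.
value = lagrange_interpolate([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.5)
```

The nodes in `xs` must be distinct, otherwise a basis term divides by zero.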
A Physical Model of the Trombone Using Dynamic Grids for Finite-Difference Schemes
In this paper, a complete simulation of a trombone using finite-difference time-domain (FDTD) methods is proposed. In particular, we propose the use of a novel method to dynamically vary the number of grid points associated with the FDTD method, to simulate the fact that the physical dimension of the trombone’s resonator dynamically varies over time. We describe the different elements of the model and present the results of a real-time simulation.
Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm to replace manual parameter tuning, followed by the computation of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods of the Aubio library, using a dataset of monophonic acoustic guitar recordings. Results show that the proposed solution is effective in reducing the human effort required in the optimization process: it replaced more than two days of manual parameter tuning with 13 hours and 34 minutes of automated computation. Moreover, the resulting performance was comparable to that obtained by manual optimization.
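The Pareto frontier step can be sketched for the two objectives named in this abstract, detection accuracy and latency. The code below keeps every candidate that no other candidate beats on both objectives at once. It is a plain-Python illustration, not the paper's procedure; the function name and the candidate values are invented for the example and are not results from the paper.

```python
def pareto_front(candidates):
    """Return the non-dominated (accuracy, latency_ms) pairs.

    Higher accuracy is better; lower latency is better.
    """
    front = []
    for i, (acc_i, lat_i) in enumerate(candidates):
        dominated = any(
            # j is at least as good on both objectives and strictly better on one.
            acc_j >= acc_i and lat_j <= lat_i and (acc_j > acc_i or lat_j < lat_i)
            for j, (acc_j, lat_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((acc_i, lat_i))
    return front


# Hypothetical detector configurations: (accuracy score, latency in ms).
candidates = [(0.90, 12.0), (0.85, 6.0), (0.80, 9.0), (0.92, 20.0), (0.90, 15.0)]
front = pareto_front(candidates)
```

Here `(0.80, 9.0)` drops out because `(0.85, 6.0)` is both more accurate and faster, and `(0.90, 15.0)` drops out because `(0.90, 12.0)` matches its accuracy at lower latency. The remaining points are the trade-offs a user would choose between.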