Exposure Bias and State Matching in Recurrent Neural Network Virtual Analog Models
Virtual analog (VA) modeling using neural networks (NNs) has great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due to their connection with discrete nodal analysis. Furthermore, VA models based on NNs can be trained efficiently by directly exposing them to the circuit states in a gray-box fashion. However, exposure to ground truth information during training can leave the models susceptible to error accumulation in a free-running mode, also known as “exposure bias” in machine learning literature. This paper presents a unified framework for treating the previously proposed state trajectory network (STN) and gated recurrent unit (GRU) networks as special cases of discrete nodal analysis. We propose a novel circuit state-matching mechanism for the GRU and experimentally compare the previously mentioned networks for their performance in state matching, during training, and in exposure bias, during inference. Experimental results from modeling a diode clipper show that all the tested models exhibit some exposure bias, which can be mitigated by truncated backpropagation through time. Furthermore, the proposed state matching mechanism improves the GRU modeling performance of an overdrive pedal and a phaser pedal, especially in the presence of external modulation, apparent in a phaser circuit.
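The error accumulation described in this abstract can be illustrated with a toy example. The following is a minimal sketch, not the paper's method: it assumes a hypothetical first-order system `x[n+1] = a * x[n] + u[n]` and a "model" whose coefficient is slightly wrong. When the model is fed the ground-truth previous state (teacher forcing), its one-step error stays small; when it is fed its own previous output (free-running), the same coefficient error compounds.

```python
import numpy as np

def simulate(a_true=0.99, a_model=0.98, n_steps=2000):
    """Compare teacher-forced and free-running prediction errors.

    Hypothetical toy system (not from the paper):
        x[n+1] = a * x[n] + u[n], driven by a constant input u = 1.
    """
    u = np.ones(n_steps)
    x_true = np.zeros(n_steps + 1)
    x_forced = np.zeros(n_steps + 1)  # model sees the ground-truth previous state
    x_free = np.zeros(n_steps + 1)    # model sees its own previous output
    for n in range(n_steps):
        x_true[n + 1] = a_true * x_true[n] + u[n]
        x_forced[n + 1] = a_model * x_true[n] + u[n]
        x_free[n + 1] = a_model * x_free[n] + u[n]
    err_forced = np.abs(x_forced - x_true)
    err_free = np.abs(x_free - x_true)
    return err_forced, err_free

err_forced, err_free = simulate()
print(f"max teacher-forced error: {err_forced.max():.3f}")
print(f"max free-running error:   {err_free.max():.3f}")
```

In this sketch the teacher-forced error looks acceptable even though the free-running output drifts far from the target, which is why a model evaluated only under teacher forcing can appear better than it is at inference time.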
Air Absorption Filtering Method Based on Approximate Green's Function for Stokes' Equation
Air absorption effects lead to significant attenuation in high frequencies over long distances and this is critical to model in wide-band virtual acoustic simulations. Air absorption is commonly modelled using filter banks applied to an impulse response or to individual impulse events (rays or image sources) arriving at a receiver. Such filter banks require non-trivial fitting to air absorption attenuation curves, as a function of time or distance, in the case of IIR approximations, or may suffer from overlap-add artefacts in the case of FIR approximations. In this study, a filter method is presented which avoids the aforementioned issues. The proposed approach relies on a time-varying diffusion kernel that is found in an approximate Green’s function solution to Stokes’ equation in free space. This kernel acts as a low-pass filter that is parametrised by physical constants, and can be applied to an impulse response using time-varying convolution. Numerical examples are presented demonstrating the utility of this approach for adding air absorption effects to room impulse responses simulated using geometrical acoustics or wave-based methods.
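A time-varying low-pass kernel of the kind described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: the kernel is assumed to be Gaussian with a standard deviation that grows as the square root of arrival time, and the constant `alpha` is a hypothetical placeholder rather than a calibrated physical value. Each sample of the impulse response is spread by the kernel belonging to its own arrival time, so later arrivals are smoothed more.

```python
import numpy as np

def air_absorption_sketch(h, fs, alpha=1e-8):
    """Apply a time-varying Gaussian smoothing kernel to an impulse response.

    Assumptions (not taken from the paper):
      - the kernel at arrival time t is Gaussian with std sqrt(alpha * t) seconds;
      - alpha is an illustrative constant, not a physical parameter.
    """
    h = np.asarray(h, dtype=float)
    n = len(h)
    idx = np.arange(n)
    out = np.zeros(n)
    for m in range(n):
        if h[m] == 0.0:
            continue  # nothing to spread
        sigma = np.sqrt(alpha * m / fs) * fs  # kernel width in samples
        if sigma < 1e-3:
            out[m] += h[m]  # kernel narrower than a sample: pass through
            continue
        g = np.exp(-0.5 * ((idx - m) / sigma) ** 2)
        out += h[m] * g / g.sum()  # normalise so each event keeps its area
    return out

# Two unit impulses: an early arrival and a late arrival.
fs = 48000
h = np.zeros(8000)
h[100] = 1.0
h[4000] = 1.0
y = air_absorption_sketch(h, fs)
print(y[100], y[4000])  # the later arrival has a lower, broader peak
```

The direct loop costs O(N^2) and is only meant to make the time-varying convolution explicit; a practical implementation would truncate each kernel to a few standard deviations.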
On the Equivalence of Integrator- and Differentiator-Based Continuous- and Discrete-Time Systems
The article performs a generic comparison of integrator- and differentiator-based continuous-time systems as well as their discrete-time models, aiming to answer the recurring question in the music DSP community of whether there are any benefits in using differentiators instead of conventionally employed integrators. It is found that both kinds of models are practically equivalent, but there are certain reservations about differentiator-based models.
Parametric Spatial Audio Effects Based on the Multi-Directional Decomposition of Ambisonic Sound Scenes
Decomposing a sound-field into its individual components and respective parameters can represent a convenient first step towards offering the user an intuitive means of controlling spatial audio effects and sound-field modification tools. The majority of such tools available today, however, are instead limited to linear combinations of signals or employ a basic single-source parametric model. Therefore, the purpose of this paper is to present a parametric framework, which seeks to overcome these limitations by first dividing the sound-field into its multi-source and ambient components based on estimated spatial parameters. It is then demonstrated that by manipulating the spatial parameters prior to reproducing the scene, a number of sound-field modification and spatial audio effects may be realised, including directional warping, listener translation, sound source tracking, spatial editing workflows and spatial side-chaining. Many of the effects described have also been implemented as real-time audio plug-ins, in order to demonstrate how a user may interact with such tools in practice.
Quality Diversity for Synthesizer Sound Matching
It is difficult to adjust the parameters of a complex synthesizer to create the desired sound. As such, sound matching, the estimation of synthesis parameters that can replicate a certain sound, is a task that has often been researched, utilizing optimization methods such as genetic algorithms (GAs). In this paper, we introduce a novelty-based objective for GA-based sound matching. Our contribution is two-fold. First, we show that the novelty objective is able to improve the quality of sound matching by maintaining phenotypic diversity in the population. Second, we introduce a quality diversity approach to the problem of sound matching, aiming to find a diverse set of matching sounds. We show that the novelty objective is effective in producing high-performing solutions that are diverse in terms of specified audio features. This approach allows for a new way of discovering sounds and exploring the capabilities of a synthesizer.
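A novelty objective of the kind mentioned in this abstract is commonly computed as the mean distance from an individual to its nearest neighbours in a feature space. The sketch below follows that common definition from the novelty search literature and is not taken from the paper: the function name `novelty_scores`, the choice of Euclidean distance, and the two-dimensional feature vectors are all assumptions for illustration, and the archive of past individuals that novelty search usually maintains is omitted.

```python
import numpy as np

def novelty_scores(features, k=3):
    """Mean Euclidean distance from each individual to its k nearest neighbours.

    `features` holds one row of (hypothetical) audio features per individual.
    Requires k < number of individuals.
    """
    features = np.asarray(features, dtype=float)
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # an individual is not its own neighbour
    nearest = np.sort(dist, axis=1)[:, :k]  # k smallest distances per row
    return nearest.mean(axis=1)

# Four similar sounds and one outlier in a toy 2-D feature space.
population = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
scores = novelty_scores(population, k=2)
print(scores.argmax())  # index of the most novel individual
```

In a GA, a score like this could be used alongside the matching error so that selection rewards individuals that sound different from the rest of the population as well as those that match the target.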
An Audio-Visual Fusion Piano Transcription Approach Based on Strategy
Piano transcription is a fundamental problem in the field of music information retrieval. At present, most transcription studies are based on either audio or video alone, and few address audio-visual fusion. In this paper, a piano transcription model based on strategy fusion is proposed, in which the transcription results of the video model are used to assist audio transcription. Due to the lack of datasets for audio-visual fusion, the OMAPS dataset is proposed in this paper. Our strategy fusion model achieves a 92.07% F1 score on the OMAPS dataset. The transcription model based on feature fusion is also compared with the one based on strategy fusion. The experimental results show that the transcription model based on strategy fusion achieves better results than the one based on feature fusion.
Realistic Gramophone Noise Synthesis Using a Diffusion Model
This paper introduces a novel data-driven strategy for synthesizing gramophone noise audio textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples of length equal to one disk revolution, but a method to generate plausible periodic variations between revolutions is also proposed. A guided approach is also applied as a conditioning method, where an audio signal generated with manually-tuned signal processing is refined via reverse diffusion to improve realism. The method has been evaluated in a subjective listening test, in which the participants were often unable to distinguish the synthesized signals from the real ones. The synthetic noises produced with the best proposed unconditional method are statistically indistinguishable from real noise recordings. This work shows the potential of diffusion models for highly realistic audio synthesis tasks.
Optimal Integer Order Approximation of Fractional Order Filters
Fractional order filters have long been studied, along with their applications to many areas of physics and engineering. In particular, several solutions have been proposed to approximate their frequency response with that of an ordinary filter. In this paper, we tackle this problem with a new approach: we analytically solve a simplified version of the problem and find the optimal placement of poles and zeros, giving a mathematical proof and an error estimate. This solution shows improved performance compared to the current state of the art and is suitable for real-time parametric control.
Simulating a Hexaphonic Pickup Using Parallel Comb Filters for Guitar Distortion
This paper introduces hexaphonic distortion as a way of achieving harmonically rich guitar distortion while minimizing intermodulation products regardless of playing style. The simulated hexaphonic distortion effect described in this paper attempts to reproduce the characteristics of hexaphonic distortion for use with ordinary electric guitars with mono pickups. The proposed approach uses a parallel comb filter structure that separates a mono guitar signal into its harmonic components. This simulates the six individual string signals obtained from a hexaphonic pickup. Each of the signals is then individually distorted, with oversampling used to avoid aliasing artifacts. Starting with the baseline of the distorted mono signal, the simulated distortion produces fewer intermodulation products with a result approaching that of hexaphonic distortion.
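The idea of using a comb filter to pick out the harmonics of one string can be sketched with the simplest feedforward comb. This is an assumption-laden illustration, not the paper's filter design: the abstract does not specify the comb structure, and the function name `harmonic_comb`, the delay rounding, and the test frequencies are hypothetical. A delay of one fundamental period passes every harmonic of `f0` at unit gain and cancels components halfway between harmonics.

```python
import numpy as np

def harmonic_comb(x, fs, f0):
    """Feedforward comb y[n] = 0.5 * (x[n] + x[n - D]) with D = round(fs / f0).

    Passes harmonics of f0 and notches frequencies midway between them.
    Requires len(x) > D. Illustrative only; not the paper's filter.
    """
    x = np.asarray(x, dtype=float)
    D = int(round(fs / f0))
    delayed = np.concatenate([np.zeros(D), x[:-D]])
    return 0.5 * (x + delayed)

fs, f0 = 48000, 100.0
t = np.arange(fs) / fs
on_harmonic = np.sin(2 * np.pi * 200.0 * t)   # 2nd harmonic of f0
off_harmonic = np.sin(2 * np.pi * 150.0 * t)  # midway between harmonics
D = int(round(fs / f0))
print(np.abs(harmonic_comb(on_harmonic, fs, f0)[D:]).max())   # close to 1
print(np.abs(harmonic_comb(off_harmonic, fs, f0)[D:]).max())  # close to 0
```

A parallel bank would run one such filter per estimated string fundamental and apply the nonlinearity to each output separately before summing, which is what limits intermodulation between notes on different strings.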
Higher-Order Anti-Derivatives of Band Limited Step Functions for the Design of Radial Filters in Spherical Harmonics Expansions
This paper presents a discrete-time model of the spherical harmonics expansion describing a sound field. The so-called radial functions are realized as digital filters, which characterize the spatial impulse responses of the individual harmonic orders. The filter coefficients are derived from the analytical expressions of the time-domain radial functions, which have a finite extent in time. Due to the varying degrees of discontinuities occurring at their edges, a time-domain sampling of the radial functions gives rise to aliasing. In order to reduce the aliasing distortion, the discontinuities are replaced with the higher-order anti-derivatives of a band-limited step function. The improved spectral accuracy is demonstrated by numerical evaluation. The proposed discrete-time sound field model is applicable in broadband applications such as spatial sound reproduction and active noise control.