Exposure Bias and State Matching in Recurrent Neural Network Virtual Analog Models
Virtual analog (VA) modeling using neural networks (NNs) has
great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due
to their connection with discrete nodal analysis. Furthermore, VA
models based on NNs can be trained efficiently by directly exposing them to the circuit states in a gray-box fashion. However,
exposure to ground truth information during training can leave the
models susceptible to error accumulation in a free-running mode,
also known as “exposure bias” in machine learning literature. This
paper presents a unified framework for treating the previously
proposed state trajectory network (STN) and gated recurrent unit
(GRU) networks as special cases of discrete nodal analysis. We
propose a novel circuit state-matching mechanism for the GRU
and experimentally compare the previously mentioned networks
for their performance in state matching during training and in exposure bias during inference. Experimental results from modeling
a diode clipper show that all the tested models exhibit some exposure bias, which can be mitigated by truncated backpropagation
through time. Furthermore, the proposed state-matching mechanism improves the GRU's performance in modeling an overdrive
pedal and a phaser pedal, especially in the presence of external
modulation, as occurs in a phaser circuit.
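
The exposure bias described in this abstract can be illustrated with a toy example. The sketch below is not the STN or GRU from the paper: it assumes a hypothetical first-order linear system `x[n+1] = a_true * x[n] + u[n]` and a "model" whose coefficient `a_model` is slightly wrong. With teacher forcing, every prediction starts from the true previous state, so the error stays at the one-step level; in free-running mode the model feeds on its own output and the error accumulates.

```python
import math


def simulate(a_true=0.99, a_model=0.98, n_steps=2000):
    """Return (teacher_forced_rms, free_running_rms) prediction errors.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    x = 0.0        # ground-truth state
    x_free = 0.0   # model state when running on its own predictions
    tf_sq = 0.0
    fr_sq = 0.0
    for n in range(n_steps):
        u = math.sin(0.1 * n)             # input signal
        x_next = a_true * x + u           # ground-truth next state
        x_tf = a_model * x + u            # teacher forcing: starts from the true state
        x_free = a_model * x_free + u     # free-running: starts from its own last output
        tf_sq += (x_tf - x_next) ** 2
        fr_sq += (x_free - x_next) ** 2
        x = x_next
    return math.sqrt(tf_sq / n_steps), math.sqrt(fr_sq / n_steps)


tf_rms, fr_rms = simulate()
print(f"teacher-forced RMS error: {tf_rms:.4f}")
print(f"free-running RMS error:   {fr_rms:.4f}")
```

A model that looks accurate under teacher forcing can therefore still drift at inference time, which is why the free-running error is the one to evaluate.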

Air Absorption Filtering Method Based on Approximate Green's Function for Stokes' Equation
Air absorption causes significant high-frequency attenuation over long distances, which is critical to model in wide-band
virtual acoustic simulations. Air absorption is commonly modelled
using filter banks applied to an impulse response or to individual
impulse events (rays or image sources) arriving at a receiver. Such
filter banks require non-trivial fitting to air absorption attenuation
curves, as a function of time or distance, in the case of IIR approximations, or may suffer from overlap-add artefacts in the case of FIR
approximations. In this study, a filter method is presented which
avoids the aforementioned issues. The proposed approach relies on a
time-varying diffusion kernel that is found in an approximate Green’s
function solution to Stokes’ equation in free space. This kernel acts
as a low-pass filter that is parametrised by physical constants, and can
be applied to an impulse response using time-varying convolution.
Numerical examples are presented demonstrating the utility of this
approach for adding air absorption effects to room impulse responses
simulated using geometrical acoustics or wave-based methods.
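
A time-varying low-pass of this kind can be sketched as follows. This is a minimal illustration, not the paper's kernel: it assumes a Gaussian (heat-type) kernel whose standard deviation grows with the square root of arrival time, and the scaling constant `alpha` (in seconds) is a hypothetical stand-in for the physical constants mentioned in the abstract. Each sample of the impulse response is spread by a kernel matched to its own arrival time, which is one simple form of time-varying convolution.

```python
import math


def apply_air_absorption(ir, fs, alpha):
    """Spread each impulse-response sample with a Gaussian whose width grows with time.

    ir    : list of floats, impulse response
    fs    : sample rate in Hz
    alpha : hypothetical diffusion constant in seconds (assumption, not from the paper)
    """
    n = len(ir)
    out = [0.0] * n
    for i, x in enumerate(ir):
        if x == 0.0:
            continue
        t = i / fs                                # arrival time of this sample
        sigma = math.sqrt(alpha * t) * fs         # kernel std deviation in samples
        if sigma < 1e-3:                          # kernel narrower than a sample: pass through
            out[i] += x
            continue
        radius = int(math.ceil(4.0 * sigma))
        offsets = range(-radius, radius + 1)
        weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
        norm = sum(weights)                       # unit DC gain for every kernel
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < n:
                out[j] += x * w / norm
    return out


# Two unit impulses: the later one is smeared more, i.e. loses more high-frequency energy.
fs = 48000
ir = [0.0] * 8000
ir[100] = 1.0
ir[4000] = 1.0
filtered = apply_air_absorption(ir, fs, alpha=5e-8)
print(filtered[100], filtered[4000])
```

Because the kernel is normalised, low frequencies pass unchanged while later arrivals are progressively smoothed; no filter-bank fitting or block-wise overlap-add is involved in this sketch.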

On the Equivalence of Integrator- and Differentiator-Based Continuous- and Discrete-Time Systems
The article performs a generic comparison of integrator- and differentiator-based continuous-time systems as well as their discrete-time models, aiming to answer the recurring question in the
music DSP community of whether there are any benefits in using differentiators instead of conventionally employed integrators.
It is found that both kinds of models are practically equivalent, but
there are certain reservations about differentiator-based models.

Parametric Spatial Audio Effects Based on the Multi-Directional Decomposition of Ambisonic Sound Scenes
Decomposing a sound-field into its individual components and respective parameters can represent a convenient first step towards
offering the user an intuitive means of controlling spatial audio
effects and sound-field modification tools. The majority of such
tools available today, however, are instead limited to linear combinations of signals or employ a basic single-source parametric
model. Therefore, the purpose of this paper is to present a parametric framework, which seeks to overcome these limitations by first
dividing the sound-field into its multi-source and ambient components based on estimated spatial parameters. It is then demonstrated that by manipulating the spatial parameters prior to reproducing the scene, a number of sound-field modification and spatial
audio effects may be realised, including directional warping, listener translation, sound source tracking, spatial editing workflows
and spatial side-chaining. Many of the effects described have also
been implemented as real-time audio plug-ins, in order to demonstrate how a user may interact with such tools in practice.

Quality Diversity for Synthesizer Sound Matching
It is difficult to adjust the parameters of a complex synthesizer to
create the desired sound. As such, sound matching, the estimation of synthesis parameters that can replicate a certain sound, is
a task that has often been researched, utilizing optimization methods such as the genetic algorithm (GA). In this paper, we introduce a
novelty-based objective for GA-based sound matching. Our contribution is two-fold. First, we show that the novelty objective is
able to improve the quality of sound matching by maintaining phenotypic diversity in the population. Second, we introduce a quality diversity approach to the problem of sound matching, aiming
to find a diverse set of matching sounds. We show that the novelty objective is effective in producing high-performing solutions
that are diverse in terms of specified audio features. This approach
allows for a new way of discovering sounds and exploring the capabilities of a synthesizer.
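
The abstract does not spell out how novelty is scored. A common choice in novelty search, assumed here for illustration, is the mean distance from an individual to its k nearest neighbours in a feature space. The sketch below uses hypothetical two-dimensional audio feature vectors; the function name `novelty_scores` and the value of `k` are assumptions, not the paper's.

```python
import math


def novelty_scores(features, k=3):
    """Mean Euclidean distance from each individual to its k nearest neighbours.

    features : list of equal-length feature tuples, one per individual
               (needs at least two individuals)
    """
    scores = []
    for i, f in enumerate(features):
        dists = sorted(math.dist(f, g) for j, g in enumerate(features) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Hypothetical population: four similar sounds and one outlier.
population = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = novelty_scores(population)
most_novel = max(range(len(population)), key=scores.__getitem__)
print(scores, most_novel)
```

Rewarding individuals with high scores pushes the population away from regions of feature space that are already well covered, which is how such an objective maintains phenotypic diversity.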

An Audio-Visual Fusion Piano Transcription Approach Based on Strategy
Piano transcription is a fundamental problem in the field of music
information retrieval. At present, most transcription studies
are based on audio or video alone, and few
address audio-visual fusion. In this paper,
a piano transcription model based on strategy fusion is proposed,
in which the transcription results of the video model are used to assist audio transcription. Because datasets suitable
for audio-visual fusion are lacking, the OMAPS dataset is proposed in this paper. Our strategy fusion model achieves a 92.07% F1
score on the OMAPS dataset. The transcription model based on feature fusion is also compared with the one based on strategy fusion.
The experimental results show that the transcription model based on
strategy fusion achieves better results than the one based on feature
fusion.

Realistic Gramophone Noise Synthesis Using a Diffusion Model
This paper introduces a novel data-driven strategy for synthesizing gramophone noise audio textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples of length equal to one disk revolution, but a method to generate plausible periodic variations between revolutions is also proposed. A guided approach is also applied as a conditioning method, where an audio signal generated with manually-tuned signal processing is refined via reverse diffusion to improve realism. The method has been evaluated in a subjective listening test, in which the participants were often unable to distinguish the synthesized signals from the real ones. The synthetic noises produced with the best proposed unconditional method are statistically indistinguishable from real noise recordings. This work shows the potential of diffusion models for highly realistic audio synthesis tasks.

Optimal Integer Order Approximation of Fractional Order Filters
Fractional order filters have long been studied,
along with their applications to many areas of physics and engineering. In particular, several solutions have been proposed in
order to approximate their frequency response with that of an ordinary filter. In this paper, we tackle this problem with a new approach: we solve analytically a simplified version of the problem
and we find the optimal placement of poles and zeros, giving a
mathematical proof and an error estimate. This solution shows improved performance compared to the current state of the art and is
suitable for real-time parametric control.
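
For context, the general idea of approximating a fractional-order response with an ordinary pole-zero filter can be sketched with the classical Oustaloup-style recursive approximation of s^alpha. This is a well-known baseline, not the optimal placement derived in the paper; the band edges and order below are illustrative assumptions.

```python
import math


def oustaloup(alpha, w_low, w_high, n):
    """Interlaced real zeros and poles approximating s**alpha on [w_low, w_high] rad/s.

    Returns (zeros, poles, gain) with 2*n + 1 zero-pole pairs; the filter is
    gain * prod((s + z) / (s + p)).
    """
    ratio = w_high / w_low
    count = 2 * n + 1
    ks = range(-n, n + 1)
    zeros = [w_low * ratio ** ((k + n + 0.5 * (1.0 - alpha)) / count) for k in ks]
    poles = [w_low * ratio ** ((k + n + 0.5 * (1.0 + alpha)) / count) for k in ks]
    gain = w_high ** alpha
    return zeros, poles, gain


def magnitude(zeros, poles, gain, w):
    """Magnitude of the approximation at angular frequency w."""
    s = complex(0.0, w)
    h = complex(gain, 0.0)
    for z, p in zip(zeros, poles):
        h *= (s + z) / (s + p)
    return abs(h)


# Half-order differentiator over four decades, nine zero-pole pairs.
zeros, poles, gain = oustaloup(alpha=0.5, w_low=1e-2, w_high=1e2, n=4)
for w in (0.1, 1.0, 10.0):
    print(w, magnitude(zeros, poles, gain, w), w ** 0.5)
```

The zeros and poles are spaced geometrically, so the magnitude follows the target slope of 20·alpha dB per decade with a small ripple inside the band.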

Simulating a Hexaphonic Pickup Using Parallel Comb Filters for Guitar Distortion
This paper introduces hexaphonic distortion as a way of achieving
harmonically rich guitar distortion while minimizing intermodulation products regardless of playing style. The simulated hexaphonic distortion effect described in this paper attempts to reproduce the characteristics of hexaphonic distortion for use with ordinary electric guitars with mono pickups. The proposed approach
uses a parallel comb filter structure that separates a mono guitar
signal into its harmonic components. This simulates the six individual string signals obtained from a hexaphonic pickup. Each of
the signals is then individually distorted, with oversampling used
to avoid aliasing artifacts. Starting with the baseline of the distorted mono signal, the simulated distortion produces fewer intermodulation products with a result approaching that of hexaphonic
distortion.
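
The parallel structure can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's implementation: each branch is a feedback comb filter with an integer delay (so its peaks sit at multiples of `fs / delay`), the nonlinearity is a plain `tanh`, and pitch tracking, fractional delays, and oversampling are all omitted. The names `feedback_comb` and `simulated_hex_distortion` are hypothetical.

```python
import math


def feedback_comb(x, delay, g=0.9):
    """y[n] = (1 - g) * x[n] + g * y[n - delay]; unit gain at multiples of fs / delay."""
    y = [0.0] * len(x)
    for n, xn in enumerate(x):
        fb = y[n - delay] if n >= delay else 0.0
        y[n] = (1.0 - g) * xn + g * fb
    return y


def simulated_hex_distortion(x, delays, g=0.9, drive=5.0):
    """Isolate one harmonic series per branch, distort each branch, then sum."""
    out = [0.0] * len(x)
    for d in delays:
        branch = feedback_comb(x, d, g)
        for n, v in enumerate(branch):
            out[n] += math.tanh(drive * v)
    return out


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


# A comb tuned to 480 Hz passes a 480 Hz tone and strongly attenuates 720 Hz.
fs, delay, n = 48000, 100, 20000
tone_in = [math.sin(2.0 * math.pi * 480.0 * i / fs) for i in range(n)]
tone_off = [math.sin(2.0 * math.pi * 720.0 * i / fs) for i in range(n)]
print(rms(feedback_comb(tone_in, delay)[-4800:]), rms(feedback_comb(tone_off, delay)[-4800:]))
```

Distorting each isolated harmonic series separately is what keeps notes from different strings from intermodulating, since they never pass through the same nonlinearity together.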

Higher-Order Anti-Derivatives of Band Limited Step Functions for the Design of Radial Filters in Spherical Harmonics Expansions
This paper presents a discrete-time model of the spherical harmonics expansion describing a sound field. The so-called radial functions are realized as digital filters, which characterize the spatial
impulse responses of the individual harmonic orders. The filter
coefficients are derived from the analytical expressions of the time-domain radial functions, which have a finite extent in time. Due
to the varying degrees of discontinuities occurring at their edges, a
time-domain sampling of the radial functions gives rise to aliasing.
In order to reduce the aliasing distortion, the discontinuities are replaced with the higher-order anti-derivatives of a band-limited step
function. The improved spectral accuracy is demonstrated by numerical evaluation. The proposed discrete-time sound field model
is applicable in broadband applications such as spatial sound reproduction and active noise control.