Download Higher-Order Anti-Derivatives of Band Limited Step Functions for the Design of Radial Filters in Spherical Harmonics Expansions This paper presents a discrete-time model of the spherical harmonics expansion describing a sound field. The so-called radial functions are realized as digital filters, which characterize the spatial impulse responses of the individual harmonic orders. The filter coefficients are derived from the analytical expressions of the time-domain radial functions, which have a finite extent in time. Due to the varying degrees of discontinuities occurring at their edges, a time-domain sampling of the radial functions gives rise to aliasing. In order to reduce the aliasing distortion, the discontinuities are replaced with the higher-order anti-derivatives of a band-limited step function. The improved spectral accuracy is demonstrated by numerical evaluation. The proposed discrete-time sound field model is applicable in broadband applications such as spatial sound reproduction and active noise control.
Download Deep Learning Conditioned Modeling of Optical Compression Deep learning models applied to raw audio are rapidly gaining relevance in modeling analog audio devices. This paper investigates the use of different deep architectures for modeling audio optical compression. The models take raw audio samples at audio rate as input and produce them as output, and they work with no or small input buffers, allowing a theoretical real-time and low-latency implementation. In this study, two compressor parameters, the ratio and the threshold, have been included in the modeling process, aiming to condition the inference of the trained network. Deep learning architectures, including feed-forward, recurrent, and encoder-decoder models, are compared for modeling an all-tube optical mono compressor. The results of this study show that feed-forward and long short-term memory architectures present limitations in modeling the triggering phase of the compressor, performing well only on the sustained phase. On the other hand, encoder-decoder models outperform other architectures in replicating the overall compression process, but they overpredict the energy of high-frequency components.
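For readers unfamiliar with the two conditioning parameters named in the abstract above, the following minimal sketch shows how ratio and threshold enter a conventional static hard-knee gain computer. It is a textbook illustration only, not the neural model studied in the paper; the function name `compressor_gain_db` and the hard-knee shape are assumptions made for the example.

```python
def compressor_gain_db(level_db, threshold_db, ratio):
    """Static hard-knee gain computer (illustrative, not the paper's model).

    Returns the gain in dB to apply at a given input level in dB.
    Below the threshold the signal is left untouched; above it, every
    `ratio` dB of input overshoot yields only 1 dB of output overshoot.
    """
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    if level_db <= threshold_db:
        return 0.0
    overshoot = level_db - threshold_db
    # Output overshoot is overshoot / ratio, so the applied gain is negative.
    return overshoot / ratio - overshoot
```

With a threshold of -20 dB and a ratio of 4, an input at -10 dB overshoots by 10 dB and is reduced by 7.5 dB, while an input at -30 dB passes unchanged.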
Download Real-Time Black-Box Modelling With Recurrent Neural Networks This paper proposes to use a recurrent neural network for black-box modelling of nonlinear audio systems, such as tube amplifiers and distortion pedals. As a recurrent unit structure, we test both Long Short-Term Memory and a Gated Recurrent Unit. We compare the proposed neural network with a WaveNet-style deep neural network, which has been suggested previously for tube amplifier modelling. The neural networks are trained with several minutes of guitar and bass recordings, which have been passed through the devices to be modelled. A real-time audio plugin implementing the proposed networks has been developed in the JUCE framework. It is shown that the recurrent neural networks achieve similar accuracy to the WaveNet model, while requiring significantly less processing power to run. The Long Short-Term Memory recurrent unit is also found to outperform the Gated Recurrent Unit overall. The proposed neural network is an important step forward in computationally efficient yet accurate emulation of tube amplifiers and distortion pedals.
Download Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
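The abstract above mentions cubic Lagrange interpolation of a delay line. As a rough sketch of that general technique (not the paper's RNN modification itself), the code below reads a fractionally delayed sample from a short history buffer using a third-order Lagrange interpolator. The names `lagrange_coeffs` and `read_delayed`, and the buffer layout with the newest sample at index 0, are assumptions made for illustration.

```python
def lagrange_coeffs(delay, order=3):
    """Lagrange fractional-delay FIR coefficients for taps 0..order."""
    coeffs = []
    for k in range(order + 1):
        c = 1.0
        for m in range(order + 1):
            if m != k:
                c *= (delay - m) / (k - m)
        coeffs.append(c)
    return coeffs


def read_delayed(history, delay):
    """Read x[n - delay] from history, where history[k] holds x[n - k].

    The four taps are placed so that the fractional part falls between
    the two middle taps, where cubic Lagrange interpolation is most accurate.
    """
    base = int(delay) - 1            # index of the first of the four taps
    if base < 0 or base + 3 >= len(history):
        raise ValueError("delay out of range for this history length")
    local_delay = delay - base       # lies in [1, 2)
    coeffs = lagrange_coeffs(local_delay, order=3)
    return sum(c * history[base + k] for k, c in enumerate(coeffs))
```

A third-order interpolator reproduces cubic polynomials exactly, which gives a simple sanity check: sampling x(t) = t^3 at integer times and reading with a delay of 1.5 samples returns the true value at the half-sample position.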
Download Generative Latent Spaces for Neural Synthesis of Audio Textures This paper investigates the synthesis of audio textures and the structure of generative latent spaces using Variational Autoencoders (VAEs) within two paradigms of neural audio synthesis: DSP-inspired and data-driven approaches. For each paradigm, we propose VAE-based frameworks that allow fine-grained temporal control. We introduce datasets across three categories of environmental sounds to support our investigations. We evaluate and compare the models’ reconstruction performance using objective metrics, and investigate their generative capabilities and latent space structure through latent space interpolations.
Download Realistic Gramophone Noise Synthesis Using a Diffusion Model This paper introduces a novel data-driven strategy for synthesizing gramophone noise audio textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples of length equal to one disk revolution, but a method to generate plausible periodic variations between revolutions is also proposed. A guided approach is also applied as a conditioning method, where an audio signal generated with manually tuned signal processing is refined via reverse diffusion to improve realism. The method has been evaluated in a subjective listening test, in which the participants were often unable to distinguish the synthesized signals from the real ones. The synthetic noises produced with the best proposed unconditional method are statistically indistinguishable from real noise recordings. This work shows the potential of diffusion models for highly realistic audio synthesis tasks.
Download Development of an outdoor auralisation prototype with 3D sound reproduction Auralisation of outdoor sound has a strong potential for demonstrating the impact of different community noise scenarios. We describe here the development of an auralisation tool for outdoor noise such as traffic or industry. The tool calculates the sound propagation from source to listener using the Nord2000 model, and represents the sound field at the listener’s position using spherical harmonics. Because of this spherical harmonics approach, the sound may be reproduced in various formats, such as headphones, stereo, or surround. Dynamic reproduction in headphones according to the listener’s head orientation is also possible through the use of head tracking.
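To illustrate what a spherical harmonics representation at the listener's position can look like in its simplest form, the sketch below encodes a mono sample arriving from a given direction into first-order components. The abstract above does not state the order or normalization used by the tool, so the traditional B-format convention (W scaled by 1/sqrt(2)) and the function name `encode_first_order` are assumptions made for illustration.

```python
import math


def encode_first_order(sample, azimuth_rad, elevation_rad):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumes the traditional B-format convention: azimuth is measured
    anticlockwise from straight ahead, elevation upwards from the
    horizontal plane, and W carries a 1/sqrt(2) weighting.
    """
    cos_el = math.cos(elevation_rad)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth_rad) * cos_el   # front-back
    y = sample * math.sin(azimuth_rad) * cos_el   # left-right
    z = sample * math.sin(elevation_rad)          # up-down
    return w, x, y, z
```

Because the direction is carried by the channel weights rather than by a loudspeaker layout, the same four signals can later be decoded to headphones, stereo, or surround, and rotated to follow head tracking.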
Download On the numerical solution of the 2D wave equation with compact FDTD schemes This paper discusses compact-stencil finite difference time domain (FDTD) schemes for approximating the 2D wave equation in the context of digital audio. Stability, accuracy, and efficiency are investigated and new ways of viewing and interpreting the results are discussed. It is shown that if a tight accuracy constraint is applied, implicit schemes outperform explicit schemes. The paper also discusses the relevance to digital waveguide mesh modelling, and highlights the optimally efficient explicit scheme.
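As a point of reference for the schemes discussed in the abstract above, here is a minimal sketch of the standard explicit five-point leapfrog update for the 2D wave equation, the simplest member of the compact-stencil family. It is not the optimally efficient scheme the paper highlights, nor any of its implicit schemes; the function name `fdtd_step`, the list-of-lists grid, and the fixed (zero) boundary are assumptions made for illustration.

```python
def fdtd_step(u_prev, u_curr, courant):
    """Advance a 2D grid by one time step with the standard leapfrog scheme.

    u_prev and u_curr are the grids at time steps n-1 and n.
    courant is c*T/h (wave speed * time step / grid spacing); this scheme
    is stable for courant <= 1/sqrt(2). Boundary points stay at zero.
    """
    n_rows, n_cols = len(u_curr), len(u_curr[0])
    lam2 = courant * courant
    u_next = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            # Five-point discrete Laplacian (scaled by h^2).
            laplacian = (u_curr[i + 1][j] + u_curr[i - 1][j]
                         + u_curr[i][j + 1] + u_curr[i][j - 1]
                         - 4.0 * u_curr[i][j])
            u_next[i][j] = (2.0 * u_curr[i][j] - u_prev[i][j]
                            + lam2 * laplacian)
    return u_next
```

At the stability limit the squared Courant number is 0.5, so a unit impulse at the centre of the grid spreads to its four neighbours with weight 0.5 each after one step, while the centre value cancels to zero.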
Download GPGPU Patterns for Serial and Parallel Audio Effects Modern commodity GPUs offer high numerical throughput per unit of cost, but often sit idle during audio workstation tasks. Various studies in the field have shown that GPUs excel at tasks such as Finite-Difference Time-Domain simulation and wavefield synthesis. Concrete implementations of several such projects are available for use.
Benchmarks and use cases generally concentrate on running one project on a GPU. Running multiple such projects simultaneously is less common, and reduces throughput. In this work we list some concerns when running multiple heterogeneous tasks on the GPU. We apply optimization strategies detailed in developer documentation and commercial CUDA literature, and show results through the lens of real-time audio tasks. We benchmark the cases of (i) a homogeneous effect chain made of previously separate effects, and (ii) a synthesizer with distinct, parallelizable sound generators.
Download Sound Spatialisation Towards the end of the nineteenth century two inventions, the telephone and the phonograph, appeared which were to change the way music was dealt with. Prior to the developments they brought about, every musical performance was indivisible from its place in time and space. Their appearance meant that music could be presented remotely in both time and space from its origin. This has inevitably resulted in various forms of distortion of the original: nonlinear, spectral, temporal and spatial. Whilst it has proved relatively easy to deal satisfactorily with the first three, so that we can now present “remoted” music which is excellent in all three of those aspects, removing the distortions in the spatial presentation has proved far more intractable. Even the best systems in use today for sound “spatialisation” are relatively crude, allowing for little more than the creation of an illusion, sometimes very good, more often poor. Despite this, the creative use of sound spatialisation is becoming more and more important, whether for serious avant-garde composers, computer game designers, in cinema, television and multimedia productions or in audio recording. It is anticipated that the demand will escalate even more with the appearance of DVD with its multiple audio channel capability. This tutorial paper briefly covers the basic directional hearing mechanisms of the human brain before examining in more detail the various ways of dealing with sound spatialisation, starting with headphone-related technologies such as binaural and transaural. Loudspeaker-based systems will then be covered, starting with conventional stereo followed by cinema-style surround sound systems. Finally a true 3-D system, Ambisonics, will be examined. The advantages and limitations of all the systems, both aurally and in terms of difficulty of implementation or control, will be covered. It is hoped to give a number of demonstrations.