Real-time visualisation of the musical timbre based on the spectral estimates of the Snail-Analyser
This article presents a real-time software solution that allows musicians to visualise the timbre content of their musical tones. The timbre representation is based on the spectral estimates of the Snail-Analyser, which provide high frequency precision, and on a harmonic-like representation. After a brief review of the derivation of these estimates, some second-stage estimates and the mapping used for the timbre representation are described. The visual representations in the application were prototyped using the Max software and developed with the JUCE framework.
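To give a flavour of what a harmonic-like timbre display involves, here is a small, hedged sketch (not the Snail-Analyser's actual algorithm) that maps spectral peaks onto an octave spiral, where one full turn corresponds to one octave so that octave-related partials align angularly. The reference frequency and the peak data are illustrative assumptions.

```python
# Illustrative sketch only: map (frequency, magnitude) pairs onto a chroma-like
# spiral (one turn = one octave). Not the Snail-Analyser's mapping.
import numpy as np

F_REF = 440.0  # assumed reference frequency (A4)

def spiral_coordinates(freqs_hz, mags):
    """Polar coordinates of spectral peaks on an octave spiral."""
    octaves = np.log2(np.asarray(freqs_hz) / F_REF)
    theta = 2.0 * np.pi * octaves            # angle encodes pitch class
    radius = octaves - octaves.min() + 1.0   # radius grows with register
    size = np.asarray(mags) / np.max(mags)   # marker size from magnitude
    return theta, radius, size

# Example: for a 220 Hz tone, the octave-related partials (1st, 2nd, 4th, 8th)
# land at the same angle, which is what makes harmonicity visible.
partials = 220.0 * np.arange(1, 9)
theta, radius, size = spiral_coordinates(partials, np.ones_like(partials))
print(np.round(np.mod(theta, 2 * np.pi), 3))
```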
An Aliasing-Free Hybrid Digital-Analog Polyphonic Synthesizer
Analog subtractive synthesizers are generally considered to provide superior sound quality compared to digital emulations. However, analog circuitry requires calibration and suffers from aging, temperature instability, and limited flexibility in generating a wide variety of waveforms. Digital synthesis can mitigate many of these drawbacks, but generating arbitrary aliasing-free waveforms remains challenging. In this paper, we present the ±synth, a hybrid digital-analog eight-voice polyphonic synthesizer prototype that combines the best of both worlds. At the heart of the synthesizer is the big Fourier oscillator (BFO), a novel digital very-large-scale integration (VLSI) design that utilizes additive synthesis to generate a wide variety of aliasing-free waveforms. Each BFO produces two voices, using four oscillators per voice. A single oscillator can generate up to 1024 freely configurable partials (harmonic or inharmonic), which are calculated using coordinate rotation digital computers (CORDICs). The BFOs were fabricated as 65 nm CMOS custom application-specific integrated circuits (ASICs), which are integrated in the ±synth to simultaneously generate up to 32 768 partials. Four 24-bit 96 kHz stereo DACs then convert the eight voices into the analog domain, followed by digitally controlled analog low-pass filtering and amplification. Measurement results of the ±synth prototype demonstrate high fidelity and low latency.
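The idea behind alias-free additive synthesis can be illustrated with a short sketch: each partial is produced by iterating a 2-D rotation (the recurrence that CORDIC stages evaluate in hardware), and any partial at or above Nyquist is simply omitted so no aliasing can occur. The parameters and structure below are illustrative, not the BFO's actual architecture.

```python
# Sketch of additive synthesis via rotation recursions, with partials above
# Nyquist dropped. Pure-Python loop, so kept deliberately small.
import numpy as np

def additive_voice(f0, partial_gains, fs=96_000, dur=0.1):
    n = int(fs * dur)
    out = np.zeros(n)
    for k, gain in enumerate(partial_gains, start=1):
        f = k * f0
        if f >= fs / 2:                # skip partials at/above Nyquist
            break
        w = 2 * np.pi * f / fs
        c, s = np.cos(w), np.sin(w)
        x, y = gain, 0.0               # state of the rotating phasor
        for i in range(n):
            out[i] += y
            x, y = c * x - s * y, s * x + c * y   # one rotation per sample
    return out

# Example: a sawtooth-like spectrum (1/k partial amplitudes) at 440 Hz.
signal = additive_voice(440.0, [1.0 / k for k in range(1, 65)])
```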
Efficient simulation of the yaybahar using a modal approach
This work presents a physical model of the yaybahar, a recently invented acoustic instrument. Here, output from a bowed string is passed through a long spring, before being amplified and propagated in air via a membrane. The highly dispersive character of the spring is responsible for the typical synthetic tonal quality of this instrument. Building on previous literature, this work presents a modal discretisation of the full system, with fine control over frequency-dependent decay times, modal amplitudes and frequencies, all essential for an accurate simulation of the dispersive characteristics of reverberation. The string-bow-bridge system is also solved in the modal domain, using recently developed non-iterative numerical methods allowing for efficient simulation.
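As a rough illustration of the modal approach, the sketch below renders a reverberant element as a bank of exponentially decaying sinusoids with per-mode frequency, amplitude, and decay time. The dispersion law and decay profile are made-up placeholders, not the yaybahar model from the paper.

```python
# Minimal modal synthesis sketch: sum of decaying sinusoids with
# frequency-dependent T60 decay times.
import numpy as np

def modal_response(mode_freqs, mode_t60s, mode_amps, fs=44_100, dur=2.0):
    t = np.arange(int(fs * dur)) / fs
    out = np.zeros_like(t)
    for f, t60, a in zip(mode_freqs, mode_t60s, mode_amps):
        decay = np.log(1000.0) / t60   # T60 -> exponential decay rate
        out += a * np.exp(-decay * t) * np.sin(2 * np.pi * f * t)
    return out

# Placeholder "spring-like" modes: strongly inharmonic (dispersive) frequencies,
# with shorter decays at higher frequencies.
m = np.arange(1, 200)
freqs = 40.0 * m * np.sqrt(1.0 + 0.02 * m**2)   # toy dispersion relation
keep = freqs < 20_000.0                          # keep modes in the audio band
t60s = 4.0 / (1.0 + freqs / 2000.0)              # frequency-dependent decay
amps = 1.0 / m
ir = modal_response(freqs[keep], t60s[keep], amps[keep])
```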
Differentiable grey-box modelling of phaser effects using frame-based spectral processing
Machine learning approaches to modelling analog audio effects have seen intensive investigation in recent years, particularly in the context of non-linear time-invariant effects such as guitar amplifiers. For modulation effects such as phasers, however, new challenges emerge due to the presence of the low-frequency oscillator which controls the slowly time-varying nature of the effect. Existing approaches have either required foreknowledge of this control signal, or have been non-causal in implementation. This work presents a differentiable digital signal processing approach to modelling phaser effects in which the underlying control signal and time-varying spectral response of the effect are jointly learned. The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain, with a transfer function based on typical analog phaser circuit topology. We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters. The frame duration is an important hyper-parameter of the proposed model, so an investigation was carried out into its effect on model accuracy. The optimal frame length depends on both the rate and transient decay-time of the target effect, but the frame length can be altered at inference time without a significant change in accuracy.
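For readers unfamiliar with the frame-based structure, here is a hedged NumPy sketch of the signal path only (so not differentiable): audio is processed in short overlapping frames, and each frame is filtered in the frequency domain by a transfer function built from a chain of first-order all-pass sections whose break frequency follows an LFO, i.e. the classic phaser structure. All constants are illustrative, not the trained model's values.

```python
# Frame-based time-varying phaser-like filter applied in the frequency domain.
import numpy as np

def phaser_frames(x, fs, frame=2048, hop=1024, n_allpass=4,
                  lfo_rate=0.5, depth=0.9, f_lo=300.0, f_hi=3000.0):
    win = np.hanning(frame)
    y = np.zeros(len(x) + frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    z_inv = np.exp(-1j * 2 * np.pi * freqs / fs)
    for start in range(0, len(x) - frame, hop):
        t = start / fs
        # LFO sets the all-pass break frequency for this frame
        fc = f_lo + (f_hi - f_lo) * 0.5 * (1 + np.sin(2 * np.pi * lfo_rate * t))
        p = np.tan(np.pi * fc / fs)
        a1 = (p - 1) / (p + 1)                       # first-order all-pass coeff
        ap = (a1 + z_inv) / (1 + a1 * z_inv)         # all-pass frequency response
        H = 1 + depth * ap**n_allpass                # feed-forward phaser path
        X = np.fft.rfft(win * x[start:start + frame])
        y[start:start + frame] += win * np.fft.irfft(H * X, frame)
    return y[:len(x)]
```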
An active learning procedure for the interaural time difference discrimination threshold
Measuring the auditory lateralization elicited by interaural time difference (ITD) cues involves the estimation of a psychometric function (PF). The shape of this function usually follows from the analysis of the subjective data and models the probability of correctly localizing the angular position of a sound source. The present study describes and evaluates a procedure for progressively fitting a PF, using Gaussian process classification of the subjective responses produced during a binary decision experiment. The process adaptively refines an approximate PF following Bayesian inference. At each trial, it suggests the most informative auditory stimulus for function refinement according to Bayesian active learning by disagreement (BALD) mutual information. In this paper, the procedure was modified to accommodate two-alternative forced choice (2AFC) experimental methods and was then compared with a standard adaptive “three-down, one-up” staircase procedure. Our procedure approximates the average ITD threshold at the 79.4% correct level of lateralization with a mean accuracy increase of 8.9% over a Weibull function fitted to the data of the same test. The final accuracy for the just-noticeable difference (JND) in ITD is achieved with only 37.6% of the trials needed by a standard lateralization test.
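The BALD acquisition rule itself is compact enough to sketch: given posterior samples of the psychometric curve, the next stimulus is the one where the mutual information between the binary response and the latent function is largest. In the sketch below the posterior-sampling step, which in the paper comes from a Gaussian process classifier, is faked with a placeholder logistic family; all numbers are illustrative.

```python
# Hedged sketch of BALD-based stimulus selection for a 2AFC lateralization task.
import numpy as np

def bernoulli_entropy(p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def bald_scores(prob_samples):
    """prob_samples: (n_posterior_samples, n_candidate_stimuli) array of
    P(correct | stimulus) under each posterior draw of the psychometric curve."""
    marginal = prob_samples.mean(axis=0)                          # p(y | x, D)
    h_marginal = bernoulli_entropy(marginal)                      # H[y | x, D]
    h_conditional = bernoulli_entropy(prob_samples).mean(axis=0)  # E_f H[y | f, x]
    return h_marginal - h_conditional                             # mutual information

# Placeholder posterior: logistic curves with random threshold/slope over
# candidate ITDs (microseconds); 2AFC responses range from 0.5 to 1.
itds = np.linspace(10, 500, 50)
thr = np.random.normal(100, 30, size=(200, 1))
slope = np.random.normal(0.03, 0.01, size=(200, 1))
samples = 0.5 + 0.5 / (1 + np.exp(-slope * (itds - thr)))
next_itd = itds[np.argmax(bald_scores(samples))]   # most informative next trial
```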
Perceptual Evaluation and Genre-specific Training of Deep Neural Network Models of a High-gain Guitar Amplifier
Modelling of analogue devices via deep neural networks (DNNs) has gained popularity recently, but model performance is usually measured using accuracy measures alone. This paper aims to assess the performance of DNN models of a high-gain vacuum-tube guitar amplifier using additional subjective measures, including preference and realism. Furthermore, the paper explores how the performance changes when genre-specific training data is used. In five listening tests, subjects rated models of a popular high-gain guitar amplifier, the Peavey 6505, in terms of preference, realism and perceptual accuracy. Two DNN models were used: a long short-term memory recurrent neural network (LSTM-RNN) and a WaveNet-based convolutional neural network (CNN). The LSTM-RNN model was shown to be more accurate when trained with genre-specific data, to the extent that it could not be distinguished from the real amplifier in ABX tests. Despite minor perceptual inaccuracies, subjects found all models to be as realistic as the target in MUSHRA-like experiments, and there was no evidence to suggest that the real amplifier was preferred to any of the models in a mix. Finally, it was observed that a low-gain excerpt was more difficult to emulate, and was therefore useful for revealing differences between the models.
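As context for what an LSTM-RNN black-box amplifier model looks like, here is a minimal PyTorch sketch: a single LSTM layer followed by a linear layer, trained on pairs of clean input and amplifier output audio. The layer sizes, residual connection, and loss are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal LSTM-RNN amplifier-modelling sketch (illustrative, not the paper's setup).
import torch
import torch.nn as nn

class LSTMAmpModel(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x, state=None):
        # x: (batch, samples, 1) mono audio
        h, state = self.lstm(x, state)
        return self.out(h) + x, state          # residual path around the network

model = LSTMAmpModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.randn(4, 2048, 1)                # stand-in for DI guitar frames
target = torch.tanh(5 * clean)                 # stand-in for amp-processed frames
pred, _ = model(clean)
loss = torch.mean((pred - target) ** 2)        # error-to-signal style losses also common
loss.backward()
opt.step()
```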
A Coupled Resonant Filter Bank for the Sound Synthesis of Nonlinear Sources
This paper is concerned with the design of efficient and controllable filters for sound synthesis purposes, in the context of the generation of sounds radiated by nonlinear sources. These filters are coupled and generate tonal components in an interdependent way, and are intended to emulate realistic, perceptually salient effects in musical instruments in an efficient manner. Control of the energy transfer between the filters is realized by defining a matrix containing the coupling terms. The generation of prototypical sounds corresponding to nonlinear sources with the filter bank is presented. In particular, examples are proposed to generate sounds corresponding to impacts on thin structures and to the perturbation of a vibrating object when it collides with another object. The sound examples presented in the paper, available for listening on the accompanying site, suggest that simple control of the input parameters is enough to generate sounds with a coherent evocation, and that the addition of random processes significantly improves the realism of the generated sounds.
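A small sketch can make the coupling idea concrete: each output is a damped two-pole resonator, and a coupling matrix feeds a fraction of every resonator's state into the others each sample, emulating energy transfer between tonal components. The coupling values and modal data below are arbitrary placeholders, not the paper's design.

```python
# Illustrative coupled resonant filter bank with a coupling matrix C.
import numpy as np

def coupled_resonators(excitation, freqs, decays, C, fs=44_100):
    w = 2 * np.pi * np.asarray(freqs) / fs
    r = np.exp(-np.asarray(decays) / fs)        # per-sample damping
    a1, a2 = 2 * r * np.cos(w), -r**2           # two-pole coefficients
    y1 = np.zeros(len(freqs))                   # y[n-1] per resonator
    y2 = np.zeros(len(freqs))                   # y[n-2]
    out = np.zeros(len(excitation))
    for n, x in enumerate(excitation):
        y0 = a1 * y1 + a2 * y2 + x + C @ y1     # linear coupling via matrix C
        out[n] = y0.sum()
        y2, y1 = y1, y0
    return out

freqs = [220.0, 447.0, 690.0, 951.0]            # slightly inharmonic partials
decays = [3.0, 4.0, 6.0, 9.0]
C = 0.001 * (np.ones((4, 4)) - np.eye(4))       # weak off-diagonal coupling
impulse = np.zeros(44_100)
impulse[0] = 1.0
sound = coupled_resonators(impulse, freqs, decays, C)
```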
Optimization techniques for a physical model of human vocalisation
We present an unsupervised approach to optimize and evaluate the synthesis of non-speech audio effects from a speech production model. We use the Pink Trombone synthesizer as a case study of a simplified production model of the vocal tract to target non-speech human audio signals: yawnings. We selected and optimized the control parameters of the synthesizer to minimize the difference between real and generated audio. We validated the most common optimization techniques reported in the literature as well as a specifically designed neural network. We evaluated several popular quality metrics as error functions, including both objective quality metrics and subjective-equivalent metrics. We compared the results in terms of total error and computational demand. Results show that genetic and swarm optimizers outperform least-squares algorithms at the cost of slower execution, and that specific combinations of optimizers and audio representations offer significantly different results. The proposed methodology could be used to benchmark other physical models and audio types.
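The overall parameter-matching loop can be sketched as follows: an evolutionary optimizer searches the control parameters of a synthesis function so that a spectral distance between the synthesized and target audio is minimized. The `vocal_tract_synth` function below is a placeholder for a Pink-Trombone-like model, not its actual API, and the error function is a simple spectrogram MSE rather than the paper's metrics.

```python
# Hedged sketch of evolutionary optimization of synthesis parameters.
import numpy as np
from scipy.optimize import differential_evolution
from scipy.signal import stft

def vocal_tract_synth(params, fs=16_000, dur=1.0):
    """Placeholder synthesizer: params = (f0, breathiness)."""
    f0, breath = params
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * f0 * t) + breath * np.random.randn(len(t))

def spectral_distance(a, b, fs=16_000):
    _, _, A = stft(a, fs, nperseg=512)
    _, _, B = stft(b, fs, nperseg=512)
    return np.mean((np.abs(A) - np.abs(B)) ** 2)   # simple spectrogram MSE

target = vocal_tract_synth((110.0, 0.2))           # stand-in for a recorded yawn
result = differential_evolution(
    lambda p: spectral_distance(vocal_tract_synth(p), target),
    bounds=[(60.0, 300.0), (0.0, 1.0)], maxiter=30, seed=0)
print(result.x)                                    # estimated control parameters
```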
Upcycling Android Phones into Embedded Audio Platforms
There are millions of sophisticated Android phones in the world that are disposed of at a very high rate due to consumerism. Their computational power and built-in features, instead of being wasted when discarded, could be repurposed for creative applications such as musical instruments and interactive audio installations. However, audio programming on Android is complicated and comes with restrictions that heavily impact performance. To address this issue, we present LDSP, an open-source environment that can be used to easily upcycle Android phones into embedded platforms optimized for audio synthesis and processing. We conducted a benchmark study comparing the number of oscillators that can be run in parallel in LDSP with an equivalent audio app designed according to modern Android standards. Our study tested six phones ranging from 2014 to 2018 and running different Android versions. The results consistently demonstrate that LDSP provides a significant boost in performance, in some cases more than doubling the oscillator count, making even very old phones suitable for fairly advanced audio applications.
Efficient finite-difference room acoustics simulation incorporating extended-reacting elements
A method is proposed that allows finite-difference (FD) simulation of room acoustics to incorporate extended-reacting porous elements without adding major computational cost. The porous elements are described by a rigid-frame equivalent fluid model and are incorporated into the time-domain formulation through auxiliary differential equations. By using a local staggered grid scheme for the boundaries of the porous elements, the method allows an efficient second-order scalar approach to be used for the uniform air and porous element interior regions that make up the majority of the computational domain. Both the scalar and staggered schemes are based on a face-centered cubic grid to minimize numerical dispersion. A software implementation running on GPU shows the accuracy of the method compared to a theoretical reference, and demonstrates the method’s computational efficiency through a benchmark example.
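To illustrate the general idea of embedding an absorbing element directly in a finite-difference grid, the following deliberately simplified 1-D sketch adds a flow-resistivity-like damping term to the update in half of the domain. This is a crude stand-in for the rigid-frame equivalent fluid and auxiliary-differential-equation formulation used in the paper, with arbitrary grid sizes and material values.

```python
# Simplified 1-D FDTD with a damped ("porous") half of the domain.
import numpy as np

c, fs = 343.0, 44_100.0
dt = 1.0 / fs
dx = c * dt                      # 1-D Courant limit (lambda = 1)
N = 400
sigma = np.zeros(N)
sigma[N // 2:] = 2000.0          # "porous" half: simple damping coefficient

p = np.zeros(N)
p_prev = np.zeros(N)
p[50] = 1.0                      # impulse excitation in the air region

for n in range(600):
    lap = np.zeros(N)
    lap[1:-1] = p[2:] - 2 * p[1:-1] + p[:-2]
    damp = sigma * dt
    # discretization of p_tt + 2*sigma*p_t = c^2 * p_xx
    p_next = (2 * p - (1 - damp) * p_prev + (c * dt / dx) ** 2 * lap) / (1 + damp)
    p_next[0] = p_next[-1] = 0.0     # pressure-release (Dirichlet) boundaries
    p_prev, p = p, p_next

print(p[:N // 2].max(), p[N // 2:].max())   # the pulse is attenuated in the damped half
```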