Real-time visualisation of the musical timbre based on the spectral estimates of the Snail-Analyser
This article presents a real-time software solution that allows musicians to visualise the timbre content of their musical tones. The timbre representation is based on the spectral estimates of the Snail-Analyser, chosen for their high frequency precision, and on a harmonic-like representation. After a brief review of the derivation of these estimates, some second-stage estimates and the mapping used for the timbre representation are described. The visual representations in the application have been prototyped using the MAX software and developed with the Juce framework.
A Quadric Surface Model of Vacuum Tubes for Virtual Analog Applications
Despite the prevalence of modern audio technology, vacuum tube amplifiers continue to play a vital role in the music industry. For this reason, many different digital techniques have been introduced over the years to emulate them. In this paper, we propose a novel quadric surface model for tube simulation that outperforms the Cardarilli model in efficiency whilst retaining comparable accuracy when grid current is negligible. After showing that the model can accurately characterize tubes starting from measurement data, we perform an efficiency comparison by implementing the considered tube models as nonlinear 3-port elements in the Wave Digital domain. We do this by taking into account the typical common-cathode gain stage employed in vacuum tube guitar amplifiers. The proposed model achieves a speedup of 4.6× with respect to the Cardarilli model, and thus appears promising for real-time Virtual Analog applications.
Perceptual Evaluation and Genre-specific Training of Deep Neural Network Models of a High-gain Guitar Amplifier
Modelling of analogue devices via deep neural networks (DNNs) has gained popularity recently, but their performance is usually measured using accuracy measures alone. This paper aims to assess the performance of DNN models of a high-gain vacuum-tube guitar amplifier using additional subjective measures, including preference and realism. Furthermore, the paper explores how the performance changes when genre-specific training data is used. In five listening tests, subjects rated models of a popular high-gain guitar amplifier, the Peavey 6505, in terms of preference, realism and perceptual accuracy. Two DNN models were used: a long short-term memory recurrent neural network (LSTM-RNN) and a WaveNet-based convolutional neural network (CNN). The LSTM-RNN model was shown to be more accurate when trained with genre-specific data, to the extent that it could not be distinguished from the real amplifier in ABX tests. Despite minor perceptual inaccuracies, subjects found all models to be as realistic as the target in MUSHRA-like experiments, and there was no evidence to suggest that the real amplifier was preferred to any of the models in a mix. Finally, it was observed that a low-gain excerpt was more difficult to emulate, and was therefore useful to reveal differences between the models.
Efficient finite-difference room acoustics simulation incorporating extended-reacting elements
A method is proposed that allows finite-difference (FD) simulation of room acoustics to incorporate extended-reacting porous elements without adding major computational cost. The porous elements are described by a rigid-frame equivalent fluid model and are incorporated into the time-domain formulation through auxiliary differential equations. By using a local staggered grid scheme for the boundaries of the porous elements, the method allows an efficient second-order scalar approach to be used for the uniform air and porous element interior regions that make up the majority of the computational domain. Both the scalar and staggered schemes are based on a face-centered cubic grid to minimize numerical dispersion. A software implementation running on GPU shows the accuracy of the method compared to a theoretical reference, and demonstrates the method’s computational efficiency through a benchmark example.
Differentiable grey-box modelling of phaser effects using frame-based spectral processing
Machine learning approaches to modelling analog audio effects have seen intensive investigation in recent years, particularly in the context of non-linear time-invariant effects such as guitar amplifiers. For modulation effects such as phasers, however, new challenges emerge due to the presence of the low-frequency oscillator which controls the slowly time-varying nature of the effect. Existing approaches have either required foreknowledge of this control signal, or have been non-causal in implementation. This work presents a differentiable digital signal processing approach to modelling phaser effects in which the underlying control signal and time-varying spectral response of the effect are jointly learned. The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain, with a transfer function based on typical analog phaser circuit topology. We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters. The frame duration is an important hyper-parameter of the proposed model, so an investigation was carried out into its effect on model accuracy. The optimal frame length depends on both the rate and transient decay-time of the target effect, but the frame length can be altered at inference time without a significant change in accuracy.
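
As a rough illustration of a frame-wise phaser transfer function, the sketch below evaluates a textbook phaser response: a cascade of first-order all-pass sections mixed with the dry signal, with the break frequency set once per frame by a sinusoidal low-frequency oscillator. This is not the model from the paper. The function names, the two-stage topology, the 50/50 mix, and the exponential LFO sweep are all assumptions chosen for illustration, and no learning or feedback path is shown.

```python
import cmath
import math


def allpass_coeff(break_hz, sample_rate):
    """Coefficient of a first-order all-pass whose phase is -90 degrees at break_hz."""
    t = math.tan(math.pi * break_hz / sample_rate)
    return (t - 1.0) / (t + 1.0)


def phaser_response(freq_hz, break_hz, sample_rate, stages=2, mix=0.5):
    """Complex response of dry signal plus a cascade of `stages` all-pass sections."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    a = allpass_coeff(break_hz, sample_rate)
    allpass = (a + z_inv) / (1.0 + a * z_inv)
    return (1.0 - mix) + mix * allpass ** stages


def lfo_break_frequency(frame_index, hop, sample_rate, rate_hz, f_low, f_high):
    """Break frequency for one frame: sinusoidal LFO swept exponentially (assumed shape)."""
    t = frame_index * hop / sample_rate
    position = 0.5 * (1.0 + math.sin(2.0 * math.pi * rate_hz * t))
    return f_low * (f_high / f_low) ** position


# One transfer function per frame: the filter is held fixed within each frame.
sample_rate = 44100
frame_breaks = [
    lfo_break_frequency(i, hop=512, sample_rate=sample_rate,
                        rate_hz=0.5, f_low=200.0, f_high=3200.0)
    for i in range(8)
]
frame_gains_at_1k = [abs(phaser_response(1000.0, fc, sample_rate)) for fc in frame_breaks]
```

With two all-pass stages and an equal mix, the cascade contributes a 180 degree phase shift at the break frequency, so the notch sits exactly there and moves as the LFO sweeps from frame to frame.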
Upcycling Android Phones into Embedded Audio Platforms
There are millions of sophisticated Android phones in the world that get disposed of at a very high rate due to consumerism. Their computational power and built-in features, instead of being wasted when discarded, could be repurposed for creative applications such as musical instruments and interactive audio installations. However, audio programming on Android is complicated and comes with restrictions that heavily impact performance. To address this issue, we present LDSP, an open-source environment that can be used to easily upcycle Android phones into embedded platforms optimized for audio synthesis and processing. We conducted a benchmark study to compare the number of oscillators that can be run in parallel on LDSP with an equivalent audio app designed according to modern Android standards. Our study tested six phones ranging from 2014 to 2018 and running different Android versions. The results consistently demonstrate that LDSP provides a significant boost in performance, in some cases more than doubling it, making even very old phones suitable for fairly advanced audio applications.
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals
The aim of latent variable disentanglement is to infer the multiple informative latent representations that lie behind a data generation process and is a key factor in controllable data generation. In this paper, we propose a deep neural network-based self-supervised learning method to infer the disentangled rhythmic and harmonic representations behind music audio generation. We train a variational autoencoder that generates an audio mel-spectrogram from two latent features representing the rhythmic and harmonic content. In the training phase, the variational autoencoder is trained to reconstruct the input mel-spectrogram given its pitch-shifted version. At each forward computation in the training phase, a vector rotation operation is applied to one of the latent features, assuming that the dimensions of the feature vectors are related to pitch intervals. Therefore, in the trained variational autoencoder, the rotated latent feature represents the pitch-related information of the mel-spectrogram, and the unrotated latent feature represents the pitch-invariant information, i.e., the rhythmic content. The proposed method was evaluated using a predictor-based disentanglement metric on the learned features. Furthermore, we demonstrate its application to the automatic generation of music remixes.
Probabilistic Reverberation Model Based on Echo Density and Kurtosis
This article proposes a probabilistic model for synthesizing room impulse responses (RIRs) for use in convolution artificial reverberators. The proposed method is based on the concept of echo density. Echo density is a measure of the number of echoes per second in an impulse response and is a demonstrated perceptual metric of artificial reverberation quality. As echo density is related to the statistical measure of kurtosis, this article demonstrates that the statistics of an RIR can be modeled using a probabilistic mixture model. A mixture model designed specifically for modeling RIRs is proposed. The proposed method is useful for statistically replicating RIRs of a measured environment, thereby synthesizing new independent observations of an acoustic space. A perceptual pilot study is carried out to evaluate the fidelity of the replication process in monophonic and stereo artificial reverberators.
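
The link between echo density and kurtosis can be illustrated with a small numerical sketch. The code below draws samples from a simple two-component mixture (each sample is either zero or a Gaussian "echo") and measures both the kurtosis and a normalized echo density, taken here as the fraction of samples beyond one standard deviation divided by the value expected for Gaussian noise. The mixture is an assumption made for illustration and is not necessarily the mixture model proposed in the article; all function names are hypothetical.

```python
import math
import random


def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance (3 for a Gaussian)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (var ** 2)


def normalized_echo_density(x):
    """Fraction of samples beyond one standard deviation, scaled so Gaussian noise gives about 1."""
    n = len(x)
    sigma = math.sqrt(sum(v * v for v in x) / n)
    outside = sum(1 for v in x if abs(v) > sigma)
    return (outside / n) / math.erfc(1.0 / math.sqrt(2.0))


def sparse_gaussian_mixture(n, p_echo, rng):
    """Assumed toy model: each sample is a Gaussian echo with probability p_echo, else zero."""
    return [rng.gauss(0.0, 1.0) if rng.random() < p_echo else 0.0 for _ in range(n)]


rng = random.Random(0)
sparse = sparse_gaussian_mixture(20000, 0.05, rng)  # few echoes, early part of a response
dense = sparse_gaussian_mixture(20000, 1.0, rng)    # fully diffuse, late part of a response

sparse_stats = (kurtosis(sparse), normalized_echo_density(sparse))
dense_stats = (kurtosis(dense), normalized_echo_density(dense))
```

For this toy mixture the kurtosis is 3 divided by the echo probability, so sparse windows show high kurtosis and low echo density, while fully diffuse windows approach a kurtosis of 3 and an echo density near 1.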
Real-Time Modal Synthesis of Nonlinearly Interconnected Networks
Modal methods are a long-established approach to physical modeling sound synthesis. Projecting the equation of motion of a linear, time-invariant system onto a basis of eigenfunctions yields a set of independent forced, lossy oscillators, which may be simulated efficiently and accurately by means of standard time-stepping methods. Extensions of modal techniques to nonlinear problems are possible, though often requiring the solution of densely coupled nonlinear time-dependent equations. Here, an application of recent results in numerical simulation design is employed, in which the nonlinear energy is first quadratised via a convenient auxiliary variable. The resulting equations may be updated in time explicitly, thus avoiding the need for expensive iterative solvers, dense linear system solutions, or matrix inversions. The case of a network of interconnected distributed elements is detailed, along with a real-time implementation as an audio plugin.
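
The linear starting point described above, a bank of independent lossy oscillators, can be sketched in a few lines. Each mode below is advanced with a two-step recursion that reproduces a decaying sinusoid exactly at the sample instants. This covers only the linear, unforced case after an impulse; the quadratised nonlinear coupling that is the subject of the paper is not shown. The function name and the mode parameters are assumptions for illustration.

```python
import math


def modal_impulse_response(freqs_hz, decay_rates, amplitudes, sample_rate, num_samples):
    """Sum of independent lossy modes, each following q'' + 2*sigma*q' + omega^2*q = 0."""
    k = 1.0 / sample_rate  # time step in seconds
    out = [0.0] * num_samples
    for f, sigma, amp in zip(freqs_hz, decay_rates, amplitudes):
        # Damped angular frequency of the mode.
        omega_d = math.sqrt((2.0 * math.pi * f) ** 2 - sigma ** 2)
        radius = math.exp(-sigma * k)
        c1 = 2.0 * radius * math.cos(omega_d * k)
        c2 = -radius * radius
        # Initial conditions for q[n] = amp * exp(-sigma*n*k) * sin(omega_d*n*k).
        q_old = 0.0
        q_now = amp * radius * math.sin(omega_d * k)
        for n in range(1, num_samples):
            out[n] += q_now
            # Explicit update: no linear system solve or iteration is needed.
            q_old, q_now = q_now, c1 * q_now + c2 * q_old
    return out


# Hypothetical three-mode example: frequencies in Hz, decay rates in 1/s.
response = modal_impulse_response(
    freqs_hz=[220.0, 440.0, 663.0],
    decay_rates=[3.0, 5.0, 8.0],
    amplitudes=[1.0, 0.5, 0.25],
    sample_rate=44100,
    num_samples=2000,
)
```

Because the modes do not interact, the cost grows linearly with the number of modes, which is what makes the linear case cheap enough for real-time use.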
Differentiable Feedback Delay Network for Colorless Reverberation
Artificial reverberation algorithms often suffer from spectral coloration, usually in the form of metallic ringing, which impairs the perceived quality of sound. This paper proposes a method to reduce the coloration in the feedback delay network (FDN), a popular artificial reverberation algorithm. An optimization framework is employed entailing a differentiable FDN to learn a set of parameters decreasing coloration. The optimization objective is to minimize the spectral loss to obtain a flat magnitude response, with an additional temporal loss term to control the sparseness of the impulse response. The objective evaluation of the method shows a favorable narrower distribution of modal excitation while retaining the impulse response density. The subjective evaluation demonstrates that the proposed method lowers perceptual coloration of late reverberation, and also shows that the suggested optimization improves sound quality for small FDN sizes. The method proposed in this work constitutes an improvement in the design of accurate and high-quality artificial reverberation, simultaneously offering computational savings.
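
For readers unfamiliar with the FDN structure itself, the following minimal sketch computes the impulse response of a small feedback delay network. It uses a Householder feedback matrix, unit input and output gains, and a single scalar loop gain; these choices, the delay lengths, and the function names are assumptions for illustration. The differentiable optimization and the spectral and temporal losses described in the paper are not implemented here.

```python
def householder(n):
    """Orthogonal Householder matrix I - (2/n) * ones, a common FDN feedback matrix."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)] for i in range(n)]


def fdn_impulse_response(delays, gain, num_samples):
    """Impulse response of an FDN with unit input/output gains and scalar loop gain."""
    n = len(delays)
    feedback = householder(n)
    buffers = [[0.0] * d for d in delays]  # one circular buffer per delay line
    index = [0] * n
    out = []
    for t in range(num_samples):
        x = 1.0 if t == 0 else 0.0
        # Read the delay-line outputs (values written delays[i] samples ago).
        s = [buffers[i][index[i]] for i in range(n)]
        out.append(sum(s))
        for i in range(n):
            mixed = sum(feedback[i][j] * s[j] for j in range(n))
            # Write the input plus attenuated feedback back into the delay line.
            buffers[i][index[i]] = x + gain * mixed
            index[i] = (index[i] + 1) % delays[i]
    return out


# Hypothetical small FDN with mutually prime delay lengths (in samples).
ir = fdn_impulse_response(delays=[7, 11, 13, 17], gain=0.9, num_samples=2000)
```

In a learned variant, quantities such as the feedback matrix and the input and output gains would become trainable parameters, with a loss computed on the magnitude response of the resulting impulse response.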