Download Modeling and Extending the RCA Mark II Sound Effects Filter We have analyzed the Sound Effects Filter from the one-of-a-kind RCA Mark II sound synthesizer and modeled it as a Wave Digital Filter using the Faust language, to make this once-exclusive device widely available. By studying the original schematics and measurements of the device, we discovered several circuit modifications. Building on these, we proposed a number of extensions to the circuit which increase its usefulness in music production.
Download Feature-Informed Latent Space Regularization for Music Source Separation The integration of additional side information to improve music source separation has been investigated numerous times, e.g., by adding features to the input or by adding learning targets in a multi-task learning scenario. These approaches, however, require additional annotations such as musical scores, instrument labels, etc. in training and possibly during inference. The available datasets for source separation do not usually provide these additional annotations. In this work, we explore transfer learning strategies to incorporate VGGish features with a state-of-the-art source separation model; VGGish features are known to be a very condensed representation of audio content and have been successfully used in many music information retrieval tasks. We introduce three approaches to incorporate the features, including two latent space regularization methods and one naive concatenation method. Our preliminary results show that our proposed approaches could improve some evaluation metrics for music source separation. In this work, we also include a discussion of our proposed approaches, such as the pros and cons of each approach, and the potential extension/improvement.
Download Efficient simulation of the yaybahar using a modal approach This work presents a physical model of the yaybahar, a recently invented acoustic instrument. Here, output from a bowed string is passed through a long spring, before being amplified and propagated in air via a membrane. The highly dispersive character of the spring is responsible for the typical synthetic tonal quality of this instrument. Building on previous literature, this work presents a modal discretisation of the full system, with fine control over frequency-dependent decay times, modal amplitudes and frequencies, all essential for an accurate simulation of the dispersive characteristics of reverberation. The string-bow-bridge system is also solved in the modal domain, using recently developed noniterative numerical methods allowing for efficient simulation.
Download Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
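As a rough illustration of the delay-line interpolation mentioned in the abstract above, the following sketch reads a signal at a non-integer delay using cubic (third-order) Lagrange interpolation. The abstract does not describe how the authors wire this into an RNN, so the function names `lagrange_coeffs` and `fractional_delay` and the zero-padding at the start of the signal are assumptions made only for this example.

```python
def lagrange_coeffs(delay, order=3):
    """FIR taps h[0..order] approximating a delay of `delay` samples."""
    taps = []
    for k in range(order + 1):
        h = 1.0
        for m in range(order + 1):
            if m != k:
                h *= (delay - m) / (k - m)
        taps.append(h)
    return taps


def fractional_delay(signal, delay):
    """Delay `signal` by a non-integer number of samples (delay >= 1)."""
    if delay < 1.0:
        raise ValueError("this sketch needs delay >= 1 to stay causal")
    base = int(delay) - 1                 # whole samples handled by indexing
    taps = lagrange_coeffs(delay - base)  # remaining delay lies in [1, 2)
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            i = n - base - k
            if i >= 0:                    # samples before the start count as zero
                acc += h * signal[i]
        out.append(acc)
    return out
```

For instance, a one-sample delay defined at 44.1 kHz would correspond to `fractional_delay(x, 48000 / 44100)` when running at 48 kHz. Keeping the interpolated delay between 1 and 2 samples centers the evaluation point among the four taps, which is where a cubic Lagrange interpolator is most accurate.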
Download Network Bending of Diffusion Models for Audio-Visual Generation In this paper we present the first steps towards the creation of a tool which enables artists to create music visualizations using pretrained, generative, machine learning models. First, we investigate the application of network bending, the process of applying transforms within the layers of a generative network, to image generation diffusion models by utilizing a range of point-wise, tensorwise, and morphological operators. We identify a number of visual effects that result from various operators, including some that are not easily recreated with standard image editing tools. We find that this process allows for continuous, fine-grain control of image generation which can be helpful for creative applications. Next, we generate music-reactive videos using Stable Diffusion by passing audio features as parameters to network bending operators. Finally, we comment on certain transforms which radically shift the image and the possibilities of learning more about the latent space of Stable Diffusion based on these transforms.
Download Wave Digital Model of the MXR Phase 90 Based on a Time-Varying Resistor Approximation of JFET Elements Virtual Analog (VA) modeling is the practice of digitally emulating analog audio gear. Over the past few years, with the purpose of recreating the alleged distinctive sound of audio equipment and musicians, many different guitar pedals have been emulated by means of the VA paradigm but little attention has been given to phasers. Phasers process the spectrum of the input signal with time-varying notches by means of shifting stages typically realized with a network of transistors, whose nonlinear equations are, in general, demanding to be solved. In this paper, we take as a reference the famous MXR Phase 90 guitar pedal, and we propose an efficient time-varying model of its Junction Field-Effect Transistors (JFETs) based on a channel resistance approximation. We then employ such a model in the Wave Digital domain to emulate in real-time the guitar pedal, obtaining an implementation characterized by low computational cost and good accuracy.
Download Differentiable MIMO Feedback Delay Networks for Multichannel Room Impulse Response Modeling Recently, with the advent of new high-performance headsets and goggles, the demand for Virtual and Augmented Reality applications has experienced a steep increase. In order to coherently navigate the virtual rooms, the acoustics of the scene must be emulated in the most accurate and efficient way possible. Amongst others, Feedback Delay Networks (FDNs) have proved to be valuable tools for tackling such a task. In this article, we expand and adapt a method recently proposed for the data-driven optimization of single-input-single-output FDNs to the multiple-input-multiple-output (MIMO) case for addressing spatial/space-time processing applications. By testing our methodology on items taken from two different datasets, we show that the parameters of MIMO FDNs can be jointly optimized to match some perceptual characteristics of given multichannel room impulse responses, outperforming approaches available in the literature, and paving the way toward increasingly efficient and accurate real-time virtual room acoustics rendering.
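For readers unfamiliar with the structure being optimized, here is a minimal, non-differentiable sketch of a MIMO FDN in plain Python. It only shows the signal flow: input gains `B`, delay lines, a feedback matrix `A`, and output gains `C`. The function name `mimo_fdn`, the argument layout, and the absence of attenuation filters are assumptions of this example and are not taken from the paper.

```python
def mimo_fdn(frames, delays, A, B, C):
    """Run a MIMO feedback delay network over a list of input frames.

    frames: list of input vectors, one per time step
    delays: delay-line lengths in samples
    A:      feedback matrix (len(delays) x len(delays))
    B:      input gains (len(delays) x number of inputs)
    C:      output gains (number of outputs x len(delays))
    """
    n = len(delays)
    lines = [[0.0] * d for d in delays]   # one circular buffer per delay line
    pos = [0] * n
    out = []
    for x in frames:
        # Read the oldest sample of every delay line.
        s = [lines[i][pos[i]] for i in range(n)]
        out.append([sum(C[k][i] * s[i] for i in range(n))
                    for k in range(len(C))])
        # Mix the line outputs through A, add the input, and write back.
        for i in range(n):
            feedback = sum(A[i][j] * s[j] for j in range(n))
            injected = sum(B[i][m] * x[m] for m in range(len(x)))
            lines[i][pos[i]] = injected + feedback
            pos[i] = (pos[i] + 1) % delays[i]
    return out
```

In a data-driven setting such as the one described above, the entries of `A`, `B`, and `C` would be the quantities adjusted by the optimizer; here they are plain lists supplied by the caller.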
Download A Highly Parametrized Scattering Delay Network Implementation for Interactive Room Auralization Scattering Delay Networks (SDNs) are an interesting approach to artificial reverberation, with parameters tied to the room’s physical properties and the computational efficiency of delay networks. This paper presents a highly parametrized, real-time plugin implementation of an SDN. The SDN plugin allows for interactive room auralization, enabling users to modify the parameters affecting the reverberation in real time. These parameters include source and receiver positions, room shape and size, and wall absorption properties. This makes our plugin suitable for applications that require real-time and interactive spatial audio rendering, such as virtual or augmented reality frameworks and video games. Additionally, the main contributions of this work include a filter design method for wall sound absorption, as well as plugin features such as air absorption modeling, various output formats (mono, stereo, binaural, and first to fifth order Ambisonics), open sound control (OSC) for controlling source and receiver parameters, and a graphical user interface (GUI). Evaluation tests showed that the reverberation time and the filter design approach are consistent with both theoretical references and real-world measurements. Finally, performance analysis indicated that the SDN plugin requires minimal computational resources.
Download Equalizing Loudspeakers in Reverberant Environments Using Deep Convolutive Dereverberation Loudspeaker equalization is an established topic in the literature, and many techniques are currently available to address most practical use cases. However, most of these rely on accurate measurements of the loudspeaker in an anechoic environment, which in some cases is not feasible. This is the case, e.g., for custom digital organs, whose loudspeakers are built into a large and geometrically complex piece of furniture that may be too heavy and large to be transported to a measurement room, or may require a large one, making traditional impulse response measurements impractical for most users. In this work we propose a method to find the inverse of the sound emission system in a reverberant environment, based on a Deep Learning dereverberation algorithm. The method is agnostic of the room characteristics and can thus be conducted in an automated fashion in any environment. A real use case is discussed and results are provided, showing the effectiveness of the approach in designing filters that closely match the magnitude response of the ideal inverting filters.
Download Guitar Tone Stack Modeling with a Neural State-Space Filter In this work, we present a data-driven approach to modeling tone stack circuits in guitar amplifiers and distortion pedals. To this aim, the proposed modeling approach uses a feedforward fully connected neural network to predict the parameters of a coupled-form state-space filter, ensuring the numerical stability of the resulting time-varying system. The neural network is conditioned on the tone controls of the target tone stack and is optimized jointly with the coupled-form state-space filter to match the target frequency response. To assess the proposed approach, we model three popular tone stack schematics with both matched-order and over-parameterized filters and conduct an objective comparison with well-established approaches that use cascaded biquad filters. Results from the conducted experiments demonstrate improved accuracy of the proposed modeling approach, especially in the case of over-parameterized state-space filters, while guaranteeing numerical stability. Our method can be deployed, after training, in real-time audio processors.
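To make the stability argument in the abstract above more concrete, the sketch below builds a second-order coupled-form state matrix, a rotation by an angle `theta` scaled by a pole radius `r`, and steps the resulting state-space recursion. Squashing an unconstrained value through a sigmoid to obtain `r` is an assumption of this example (one common way to keep the poles inside the unit circle); the abstract does not say how the network outputs are actually constrained, and the names `coupled_form` and `state_trajectory` are hypothetical.

```python
import math


def coupled_form(raw_radius, theta):
    """Return (r, A) with A = r * rotation(theta), poles at r*exp(+/- j*theta)."""
    r = 1.0 / (1.0 + math.exp(-raw_radius))  # sigmoid: maps a real value into (0, 1)
    c, s = r * math.cos(theta), r * math.sin(theta)
    return r, [[c, -s], [s, c]]


def state_trajectory(A, B, u):
    """States of x[n+1] = A x[n] + B u[n], starting from x[0] = 0."""
    x = [0.0, 0.0]
    states = []
    for sample in u:
        states.append(x)
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * sample,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1] * sample]
    return states
```

Because `A` is a scaled rotation, every step without input shrinks the state norm by exactly the factor `r`, so the recursion stays bounded even if `r` and `theta` change from sample to sample, as long as `r` stays below one.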