Equalizing Loudspeakers in Reverberant Environments Using Deep Convolutive Dereverberation
Loudspeaker equalization is an established topic in the literature, and many techniques are currently available to address most practical use cases. However, most of these rely on accurate measurements of the loudspeaker in an anechoic environment, which is not always feasible. This is the case, for example, with custom digital organs, whose loudspeakers are built into a large, geometrically complex piece of furniture that may be too heavy and bulky to be transported to a measurement room, or may require an impractically large one, making traditional impulse response measurements out of reach for most users. In this work we propose a method to find the inverse of the sound emission system in a reverberant environment, based on a deep learning dereverberation algorithm. The method is agnostic to the room characteristics and can thus be applied in an automated fashion in any environment. A real use case is discussed and results are provided, showing the effectiveness of the approach in designing filters that closely match the magnitude response of the ideal inverting filters.
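Once a dereverberated estimate of the loudspeaker response is available, one common recipe for the subsequent equalization step is a regularized magnitude inversion. The NumPy sketch below illustrates only that generic step, with an assumed regularization constant; it is not the filter-design procedure proposed in the paper.

```python
import numpy as np

def regularized_inverse_magnitude(H_mag, beta=0.01):
    """Regularized inverse magnitude |G(f)| = |H(f)| / (|H(f)|^2 + beta).

    H_mag is the magnitude response of the (dereverberated) loudspeaker
    estimate on a frequency grid; beta limits the boost applied where the
    response has deep notches. Both the formula and beta = 0.01 are
    illustrative assumptions, not the method described in the abstract.
    """
    H_mag = np.asarray(H_mag, dtype=float)
    return H_mag / (H_mag ** 2 + beta)

# Example: invert a response with a dip around bin 100.
f_bins = np.arange(512)
H = 1.0 - 0.75 * np.exp(-((f_bins - 100) / 20.0) ** 2)
G = regularized_inverse_magnitude(H)
```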
Guitar Tone Stack Modeling with a Neural State-Space Filter
In this work, we present a data-driven approach to modeling tone stack circuits in guitar amplifiers and distortion pedals. To this aim, the proposed approach uses a feedforward fully connected neural network to predict the parameters of a coupled-form state-space filter, ensuring the numerical stability of the resulting time-varying system. The neural network is conditioned on the tone controls of the target tone stack and is optimized jointly with the coupled-form state-space filter to match the target frequency response. To assess the proposed approach, we model three popular tone stack schematics with both matched-order and over-parameterized filters and conduct an objective comparison with well-established approaches that use cascaded biquad filters. Results from the experiments demonstrate improved accuracy of the proposed modeling approach, especially in the case of over-parameterized state-space filters, while guaranteeing numerical stability. After training, our method can be deployed in real-time audio processors.
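To make the stability argument concrete, here is a minimal NumPy sketch of a single coupled-form (rotation-based) second-order state-space section: as long as the pole radius stays below one, the section is stable by construction. The parameter layout, and the idea of fixing the parameters rather than predicting them from tone controls with a network, are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def coupled_form_section(x, r, theta, b0, b1, b2):
    """Run one coupled-form second-order state-space section over a signal.

    The state-transition matrix is a rotation by theta scaled by the pole
    radius r, so its eigenvalues have magnitude r; keeping r < 1 guarantees
    stability. In the paper these parameters would be time-varying outputs
    of a neural network conditioned on the tone controls (assumed here).
    """
    A = r * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
    B = np.array([1.0, 0.0])
    C = np.array([b1, b2])   # output mix of the two states (assumed layout)
    D = b0                   # direct feed-through
    s = np.zeros(2)
    y = np.empty_like(x)
    for n, xn in enumerate(x):
        y[n] = C @ s + D * xn
        s = A @ s + B * xn
    return y

# Example: a gently resonant section applied to white noise.
x = np.random.randn(1024)
y = coupled_form_section(x, r=0.98, theta=0.05 * np.pi, b0=0.1, b1=0.2, b2=0.0)
```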
DDSP-SFX: Acoustically-Guided Sound Effects Generation with Differentiable Digital Signal Processing
Controlling the variations of sound effects using neural audio synthesis models has been a challenging task. Differentiable digital signal processing (DDSP) provides a lightweight solution that achieves high-quality sound synthesis while enabling deterministic acoustic attribute control by incorporating pre-processed audio features and digital synthesizers. In this research, we introduce DDSP-SFX, a model based on the DDSP architecture capable of synthesizing high-quality sound effects while enabling users to easily control timbre variations. We integrate a transient modelling algorithm into DDSP that achieves higher objective evaluation scores and subjective ratings on impulsive signals (footsteps, gunshots). We propose a novel method that achieves frame-level timbre variation control while also allowing deterministic attribute control. We further qualitatively show the timbre transfer performance using voice as the guiding sound.
Modified Late Reverberation in an Audio Augmented Reality Scenario
This paper presents a headphone-based audio augmented reality demonstrator showcasing the effects of manipulated late reverberation in rendering virtual sound sources. The setup is based on a dataset of binaural room impulse responses measured along a 2 m long line, which is used to imitate the reproduction of a pair of loudspeakers. Listeners can therefore explore the virtual sources by moving back and forth and rotating arbitrarily on this line. The demo allows the user to interactively adjust the late reverberation tail of the auralizations from shorter to longer decay times relative to the baseline decay behavior. Modification of the decay times is based on resynthesizing the late reverberation using frequency-dependent shaping of binaural white noise and modal reconstruction. The paper includes descriptions of the frameworks used for this demo and an overview of the required data and processing steps.
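The core idea of frequency-dependent noise shaping can be sketched in a few lines: split white noise into bands and impose a per-band exponential envelope matching the target decay time. The single-channel sketch below (SciPy, with assumed band edges and T60 values) illustrates only that idea; the binaural correlation handling and the modal reconstruction used in the actual demo are not shown.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def shaped_noise_tail(t60_per_band, band_edges, fs=48000, length_s=1.5, seed=0):
    """Synthesize a single-channel late-reverb tail by frequency-dependent
    shaping of white noise: each band gets an exponential envelope that
    reaches -60 dB after its target T60. All values are illustrative."""
    rng = np.random.default_rng(seed)
    n = int(length_s * fs)
    t = np.arange(n) / fs
    noise = rng.standard_normal(n)
    tail = np.zeros(n)
    for (lo, hi), t60 in zip(band_edges, t60_per_band):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, noise)
        envelope = 10.0 ** (-3.0 * t / t60)   # amplitude: -60 dB at t = T60
        tail += band * envelope
    return tail

# Example: three bands with decay times shortening towards high frequencies.
tail = shaped_noise_tail(t60_per_band=[1.2, 0.9, 0.6],
                         band_edges=[(125, 500), (500, 2000), (2000, 8000)])
```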
Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data
Recent years have seen increasing interest in applying deep learning methods to the modeling of guitar amplifiers or effect pedals. Existing methods are mainly based on the supervised approach, requiring temporally aligned data pairs of unprocessed and rendered audio. However, this approach does not scale well, due to the complicated process involved in creating the data pairs. Recent work by Wright et al. has explored the potential of leveraging unpaired data for training, using a generative adversarial network (GAN)-based framework. This paper extends their work by using more advanced discriminators in the GAN and by using more unpaired data for training. Specifically, drawing inspiration from recent advancements in neural vocoders, we employ in our GAN-based model for guitar amplifier modeling two sets of discriminators, one based on the multi-scale discriminator (MSD) and the other on the multi-period discriminator (MPD). Moreover, we experiment with adding unprocessed audio signals that do not have corresponding rendered audio of a target tone to the training data, to see how much the GAN model benefits from the unpaired data. Our experiments show that the two proposed extensions contribute to the modeling of both low-gain and high-gain guitar amplifiers.
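The defining trick of a multi-period discriminator, as popularized by neural vocoders such as HiFi-GAN, is to fold the waveform into a 2-D grid whose second axis has length equal to a candidate period, so that 2-D convolutions compare samples spaced exactly one period apart. The PyTorch sketch below shows one such branch with placeholder channel widths and kernel sizes; it is a generic illustration of the technique, not the discriminator configuration used in this paper.

```python
import torch
import torch.nn as nn

class PeriodDiscriminator(nn.Module):
    """One branch of a multi-period discriminator (illustrative sizes only)."""
    def __init__(self, period, channels=32):
        super().__init__()
        self.period = period
        self.convs = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=(5, 1), stride=(3, 1), padding=(2, 0)),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels, channels, kernel_size=(5, 1), stride=(3, 1), padding=(2, 0)),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels, 1, kernel_size=(3, 1), padding=(1, 0)),
        )

    def forward(self, x):                       # x: (batch, 1, samples)
        b, c, t = x.shape
        pad = (-t) % self.period                # zero-pad to a multiple of the period
        x = nn.functional.pad(x, (0, pad))
        x = x.view(b, c, -1, self.period)       # (batch, 1, frames, period)
        return self.convs(x)

# Example: score one second of audio against a period of 3 samples.
disc = PeriodDiscriminator(period=3)
scores = disc(torch.randn(1, 1, 16000))
```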
LTFATPY: Towards Making a Wide Range of Time-Frequency Representations Available in Python
LTFATPY is a software package for accessing the Large Time-Frequency Analysis Toolbox (LTFAT) from Python. Dedicated to time-frequency analysis, LTFAT comprises a large number of linear transforms for Fourier, Gabor, and wavelet analysis, along with their associated operators. Its filter bank module is a collection of computational routines for finite impulse response and band-limited filters, allowing for the specification of constant-Q and auditory-inspired transforms. While LTFAT was originally written in MATLAB/GNU Octave, the recent popularity of the Python programming language in related fields, such as signal processing and machine learning, makes it desirable to have LTFAT available in Python as well. We introduce LTFATPY, describe its main features, and outline further developments.
Evaluating Neural Networks Architectures for Spring Reverb Modelling
Reverberation is a key element in spatial audio perception, historically achieved with analogue devices such as plate and spring reverbs, and in recent decades with digital signal processing techniques that have enabled different approaches to Virtual Analogue Modelling (VAM). The electromechanical functioning of the spring reverb makes it a nonlinear system that is difficult to fully emulate in the digital domain with white-box modelling techniques. In this study, we compare five different neural network architectures, including convolutional and recurrent models, to assess their effectiveness in replicating the characteristics of this audio effect. The evaluation is conducted on two datasets at sampling rates of 16 kHz and 48 kHz. This paper specifically focuses on neural audio architectures that offer parametric control, aiming to advance the boundaries of current black-box modelling techniques in the domain of spring reverberation.
Leveraging Electric Guitar Tones and Effects to Improve Robustness in Guitar Tablature Transcription Modeling
Guitar tablature transcription (GTT) aims at automatically generating symbolic representations from real solo guitar performances. Due to its applications in education and musicology, GTT has gained traction in recent years. However, GTT robustness has been limited due to the small size of available datasets. Researchers have recently used synthetic data that simulates guitar performances using pre-recorded or computer-generated tones, allowing for scalable and automatic data generation. The present study complements these efforts by demonstrating that GTT robustness can be improved by including synthetic training data created using recordings of real guitar tones played with different audio effects. We evaluate our approach on a new evaluation dataset with professional solo guitar performances that we composed and collected, featuring a wide array of tones, chords, and scales.
A Real-Time Approach for Estimating Pulse Tracking Parameters for Beat-Synchronous Audio Effects
Predominant Local Pulse (PLP) estimation, an established method for extracting beat positions and other periodic pulse information from audio signals, has recently been extended with an online variant tailored for real-time applications. In this paper, we introduce a novel approach to generating various real-time control signals from the original online PLP output. While the PLP activation function encodes both predominant pulse information and pulse stability, we propose several normalization procedures to separate local pulse oscillation from stability, utilizing the PLP activation envelope. From these, we generate pulse-synchronous Low-Frequency Oscillators (LFOs) and supplementary confidence-based control signals, enabling dynamic control over audio effect parameters in real time. Additionally, our approach enables beat position prediction, providing a look-ahead capability, for example, to compensate for system latency. To showcase the effectiveness of our control signals, we introduce an audio plugin prototype designed for integration within a Digital Audio Workstation (DAW), facilitating real-time applications of beat-synchronous effects during live mixing and performances. Moreover, this plugin serves as an educational tool, providing insights into PLP principles and the tempo structure of analyzed music signals.
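One plausible way to separate an oscillating activation into a unit-amplitude LFO and a confidence signal is an envelope normalization, for instance via the Hilbert envelope. The SciPy sketch below illustrates that generic idea on a synthetic activation; it is an assumed normalization, not necessarily one of the procedures proposed in the paper.

```python
import numpy as np
from scipy.signal import hilbert

def pulse_lfo_and_confidence(plp_activation, eps=1e-8):
    """Split a PLP-like activation into a unit-amplitude pulse-synchronous LFO
    and a slowly varying confidence signal (the activation envelope)."""
    analytic = hilbert(plp_activation)
    envelope = np.abs(analytic)                  # local pulse salience / stability
    lfo = np.real(analytic) / (envelope + eps)   # oscillation normalized to ~unit amplitude
    confidence = envelope / (envelope.max() + eps)
    return lfo, confidence

# Example: a synthetic 2 Hz activation whose amplitude (stability) fades in and out.
t = np.linspace(0, 10, 1000)                     # 10 s at a 100 Hz feature rate
activation = (0.3 + 0.7 * np.sin(0.2 * np.pi * t) ** 2) * np.cos(2 * np.pi * 2 * t)
lfo, conf = pulse_lfo_and_confidence(activation)
```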
Real-Time System for Sound Enhancement in Noisy Environment
Noise can degrade the listening experience in many real-life situations where loudspeakers are used as the playback device. One way to reduce the effect of noise is to use headphones, but they can be uncomfortable and are not acceptable in every situation. In this context, a system for improving audio perception and the intelligibility of sounds in a noisy domestic environment is introduced and a real-time implementation is proposed. The system comprises three main blocks: a noise estimation procedure based on an adaptive algorithm, an auditory spectral masking algorithm that estimates the music level required to mask the noise source, and an FFT equalizer that applies the estimated level. It has been developed on an embedded DSP board using one microphone for ambient noise analysis and two vibrating sound transducers for sound reproduction. Several experiments in simulated and real-world scenarios have been carried out to demonstrate the effectiveness of the proposed approach.
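As a rough illustration of the last two blocks, the NumPy sketch below computes per-bin gains that raise the music spectrum until it exceeds an estimated noise spectrum by a fixed margin, and applies them with an FFT equalizer. It is a crude per-bin stand-in for the paper's auditory masking model; the margin and boost limit are assumptions, and the adaptive noise estimation stage is not shown.

```python
import numpy as np

def masking_eq_gains(music_frame, noise_frame, margin_db=6.0, max_boost_db=12.0):
    """Per-bin gains (dB) so the music spectrum exceeds the noise spectrum by
    margin_db, capped at max_boost_db. Frames are time-domain, equal length."""
    music_db = 20 * np.log10(np.abs(np.fft.rfft(music_frame)) + 1e-12)
    noise_db = 20 * np.log10(np.abs(np.fft.rfft(noise_frame)) + 1e-12)
    return np.clip(noise_db + margin_db - music_db, 0.0, max_boost_db)

def apply_fft_eq(music_frame, gains_db):
    """Apply the per-bin gains to one frame with an FFT equalizer."""
    spectrum = np.fft.rfft(music_frame) * 10.0 ** (gains_db / 20.0)
    return np.fft.irfft(spectrum, n=len(music_frame))

# Example: boost a quiet music frame against a louder noise frame.
rng = np.random.default_rng(0)
music, noise = 0.1 * rng.standard_normal(1024), 0.5 * rng.standard_normal(1024)
enhanced = apply_fft_eq(music, masking_eq_gains(music, noise))
```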