CONMOD: Controllable Neural Frame-Based Modulation Effects
Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single black-box model which emulates various LFO-driven effects in a frame-wise manner, offering control over LFO frequency and feedback parameters. Additionally, the model is capable of learning the continuous embedding space of two distinct phaser effects, enabling us to steer between effects and achieve creative outputs. Our model outperforms previous work while possessing both controllability and universality, presenting opportunities to enhance creativity in modern LFO-driven audio effects. Additional demos of our model are available on the accompanying website.
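For context on the effect family being emulated, a minimal time-domain phaser sketch (not the CONMOD model itself, which is neural and frame-based) exposing the two control parameters the abstract names, LFO frequency and feedback, might look like:

```python
import numpy as np

def phaser(x, sr=44100, lfo_hz=0.5, depth=0.7, feedback=0.5, stages=4):
    """Minimal time-varying phaser: cascaded first-order allpass
    filters whose coefficient is swept by a sine LFO, with a
    feedback path around the allpass chain."""
    n = np.arange(len(x))
    # Allpass coefficient swept between 0 and `depth` by the LFO
    a = depth * 0.5 * (1.0 + np.sin(2 * np.pi * lfo_hz * n / sr))
    y = np.zeros_like(x, dtype=float)
    z = np.zeros(stages)          # one state variable per allpass stage
    fb = 0.0                      # last chain output, fed back
    for i in range(len(x)):
        s = x[i] + feedback * fb
        for k in range(stages):
            # First-order allpass: out = -a*in + z;  z = in + a*out
            out = -a[i] * s + z[k]
            z[k] = s + a[i] * out
            s = out
        fb = s
        y[i] = 0.5 * (x[i] + s)   # equal mix of dry and wet
    return y
```

The notches that characterise the phaser arise where the swept allpass chain puts the wet path in antiphase with the dry path.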
Digitizing the Schumann PLL Analog Harmonizer
The Schumann Electronics PLL is a guitar effect that uses hardware-based processing of one-bit digital signals, with op-amp saturation and CMOS control systems used to generate multiple square waves derived from the frequency of the input signal. The effect may be simulated in the digital domain by cascading stages of state-space virtual analog modeling and algorithmic approximations of CMOS integrated circuits. Phase-locked loops, decade counters, and Schmitt trigger inverters are modeled using logic algorithms, allowing for a comparable digital implementation of the Schumann PLL. Simulation results are presented.
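Of the CMOS blocks mentioned, the Schmitt trigger inverter is the simplest to model as a logic algorithm: a comparator with hysteresis. A minimal sketch (the threshold and rail voltages below are illustrative placeholders, not values from the paper):

```python
def schmitt_inverter(x, v_hi=2.0, v_lo=1.0, out_hi=5.0, out_lo=0.0):
    """Schmitt-trigger inverter with hysteresis: the output flips low
    when the input rises above v_hi and flips high when it falls
    below v_lo; between the thresholds the previous state is held."""
    y = []
    state = out_hi  # assume the input starts below both thresholds
    for v in x:
        if v >= v_hi:
            state = out_lo
        elif v <= v_lo:
            state = out_hi
        y.append(state)
    return y
```

The hysteresis band is what turns a slowly varying or noisy input into a clean square wave, which is why these inverters appear throughout one-bit signal chains.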
DDSP-Based Neural Waveform Synthesis of Polyphonic Guitar Performance From String-Wise MIDI Input
We explore the use of neural synthesis for acoustic guitar from string-wise MIDI input. We propose four different systems and compare them with both objective metrics and subjective evaluation against natural audio and a sample-based baseline. We iteratively develop these four systems by making various considerations on the architecture and intermediate tasks, such as predicting pitch and loudness control features. We find that formulating the control feature prediction task as a classification task rather than a regression task yields better results. Furthermore, we find that our simplest proposed system, which directly predicts synthesis parameters from MIDI input, performs the best of the four proposed systems. Audio examples and code are available.
Characterisation and Excursion Modelling of Audio Haptic Transducers
Statement and calculation of objective audio haptic transducer performance metrics facilitate optimisation of multi-sensory sound reproduction systems. Measurements of existing haptic transducers are applied to the calculation of a series of performance metrics to demonstrate a means of comparative objective analysis. The frequency response, transient response and moving-mass excursion characteristics of each measured transducer are quantified using novel and previously defined metrics. Objective data drawn from a series of practical measurements shows that the proposed metrics and means of excursion modelling applied herein are appropriate for haptic transducer evaluation and protection against over-excursion, respectively.
Naturalness of Double-Slope Decay in Generalised Active Acoustic Enhancement Systems
Active acoustic enhancement systems (AAESs) alter the perceived acoustics of a space by using microphones and loudspeakers to introduce sound energy into the room. Double-sloped energy decay may be observed in these systems. However, it is unclear which conditions lead to this effect, and to what extent double sloping reduces the perceived naturalness of the reverberation compared to Sabine decay. This paper uses simulated combinations of AAES parameters to identify which cases affect the objective curvature of the energy decay. A subjective test with trained listeners assessed the naturalness of these conditions. Using an AAES model, room impulse responses were generated for varying room dimensions, absorption coefficients, channel counts, system loop gains and reverberation times (RTs) of the artificial reverberator. The objective double sloping was strongly correlated with the ratio between the reverberator and passive-room RTs, but parameters such as absorption and room size did not have a profound effect on curvature. It was found that double sloping significantly reduced the perceived naturalness of the reverberation, especially when the reverberator RT was greater than twice that of the passive room. Double sloping had more effect on the naturalness ratings when subjects listened to a more absorptive passive room, and also when using speech rather than transient stimuli. Lowering the loop gain by 9 dB increased the naturalness of the double-sloped stimuli, where some were rated as significantly more natural than the Sabine decay stimuli from the passive room.
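The double-sloped decay under study can be illustrated by summing two exponentially decaying noise tails, one for the passive room and one for the reverberator, and backward-integrating (Schroeder) to obtain the energy-decay curve. This is an illustrative toy with made-up parameters, not the paper's AAES model:

```python
import numpy as np

def two_slope_edc(rt_room, rt_reverb, mix=0.1, sr=8000, dur=2.0, seed=0):
    """Synthesize a double-sloped impulse-response tail and return
    its Schroeder energy-decay curve in dB (0 dB at t = 0)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * dur)) / sr
    # -60 dB over RT seconds corresponds to amplitude exp(-6.91 t / RT)
    tail = (np.exp(-6.91 * t / rt_room)
            + mix * np.exp(-6.91 * t / rt_reverb))
    h = tail * rng.standard_normal(len(t))
    edc = np.cumsum(h[::-1] ** 2)[::-1]       # backward energy integration
    return 10 * np.log10(edc / edc[0])
```

With `rt_reverb` well above `rt_room`, the curve shows a steep early slope handing over to a shallow late slope, i.e. the curvature the paper correlates with the RT ratio.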
Hyper Recurrent Neural Network: Condition Mechanisms for Black-Box Audio Effect Modeling
Recurrent neural networks (RNNs) have demonstrated impressive results for virtual analog modeling of audio effects. These networks process time-domain audio signals using a series of matrix multiplications and nonlinear activation functions to emulate the behavior of the target device accurately. To additionally model the effect of the knobs for an RNN-based model, existing approaches integrate control parameters by concatenating them channel-wise with some intermediate representation of the input signal. While this method is parameter-efficient, there is room to further improve the quality of generated audio because the concatenation-based conditioning method has limited capacity in modulating signals. In this paper, we propose three novel conditioning mechanisms for RNNs, tailored for black-box virtual analog modeling. These advanced conditioning mechanisms modulate the model based on control parameters, yielding superior results to existing RNN- and CNN-based architectures across various evaluation metrics.
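The abstract does not spell out the three mechanisms, but a standard example of conditioning that modulates a signal rather than merely concatenating to it is feature-wise linear modulation (FiLM), where the control parameters generate a per-channel scale and shift applied to the hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)

def film(h, params, W_gamma, W_beta):
    """FiLM-style conditioning: control parameters produce a
    per-channel scale (gamma) and shift (beta) for the hidden
    state h, so the knobs actively modulate the signal path."""
    gamma = params @ W_gamma   # shape: (hidden,)
    beta = params @ W_beta
    return gamma * h + beta

hidden, n_params = 8, 2
h = rng.standard_normal(hidden)      # RNN hidden state at one time step
knobs = np.array([0.3, 0.8])         # normalized control parameters
W_gamma = rng.standard_normal((n_params, hidden))
W_beta = rng.standard_normal((n_params, hidden))
h_mod = film(h, knobs, W_gamma, W_beta)
```

Concatenation can only offer the knobs as extra inputs; multiplicative conditioning like this lets them rescale every channel directly, which is the extra modulation capacity the abstract alludes to.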
Spectral Analysis of Stochastic Wavetable Synthesis
Dynamic Stochastic Wavetable Synthesis (DSWS) is a sound synthesis and processing technique that uses probabilistic waveform synthesis techniques invented by Iannis Xenakis as a modulation/distortion effect applied to a wavetable oscillator. The stochastic manipulation of the wavetable provides a means of creating signals with rich, dynamic spectra. In the present work, the DSWS technique is compared to other fundamental sound synthesis techniques such as frequency modulation synthesis. Additionally, several extensions of the DSWS technique are proposed.
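A minimal sketch of the core idea, assuming a bounded random walk applied to the wavetable once per cycle (the specific perturbation scheme used in DSWS may differ):

```python
import numpy as np

def dsws(n_samples, table_size=64, step=0.02, clip=1.0, seed=0):
    """Dynamic stochastic wavetable toy: start from one cycle of a
    sine and apply a bounded random walk to the table after every
    cycle, so the spectrum grows progressively richer over time."""
    rng = np.random.default_rng(seed)
    table = np.sin(2 * np.pi * np.arange(table_size) / table_size)
    out = np.empty(n_samples)
    for i in range(n_samples):
        out[i] = table[i % table_size]
        if (i + 1) % table_size == 0:            # end of a cycle:
            table += rng.normal(0.0, step, table_size)  # random walk
            np.clip(table, -clip, clip, out=table)      # keep it bounded
    return out
```

Because the perturbation accumulates, early cycles are nearly sinusoidal while later cycles acquire broadband content, which is the dynamic-spectrum behaviour the abstract describes.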
Leveraging Electric Guitar Tones and Effects to Improve Robustness in Guitar Tablature Transcription Modeling
Guitar tablature transcription (GTT) aims at automatically generating symbolic representations from real solo guitar performances. Due to its applications in education and musicology, GTT has gained traction in recent years. However, GTT robustness has been limited due to the small size of available datasets. Researchers have recently used synthetic data that simulates guitar performances using pre-recorded or computer-generated tones, allowing for scalable and automatic data generation. The present study complements these efforts by demonstrating that GTT robustness can be improved by including synthetic training data created using recordings of real guitar tones played with different audio effects. We evaluate our approach on a new evaluation dataset with professional solo guitar performances that we composed and collected, featuring a wide array of tones, chords, and scales.
Guitar Tone Stack Modeling with a Neural State-Space Filter
In this work, we present a data-driven approach to modeling tone stack circuits in guitar amplifiers and distortion pedals. To this aim, the proposed modeling approach uses a feedforward fully connected neural network to predict the parameters of a coupled-form state-space filter, ensuring the numerical stability of the resulting time-varying system. The neural network is conditioned on the tone controls of the target tone stack and is optimized jointly with the coupled-form state-space filter to match the target frequency response. To assess the proposed approach, we model three popular tone stack schematics with both matched-order and over-parameterized filters and conduct an objective comparison with well-established approaches that use cascaded biquad filters. Results from the conducted experiments demonstrate improved accuracy of the proposed modeling approach, especially in the case of over-parameterized state-space filters, while guaranteeing numerical stability. Our method can be deployed, after training, in real-time audio processors.
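The stability guarantee rests on the coupled-form parameterization: writing a second-order state matrix as a rotation by an angle scaled by a radius r pins the pole magnitude at exactly r, so constraining r < 1 keeps the filter stable even as a network varies the parameters over time. A minimal, non-neural sketch:

```python
import numpy as np

def coupled_form_sos(x, r, theta, b=1.0, d=0.0):
    """Second-order coupled-form state-space filter.  The state
    matrix is a rotation by theta scaled by radius r, so its
    eigenvalues (the poles) have magnitude exactly r; any r < 1
    therefore yields a stable time-varying filter."""
    A = r * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
    B = np.array([b, 0.0])
    c = np.array([1.0, 0.0])   # read the first state as output
    s = np.zeros(2)
    y = np.empty(len(x))
    for n in range(len(x)):
        y[n] = c @ s + d * x[n]     # output equation
        s = A @ s + B * x[n]        # state update
    return y
```

In the paper's setting the network would emit (the equivalents of) r and theta per condition, which is why over-parameterizing the filter cannot break stability.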
Sound Matching Using Synthesizer Ensembles
Sound matching allows users to automatically approximate existing sounds using a synthesizer. Previous work has mostly focused on algorithms for automatically programming an existing synthesizer. This paper proposes a system for selecting between different synthesizer designs, each one with a corresponding automatic programmer. An implementation that allows designing ensembles based on a template is demonstrated. Several experiments are presented using a simple subtractive synthesis design. Using an ensemble of synthesizer-programmer pairs is shown to provide better matching than a single programmer trained for an equivalent integrated synthesizer. Scaling to hundreds of synthesizers is shown to improve match quality.
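The ensemble idea can be sketched as running every synthesizer-programmer pair on the target and keeping the rendering with the lowest matching error; the log-spectral distance below is a placeholder metric, not necessarily the one used in the paper:

```python
import numpy as np

def match_error(target, candidate):
    """Placeholder matching error: mean squared log-magnitude
    spectral distance between target and candidate audio."""
    T = np.abs(np.fft.rfft(target)) + 1e-8
    C = np.abs(np.fft.rfft(candidate)) + 1e-8
    return float(np.mean((np.log(T) - np.log(C)) ** 2))

def ensemble_match(target, pairs):
    """Run each (synth, programmer) pair and return the best
    (audio, synth, programmer) triple by matching error."""
    return min(
        ((synth(programmer(target)), synth, programmer)
         for synth, programmer in pairs),
        key=lambda item: match_error(target, item[0]),
    )
```

Each programmer maps the target to parameters for its own synth; the ensemble simply compares the rendered results, which is why adding more diverse pairs can only improve the best match found.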