Dynamic Pitch Warping for Expressive Vocal Retuning
This work introduces the use of the Dynamic Pitch Warping (DPW) method for automatic pitch correction of singing voice audio signals. DPW is designed to dynamically tune any pitch trajectory to a predefined scale while preserving its expressive ornamentation. DPW has three degrees of freedom to modify the fundamental frequency (f0) signal: detection interval, critical time, and transition time. Together, these parameters allow us to define a pitch velocity condition that triggers an adaptive correction of the pitch trajectory (pitch warping). We compared our approach to Antares Autotune (the most commonly used software brand, abbreviated as ATA in this article). The pitch correction in ATA has two degrees of freedom: a triggering threshold (flextune) and the transition time (retune speed). The pitch trajectories that we compare were extracted from audio signals autotuned in ATA and from the DPW algorithm applied to the f0 of the input audio tracks. We specifically studied pitch correction for three typical f0 curve shapes: staircase, vibrato, and free-path. We measured the proximity of the corrected pitch trajectories to the original ones for each case and found that the DPW pitch correction method is better at preserving vibrato while keeping the f0 free path. In contrast, ATA is more effective in generating staircase curves, but fails for non-small vibratos and free-path curves. We have also implemented an off-line automatic pitch tuner using DPW.
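As a point of reference for the "staircase" behaviour mentioned in the abstract above, the following minimal sketch snaps a single f0 value to the nearest note of a predefined scale. It is not the DPW algorithm and not ATA's method: it is plain hard quantization, with none of the detection-interval, critical-time, or transition-time logic. The function name `snap_to_scale`, the default C major scale, and the 440 Hz reference are all assumptions made for illustration.

```python
import math

# Pitch classes of C major (0 = C). Illustrative default, not from the paper.
C_MAJOR = (0, 2, 4, 5, 7, 9, 11)


def snap_to_scale(f0_hz, scale=C_MAJOR, ref_hz=440.0):
    """Return the frequency of the scale note nearest to f0_hz.

    Hypothetical helper: hard quantization only, applied frame by frame.
    Unvoiced frames (f0 <= 0) are returned unchanged.
    """
    if f0_hz <= 0.0:
        return f0_hz
    # Continuous MIDI note number of the input frequency (A4 = 69).
    midi = 69.0 + 12.0 * math.log2(f0_hz / ref_hz)
    centre = round(midi)
    # Search one octave around the input for notes that belong to the scale.
    candidates = [n for n in range(centre - 6, centre + 7) if n % 12 in scale]
    nearest = min(candidates, key=lambda n: abs(n - midi))
    return ref_hz * 2.0 ** ((nearest - 69) / 12.0)
```

Applying this to every frame of an f0 trajectory flattens vibrato and other ornamentation onto note steps, which is the kind of outcome an expressive retuning method tries to avoid.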
Digitizing the Schumann PLL Analog Harmonizer
The Schumann Electronics PLL is a guitar effect that uses hardware-based processing of one-bit digital signals, with op-amp saturation and CMOS control systems used to generate multiple square waves derived from the frequency of the input signal. The effect may be simulated in the digital domain by cascading stages of state-space virtual analog modeling and algorithmic approximations of CMOS integrated circuits. Phase-locked loops, decade counters, and Schmitt trigger inverters are modeled using logic algorithms, allowing for the comparable digital implementation of the Schumann PLL. Simulation results are presented.
Modeling the Frequency-Dependent Sound Energy Decay of Acoustic Environments with Differentiable Feedback Delay Networks
Differentiable machine learning techniques have recently proved effective for finding the parameters of Feedback Delay Networks (FDNs) so that their output matches desired perceptual qualities of target room impulse responses. However, we show that existing methods tend to fail at modeling the frequency-dependent behavior of sound energy decay that characterizes real-world environments unless properly trained. In this paper, we introduce a novel perceptual loss function based on the mel-scale energy decay relief, which generalizes the well-known time-domain energy decay curve to multiple frequency bands. We also augment the prototype FDN by incorporating differentiable wideband attenuation and output filters, and train them via backpropagation along with the other model parameters. The proposed approach improves upon existing strategies for designing and training differentiable FDNs, making it more suitable for audio processing applications where realistic and controllable artificial reverberation is desirable, such as gaming, music production, and virtual reality.
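The time-domain energy decay curve that the abstract above builds on can be sketched with Schroeder backward integration. This is only the broadband curve, not the paper's mel-scale energy decay relief or its loss function. The function name `energy_decay_curve_db` and the `floor_db` parameter are assumptions made for illustration.

```python
import math


def energy_decay_curve_db(impulse_response, floor_db=-120.0):
    """Normalized energy decay curve in dB via backward integration.

    EDC[n] is the energy remaining in the impulse response from sample n
    onward, divided by the total energy. Values are clipped at floor_db.
    """
    energy = [s * s for s in impulse_response]
    total = sum(energy)
    if total == 0.0:
        return [floor_db] * len(energy)
    # Accumulate energy from the end of the response towards the start.
    remaining = 0.0
    edc = []
    for e in reversed(energy):
        remaining += e
        edc.append(remaining)
    edc.reverse()
    return [
        max(10.0 * math.log10(v / total), floor_db) if v > 0.0 else floor_db
        for v in edc
    ]
```

Applying the same integration to each band of a filterbank output, rather than to the broadband signal, gives one decay curve per frequency band, which is the general idea behind an energy decay relief.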
Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates
Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically
informed audio synthesis. However, traditional implementations,
particularly for non-linear models like the von Kármán plate, are
computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast,
differentiable, GPU-accelerated modal framework built with the
JAX library, providing efficient simulations and enabling gradient-based inverse modelling.
Benchmarks show that our approach
significantly outperforms CPU and GPU-based implementations,
particularly for simulations with many modes. Inverse modelling
experiments demonstrate that our approach can recover physical
parameters, including tension, stiffness, and geometry, from both
synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation than methods that
fit abstract spectral parameters, it provides greater interpretability
and more compact parameterisation. The code is released as open
source to support future research and applications in differentiable
physical modelling and sound synthesis.
Digital Morphophone Environment. Computer Rendering of a Pioneering Sound Processing Device
This paper introduces a digital reconstruction of the morphophone,
a complex magnetophonic device developed in the 1950s within
the laboratories of the GRM (Groupe de Recherches Musicales)
in Paris. The analysis, design, and implementation methodologies
underlying the Digital Morphophone Environment are discussed.
Based on a detailed review of historical sources and limited
documentation – including a small body of literature and, most
notably, archival images – the core operational principles of the
morphophone have been modeled within the MAX visual programming environment. The main goals of this work are, on the one
hand, to study and make accessible a now obsolete and unavailable
tool, and on the other, to provide the opportunity for new explorations in computer music and research.
Perceptual Decorrelator Based on Resonators
Decorrelation filters transform mono audio into multiple decorrelated copies. This paper introduces a novel decorrelation filter design based on a resonator bank, which produces a sum of over a thousand exponentially decaying sinusoids. A headphone listening test was used to identify the minimum inter-channel time delays that perceptually match ERB-filtered coherent noise to corresponding incoherent noise. The decay rate of each resonator is set based on a group delay profile determined by the listening test results at its corresponding frequency. Furthermore, the delays from the test are used to refine frequency-dependent windowing in coherence estimation, which we argue represents the perceptually most accurate way of assessing interaural coherence. This coherence measure then guides an optimization process that adjusts the initial phases of the sinusoids to minimize the coherence between two instances of the resonator-based decorrelator. The delay results establish the necessary group delay per ERB for effective decorrelation, revealing higher-than-expected values, particularly at higher frequencies. For comparison, the optimization is also performed using two previously proposed group-delay profiles: one based on the period of the ERB band center frequency and another based on the maximum group-delay limit before introducing smearing. The results indicate that the perceptually informed profile achieves equal decorrelation to the latter profile while smearing less at high frequencies. Overall, optimizing the phase response of the proposed decorrelator yields significantly lower coherence compared to using a random phase.
Stable Limit Cycles as Tunable Signal Sources
This paper presents a method for synthesizing audio signals from
nonlinear dynamical systems exhibiting stable limit cycles, with
control over frequency and amplitude independent of changes to
the system’s internal parameters. Using the van der Pol oscillator
and the Brusselator as case studies, it is demonstrated how parameters are decoupled from frequency and amplitude by rescaling the
angular frequency and normalizing amplitude extrema. Practical
implementation considerations are discussed, as are the limits and
challenges of this approach. The method’s validity is evaluated experimentally, and synthesis examples show the application of tunable nonlinear oscillators in sound design, including the generation
of transients in FM synthesis by means of a van der Pol oscillator
and a Supersaw oscillator bank based on the Brusselator.
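The idea of rescaling the angular frequency can be illustrated with the van der Pol oscillator. Substituting τ = ωt into the standard form x'' − μ(1 − x²)x' + x = 0 gives x'' − μω(1 − x²)x' + ω²x = 0, so ω sets the time scale while μ shapes the waveform. The sketch below integrates that equation with a classical fourth-order Runge-Kutta step. It is a rough illustration under these assumptions, not the paper's implementation: the function names, defaults, and choice of integrator are hypothetical, and the sketch does not compensate for the slight lengthening of the period as μ grows.

```python
import math


def van_der_pol(freq_hz, mu, sample_rate=48000, duration_s=0.5,
                x0=0.1, v0=0.0):
    """Simulate x'' - mu*w*(1 - x^2)*x' + w^2*x = 0 with w = 2*pi*freq_hz."""
    w = 2.0 * math.pi * freq_hz
    h = 1.0 / sample_rate

    def deriv(x, v):
        # First-order system: x' = v, v' = mu*w*(1 - x^2)*v - w^2*x.
        return v, mu * w * (1.0 - x * x) * v - w * w * x

    x, v = x0, v0
    out = []
    for _ in range(int(duration_s * sample_rate)):
        out.append(x)
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = deriv(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = deriv(x + h * k3x, v + h * k3v)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return out


def estimate_frequency(signal, sample_rate):
    """Estimate frequency in Hz from upward zero crossings."""
    ups = [i for i in range(1, len(signal))
           if signal[i - 1] < 0.0 <= signal[i]]
    if len(ups) < 2:
        return 0.0
    return (len(ups) - 1) * sample_rate / (ups[-1] - ups[0])


def normalize_peak(signal):
    """Scale the signal so that its largest absolute value is 1."""
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal] if peak > 0.0 else list(signal)
```

Once the trajectory has settled onto the limit cycle, dividing by the measured extremum (about 2 for the van der Pol oscillator) yields an output amplitude that does not depend on μ.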
Towards an Objective Comparison of Panning Feature Algorithms for Unsupervised Learning
Estimations of panning attributes are an important feature to extract from a piece of recorded music, with downstream uses such
as classification, quality assessment, and listening enhancement.
While several algorithms exist in the literature, there is currently
no comparison between them and no studies to suggest which one
is most suitable for any particular task. This paper compares four
algorithms for extracting amplitude panning features with respect
to their suitability for unsupervised learning. It finds synchronicities between them and analyses their results on a small set of
commercial music excerpts chosen for their distinct panning features. The ability of each algorithm to differentiate between the
tracks is analysed. The results can be used in future work to either
select the most appropriate panning feature algorithm or create a
version customized for a particular task.
DDSP-Based Neural Waveform Synthesis of Polyphonic Guitar Performance From String-Wise MIDI Input
We explore the use of neural synthesis for acoustic guitar from string-wise MIDI input. We propose four different systems and compare them with both objective metrics and subjective evaluation against natural audio and a sample-based baseline. We iteratively develop these four systems by making various considerations on the architecture and intermediate tasks, such as predicting pitch and loudness control features. We find that formulating the control feature prediction task as a classification task rather than a regression task yields better results. Furthermore, we find that our simplest proposed system, which directly predicts synthesis parameters from MIDI input, performs the best out of the four proposed systems. Audio examples and code are available.
Parameter Estimation of Frequency-Modulated Sinusoids with the Distribution Derivative Method
Frequency-modulated (FM) sinusoids are commonly used to model signals in several engineering applications, such as radar, sonar, communications, acoustics, and optics. The estimation of the parameters of FM sinusoids is a challenging problem with a long history in the literature. In this article, we use the distribution derivative method (DDM) to estimate the parameters of FM sinusoids in additive white Gaussian noise. Firstly, we derive the estimation of parameters of the model with DDM. Then, we compare the results of Monte-Carlo simulations (MCS) of DDM estimation of FM signals in additive white Gaussian noise against the state of the art (SOTA) and the Cramér-Rao lower bound (CRLB). DDM estimation of FM sinusoids showed performance comparable to the SOTA with less estimation bias. Additionally, DDM estimation of FM sinusoids is simple and straightforward to implement with the fast Fourier transform (FFT) relative to other approaches in the literature. Finally, DDM estimation has effectively the same computational complexity as the FFT.