Download Antiderivative Antialiasing in Nonlinear Wave Digital Filters A major problem in the emulation of discrete-time nonlinear systems, such as those encountered in Virtual Analog modeling, is
aliasing distortion. A trivial approach to reduce aliasing is oversampling. However, this solution may be too computationally demanding for real-time applications. More advanced techniques
to suppress aliased components are arbitrary-order Antiderivative
Antialiasing (ADAA) methods that approximate the reference nonlinear function using a combination of its antiderivatives of different orders. While in its original formulation it is applied only
to memoryless systems, recently, the applicability of first-order
ADAA has been extended to stateful systems employing their state-space description. This paper presents an alternative formulation
that successfully applies arbitrary-order ADAA methods to Wave
Digital Filter models of dynamic circuits with one nonlinear element. It is shown that the proposed approach allows us to design
ADAA models of the nonlinear elements in a fully local and modular fashion, independently of the considered reference circuit. Further peculiar features of the proposed approach, along with two
examples of applications, are discussed.
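As a rough illustration of the memoryless first-order case that the abstract takes as its starting point (not the wave digital formulation the paper proposes), the sketch below replaces the nonlinearity by the difference quotient of its first antiderivative between consecutive samples. The choice of tanh as the nonlinearity, the function names, and the tolerance `eps` are assumptions made for this example only.

```python
import math


def log_cosh(x):
    """First antiderivative of tanh, written to avoid overflow for large |x|."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def adaa1_tanh(samples, eps=1e-6):
    """First-order ADAA applied to a memoryless tanh nonlinearity.

    Each output is (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1]), with F = log_cosh.
    When consecutive inputs are nearly equal the quotient is ill-conditioned,
    so the nonlinearity is evaluated at the midpoint instead.
    The initial previous sample is assumed to be 0.
    """
    out = []
    x_prev = 0.0
    F_prev = log_cosh(x_prev)
    for x in samples:
        F = log_cosh(x)
        dx = x - x_prev
        if abs(dx) < eps:
            y = math.tanh(0.5 * (x + x_prev))  # fallback for tiny differences
        else:
            y = (F - F_prev) / dx
        out.append(y)
        x_prev, F_prev = x, F
    return out
```

For a constant input the fallback branch returns the plain tanh value, while for a changing input each output approximates tanh averaged over the segment between the two samples, which is what attenuates the aliased components at the cost of a half-sample delay.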
Download A Structural Similarity Index Based Method to Detect Symbolic Monophonic Patterns in Real-Time Automatic detection of musical patterns is an important task in the field of Music Information Retrieval due to its usage in multiple applications such as automatic music transcription, genre or instrument identification, music classification, and music recommendation. A significant sub-task in pattern detection is real-time pattern detection in music, due to its relevance in application domains such as the Internet of Musical Things. In this study, we present a method to identify the occurrence of known patterns in symbolic monophonic music streams in real-time. We introduce a matrix-based representation to denote musical notes using their pitch, pitch-bend, amplitude, and duration. We propose an algorithm based on an independent similarity index for each note attribute. We also introduce the Match Measure, which is a numerical value signifying the degree of the match between a pattern and a sequence of notes. We have tested the proposed algorithm against three datasets: a human-recorded dataset, a synthetically designed dataset, and the JKUPDD dataset. Overall, a detection rate of 95% was achieved. The low computational load and minimal running time demonstrate the suitability of the method for real-world, real-time implementations on embedded systems.
Download Design of FPGA-based High-order FDTD Method for Room Acoustics Sound field rendering with the finite difference time domain (FDTD) method is computation-intensive and memory-intensive. This research investigates an FPGA-based acceleration system for sound field rendering with the high-order FDTD method, in which spatial and temporal blockings are applied to alleviate the external memory bandwidth bottleneck and to reuse data, respectively. Implemented on the FPGA card DE10-Pro, the FPGA-based sound field rendering systems outperform software simulations conducted on a desktop machine with 512 GB of DRAM and a Xeon Gold 6212U processor (24 cores) running at 2.4 GHz by 11 times, 13 times, and 18 times in computing performance for the 2nd-order, 4th-order, and 6th-order FDTD schemes, respectively, even though the FPGA-based systems run at a much lower clock frequency and have much smaller on-chip and external memory.
Download An active learning procedure for the interaural time difference discrimination threshold Measuring the auditory lateralization elicited by interaural time difference (ITD) cues involves the estimation of a psychometric function (PF). The shape of this function usually follows from the analysis of the subjective data and models the probability of correctly localizing the angular position of a sound source. The present study describes and evaluates a procedure for progressively fitting a PF, using Gaussian process classification of the subjective responses produced during a binary decision experiment. The process adaptively refines an approximated PF, following Bayesian inference. At each trial, it suggests the most informative auditory stimulus for function refinement according to Bayesian active learning by disagreement (BALD) mutual information. In this paper, the procedure was modified to accommodate two-alternative forced choice (2AFC) experimental methods and was then compared with a standard adaptive “three-down, one-up” staircase procedure. Our process approximates the average threshold ITD at the 79.4%-correct level of lateralization with a mean accuracy increase of 8.9% over the Weibull function fitted on the data of the same test. The final accuracy for the Just Noticeable Difference (JND) in ITD is achieved with only 37.6% of the trials needed by a standard lateralization test.
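For readers unfamiliar with the baseline, the following is a minimal sketch of a generic “three-down, one-up” staircase rule, not the procedure or code used in the study. The stimulus level is lowered after three consecutive correct responses and raised after any incorrect one, so the track settles where the probability p of a correct response satisfies p^3 = 0.5, i.e. p ≈ 0.794, which matches the 79.4% level quoted above. The function name, the fixed step size, and the example levels are assumptions for illustration.

```python
def three_down_one_up(responses, start, step, floor=0.0):
    """Return the stimulus level presented at each trial.

    responses: iterable of booleans (True = correct answer).
    start, step: hypothetical initial level and fixed step size
    (e.g. an ITD in microseconds); floor is the smallest allowed level.
    """
    level = start
    run = 0  # consecutive correct answers since the last level change
    track = []
    for correct in responses:
        track.append(level)
        if correct:
            run += 1
            if run == 3:  # three in a row: make the task harder
                level = max(floor, level - step)
                run = 0
        else:  # a single error: make the task easier
            level += step
            run = 0
    return track


# Convergence point of the rule: the p for which p**3 == 0.5.
target_p = 0.5 ** (1.0 / 3.0)
```

Calling `three_down_one_up([True, True, True, False, True], start=100.0, step=10.0)` yields the sequence of presented levels; a threshold estimate is then typically taken from the levels at the reversal points.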
Download A Wavelet-Based Method for the Estimation of Clarity of Attack Parameters in Non-Percussive Instruments From the exploration of databases of instrument sounds to the self-assisted practice of musical instruments, methods for automatically
and objectively assessing the quality of musical tones are in high
demand. In this paper, we develop a new algorithm for estimating
the duration of the attack, with particular attention to wind and
bowed string instruments. In fact, for these instruments, the quality
of the tones is highly influenced by the attack clarity, for which,
together with pitch stability, the attack duration is an indicator often
used by teachers by ear. Since the direct estimation of the attack
duration from sounds is made difficult by the initial preponderance of the excitation noise, we propose a more robust approach
based on the separation of the ensemble of the harmonics from the
excitation noise, which is obtained by means of an improved pitch-synchronous wavelet transform. We also define a new parameter,
the noise ducking time, which is relevant for detecting the extent of
the noise component in the attack. In addition to the exploration of
available sound databases, for testing our algorithm, we created an
annotated data set in which several problematic sounds are included.
Moreover, to check the consistency and robustness of our duration
estimates, we applied our algorithm to sets of synthetic sounds with
noisy attacks of programmable duration.
Download DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions This study introduces a novel and interpretable model, DiffVox,
for matching vocal effects in music production. DiffVox, short
for “Differentiable Vocal Fx”, integrates parametric equalisation,
dynamic range control, delay, and reverb with efficient differentiable implementations to enable gradient-based optimisation for
parameter estimation. Vocal presets are retrieved from two datasets,
comprising 70 tracks from MedleyDB and 365 tracks from a private collection. Analysis of parameter correlations reveals strong
relationships between effects and parameters, such as the high-pass and low-shelf filters often working together to shape the low
end, and the delay time correlating with the intensity of the delayed signals. Principal component analysis reveals connections to
McAdams’ timbre dimensions, where the most crucial component
modulates the perceived spaciousness while the secondary components influence spectral brightness. Statistical testing confirms
the non-Gaussian nature of the parameter distribution, highlighting
the complexity of the vocal effects space. These initial findings on
the parameter distributions set the foundation for future research
in vocal effects modelling and automatic mixing.
Download Audio De-Thumping using Huang's Empirical Mode Decomposition In the context of audio restoration, sound transfer of broken disks usually produces audio signals corrupted with long pulses of low-frequency content, also called thumps. This paper presents a method for audio de-thumping based on Huang’s Empirical Mode Decomposition (EMD), provided the pulse locations are known beforehand. Thus, the EMD is used as a means to obtain pulse estimates to be subtracted from the degraded signals. Despite its simplicity, the method is demonstrated to handle the challenging problem of superimposed pulses well. Performance assessment against selected competing solutions reveals that the proposed solution tends to produce superior de-thumping results.
Download Vivos Voco: A survey of recent research on voice transformations at IRCAM IRCAM has long experience in the analysis, synthesis and transformation of voice. Natural voice transformations are of great interest for many applications and can be combined with text-to-speech systems, leading to a powerful creation tool. We present research conducted at IRCAM on voice transformations over the last few years. Transformations can be achieved in a global way by modifying pitch, spectral envelope, durations etc. While it sacrifices the possibility to attain a specific target voice, the approach allows the production of new voices of a high degree of naturalness with different gender and age, modified vocal quality, or another speech style. These transformations can be applied in real-time using ircamTools TRAX. Transformation can also be done in a more specific way in order to transform a voice towards the voice of a target speaker. Finally, we present some recent research on the transformation of expressivity.