Symbolic and audio processing to change the expressive intention of a recorded music performance
A framework for real-time expressive modification of audio musical performances is presented. An expressiveness model computes the deviations of the musical parameters that are relevant to controlling the expressive intention. The modifications are then realized by integrating the model with a sound processing engine.
Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection, where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm to replace manual parameter tuning, followed by the computation of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods of the Aubio library, using a dataset of monophonic acoustic guitar recordings. Results show that the proposed solution is effective in reducing the human effort required in the optimization process: it replaced more than two days of manual parameter tuning with 13 hours and 34 minutes of automated computation. Moreover, the resulting performance was comparable to that obtained by manual optimization.
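The Pareto step can be illustrated with a minimal sketch. It assumes each candidate parameter set produced by the evolutionary search has already been evaluated into a hypothetical (name, F-measure, latency in ms) tuple, with higher F-measure and lower latency both preferred. The evolutionary search and the Aubio evaluation are not shown, and this is not the paper's code.

```python
from typing import List, Tuple

# Hypothetical candidate record: (name, f_measure, latency_ms).
# Higher F-measure is better; lower latency is better.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    _, fa, la = a
    _, fb, lb = b
    no_worse = fa >= fb and la <= lb
    strictly_better = fa > fb or la < lb
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by increasing latency."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c[2])
```

A candidate that is both less accurate and slower than another is discarded; the remaining front leaves the accuracy/latency trade-off to the user.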
Pyroadacoustics: A Road Acoustics Simulator Based on Variable Length Delay Lines
In the development of algorithms for sound source detection, identification and localization, being able to generate datasets flexibly and quickly is of utmost importance. However, most of the acoustic simulators available for this purpose target indoor applications, and their usefulness is limited in outdoor environments such as a road, which involve fast-moving sources and long distances travelled by sound waves. In this paper we present an acoustic propagation simulator specifically designed for road scenarios. In particular, the proposed Python software package makes it possible to simulate the sound observed when a source moves along an arbitrary trajectory relative to the observer, using variable length delay lines to implement sound propagation and the Doppler effect. An acoustic model of road reflection and air absorption has been designed and implemented using digital FIR filters. The architecture of the proposed software is flexible and open to extensions, so the package can serve as a starting point for implementing further outdoor acoustic simulation scenarios.
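The core idea of a variable length delay line can be sketched as follows. This is a simplified illustration, not the Pyroadacoustics API: `propagate` and the `distance` callback are hypothetical names, the speed of sound is assumed constant at 343 m/s, the delay is computed from the distance at reception time (an approximation of the exact retarded-time solution), fractional delays use linear interpolation, and the road reflection and air absorption filters are omitted. Because the read position moves as the distance changes, the Doppler shift emerges without being modeled explicitly.

```python
import math
from typing import Callable, List

SPEED_OF_SOUND = 343.0  # m/s, assumed constant (no temperature dependence)


def propagate(signal: List[float], fs: float,
              distance: Callable[[float], float]) -> List[float]:
    """Simulate propagation through a variable-length delay line.

    signal: samples emitted by the source.
    distance(t): hypothetical callback giving the source-receiver distance (m)
    at reception time t (s); using reception time is a simplification.
    """
    out = []
    for n in range(len(signal)):
        r = distance(n / fs)
        delay = r / SPEED_OF_SOUND * fs       # delay in (fractional) samples
        pos = n - delay                       # read position in the source signal
        i = math.floor(pos)
        frac = pos - i
        if i < 0 or i + 1 >= len(signal):
            out.append(0.0)                   # sound not arrived yet, or no sample to interpolate with
            continue
        sample = (1.0 - frac) * signal[i] + frac * signal[i + 1]  # linear interpolation
        out.append(sample / max(r, 1.0))      # 1/r spherical spreading, clamped near the source
    return out
```

With a constant distance this reduces to a fixed delay and attenuation; a distance that shrinks over time makes the read pointer advance faster than real time, which raises the perceived pitch.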
Hybrid Audio Inpainting Approach with Structured Sparse Decomposition and Sinusoidal Modeling
This research presents a novel hybrid audio inpainting approach that accounts for the diversity of signals and improves reconstruction quality. Existing inpainting approaches have limitations, such as energy drop and poor reconstruction quality for non-stationary signals. Since an audio signal can be considered a mixture of three components (tonal, transients, and noise), the proposed approach uses a structured sparse decomposition technique to split the reliable neighborhoods to the left and right of the gap into these components. The gap is reconstructed by extrapolating parameters estimated from the reliable neighborhoods of each component, using refined component-targeted methods suited to the acoustic characteristics of each component. Experiments were conducted to evaluate the performance of the hybrid approach and compare it with other state-of-the-art inpainting approaches. The results show that the hybrid approach achieves high-quality reconstruction with low computational complexity across various gap lengths and signal types, particularly for longer gaps and non-stationary signals.
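For the tonal component, bridging a gap from sinusoidal parameters can be sketched for a single partial. This is a deliberately simple stand-in, not the paper's component-targeted method. It assumes the frequency and amplitude of the partial have already been estimated on the left and right reliable neighborhoods, interpolates them linearly across the gap instead of using the paper's refined extrapolation, and integrates the instantaneous frequency to obtain the phase. The name `bridge_partial` is hypothetical, and the sketch does not enforce phase continuity at the right edge of the gap.

```python
import math
from typing import List


def bridge_partial(f_left: float, a_left: float, phase_left: float,
                   f_right: float, a_right: float,
                   gap_len: int, fs: float) -> List[float]:
    """Synthesize one sinusoidal partial across a gap of gap_len samples.

    Frequency (Hz) and amplitude move linearly from the left-edge estimates
    towards the right-edge estimates; the phase starts at phase_left and is
    accumulated from the instantaneous frequency.
    """
    out = []
    phase = phase_left
    for n in range(gap_len):
        t = n / gap_len                       # 0 at the left edge, approaching 1 at the right edge
        f = (1.0 - t) * f_left + t * f_right
        a = (1.0 - t) * a_left + t * a_right
        out.append(a * math.cos(phase))
        phase += 2.0 * math.pi * f / fs       # phase accumulation (integrated frequency)
    return out
```

A full tonal reconstruction would sum such partials and add the separately reconstructed transient and noise components.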
NBU: Neural Binaural Upmixing of Stereo Content
While immersive music productions have become popular in recent years, music content produced during the last decades has been predominantly mixed for stereo. This paper presents a data-driven approach to automatic binaural upmixing of stereo music. The HDemucs network architecture, previously used for both source separation and binauralization, is leveraged for an end-to-end approach to binaural upmixing. We employ two distinct datasets, demonstrating that while custom-designed training data improves the accuracy of spatial positioning, professionally mixed music yields superior spatialization. The trained networks can process multiple simultaneous sources individually and add valid binaural cues, effectively positioning sources with an average azimuthal error of less than 11.3°. A listening test with binaural experts shows that the approach outperforms digital signal processing-based approaches to binauralization of stereo content in terms of spaciousness while preserving audio quality.
Sinusoid Extraction and Salience Function Design for Predominant Melody Estimation
In this paper we evaluate some of the alternative methods commonly applied in the first stages of the signal processing chain of automatic melody extraction systems. Specifically, we study the first two stages, namely the extraction of sinusoidal components and the computation of a time-pitch salience function, with the goal of determining the benefits and caveats of each approach in the specific context of predominant melody estimation. The approaches are evaluated on a dataset of polyphonic music covering several musical genres with different singing/playing styles, using metrics specifically designed to measure the usefulness of each step for melody extraction. The results suggest that equal-loudness filtering and frequency/amplitude correction methods provide significant improvements, whereas using a multi-resolution spectral transform yields only a marginal improvement over the standard STFT. The effect of key parameters in the computation of the salience function is also studied and discussed.
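A salience function of the harmonic-summation type can be sketched as follows. It assumes the sinusoid extraction stage has already produced a list of spectral peaks as (frequency in Hz, linear magnitude) pairs with positive frequencies. The number of harmonics, the harmonic weighting `alpha`, the cos² distance weighting, and the semitone tolerance are illustrative assumptions, not the parameter values studied in the paper.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (frequency in Hz, linear magnitude), frequency > 0


def salience(f0: float, peaks: List[Peak], n_harmonics: int = 8,
             alpha: float = 0.8, tol_semitones: float = 0.5) -> float:
    """Harmonic-summation salience of a candidate f0 given spectral peaks.

    For each harmonic h, the strongest peak within tol_semitones of h*f0
    contributes its magnitude, weighted by alpha**(h-1) and by a cos^2
    window that decays with the pitch distance to the harmonic.
    """
    total = 0.0
    for h in range(1, n_harmonics + 1):
        target = h * f0
        best = 0.0
        for freq, mag in peaks:
            dist = abs(12.0 * math.log2(freq / target))  # distance in semitones
            if dist <= tol_semitones:
                w = math.cos(dist / tol_semitones * math.pi / 2.0) ** 2
                best = max(best, w * mag)
        total += alpha ** (h - 1) * best
    return total
```

Evaluating this on a grid of candidate f0 values for every frame gives the time-pitch salience representation; the decaying harmonic weight favors the true fundamental over its octave above.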
Granular Resynthesis for Sound Unmixing
In modern music genres like Pop, Rap, Hip-Hop or Techno, many songs are built from a pool of short musical pieces, so-called loops, that serve as building blocks. These loops are usually one, two or four bars long and form the accompaniment for the lead melody or singing voice. Very often the accompanying loops can be heard solo at least once in a song. This can be used as a priori knowledge for removing these loops from the mixture. This paper presents an algorithm based on granular resynthesis and spectral subtraction that makes use of this a priori knowledge. The algorithm uses two different synthesis strategies and is capable of removing known loops from mixtures even if the loop signal contained in the mixture differs slightly from the solo loop signal.
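The spectral subtraction step can be sketched for a single STFT frame. The sketch assumes the granular resynthesis stage has already produced a solo-loop magnitude spectrum aligned with the mixture frame (that stage is not shown). `subtract_loop`, the over-subtraction factor and the spectral floor are hypothetical, and reusing the mixture phase is a common choice rather than necessarily the paper's.

```python
import cmath
from typing import List


def subtract_loop(mix_bins: List[complex], loop_mag: List[float],
                  over_sub: float = 1.0, floor: float = 0.0) -> List[complex]:
    """Remove a known loop from one STFT frame by magnitude subtraction.

    mix_bins: complex STFT bins of the mixture frame.
    loop_mag: magnitudes of the aligned solo-loop frame (same length).
    The subtracted magnitude is clamped at `floor` and recombined with the
    mixture phase.
    """
    out = []
    for x, loop in zip(mix_bins, loop_mag):
        mag = max(abs(x) - over_sub * loop, floor)
        out.append(cmath.rect(mag, cmath.phase(x)))
    return out
```

Applying this frame by frame and inverting the STFT with overlap-add would yield the mixture with the loop attenuated.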
On the control of the phase of resonant filters with applications to percussive sound modeling
Source-filter models are widely used in numerous audio processing fields, from speech processing to percussive/contact sound synthesis. The design of filters for these models, whether from scratch or from spectral analysis, usually involves tuning frequency and damping parameters and/or providing an all-pole model of the resonant part of the filter. In this context, and for the modeling of percussive (non-sustained) sounds, a source signal can be estimated from a filtered sound through a time-domain deconvolution process. The result can be plagued with artifacts when resonances have very narrow bandwidths and lie very close in frequency. In this paper we propose a method that noticeably reduces the artifacts of the deconvolution process through inter-resonance phase synchronization. Results show that the proposed method is able to design filters that induce fewer artifacts, at the expense of a higher dynamic range.
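The deconvolution setting can be illustrated with two-pole resonators parameterized by frequency and damping. This sketch shows only the baseline: an all-pole filter built from two closely spaced, lightly damped resonances, and time-domain deconvolution by applying the inverse FIR filter. It does not implement the paper's inter-resonance phase synchronization, and with an exact all-pole model the reconstruction is exact up to rounding, so it does not reproduce the artifacts the paper discusses. All function names are hypothetical.

```python
import math
from typing import List


def resonator_coeffs(freq: float, decay: float, fs: float) -> List[float]:
    """Denominator [1, a1, a2] of a two-pole resonator.

    Poles at r*exp(+-j*2*pi*freq/fs), with r = exp(-decay/fs), where decay (1/s)
    is the damping of the mode.
    """
    r = math.exp(-decay / fs)
    theta = 2.0 * math.pi * freq / fs
    return [1.0, -2.0 * r * math.cos(theta), r * r]


def poly_mul(p: List[float], q: List[float]) -> List[float]:
    """Multiply two polynomials in z^-1 (cascades two all-pole sections)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def all_pole_filter(x: List[float], a: List[float]) -> List[float]:
    """y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], assuming a[0] == 1."""
    y: List[float] = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def deconvolve(y: List[float], a: List[float]) -> List[float]:
    """Recover the source with the inverse FIR filter: x[n] = sum_k a[k] * y[n-k]."""
    return [sum(a[k] * y[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(y))]
```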
The Simplest Analysis Method for Non-Stationary Sinusoidal Modeling
This paper introduces an analysis method that generalizes the phase vocoder approach to non-stationary sinusoidal modeling. The new method is then compared to the reassignment method for the estimation of all the parameters of the model (phase, amplitude, frequency, amplitude modulation, and frequency modulation), and to the Cramér-Rao bounds. It turns out that this method performs comparably to the state of the art, with the great advantage of being much simpler.
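For reference, the classic stationary phase vocoder frequency estimator that the paper generalizes can be sketched as follows: the frequency of a peak is read from the phase advance of a DFT bin between two frames `hop` samples apart. This is not the paper's non-stationary method; the Hann window, the single-bin DFT and the function names are assumptions made for illustration.

```python
import cmath
import math
from typing import List


def dft_bin(x: List[float], start: int, n: int, k: int) -> complex:
    """k-th DFT bin of the Hann-windowed frame x[start:start+n]."""
    acc = 0j
    for m in range(n):
        w = 0.5 - 0.5 * math.cos(2.0 * math.pi * m / n)
        acc += w * x[start + m] * cmath.exp(-2j * math.pi * k * m / n)
    return acc


def princarg(phi: float) -> float:
    """Wrap a phase to [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def pv_frequency(x: List[float], fs: float, n: int, hop: int, k: int) -> float:
    """Phase vocoder frequency estimate (Hz) near bin k from two frames hop samples apart."""
    phi0 = cmath.phase(dft_bin(x, 0, n, k))
    phi1 = cmath.phase(dft_bin(x, hop, n, k))
    expected = 2.0 * math.pi * k * hop / n           # phase advance at the bin centre
    deviation = princarg(phi1 - phi0 - expected)     # wrapped deviation from the bin centre
    return (expected + deviation) * fs / (2.0 * math.pi * hop)
```

The estimate is unambiguous only while the true frequency stays close enough to the bin centre that the wrapped deviation does not alias, which is why small hops are preferred.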
Intermodulation Effects Analysis using Complex Bandpass Filterbanks
The objective of this paper is to show the ability of complex bandpass filterbanks to extract the intermodulation information that appears when two audio signals interact inside the same analysis band. To perform the analysis, a sinusoidal model of the signals has been assumed. Three kinds of signals have been analyzed: a sum of two cosines, a sum of two linear chirps, and a sum of two exponential chirps. The complex bandpass filtering of the signals is carried out using a new algorithm based on the Complex Continuous Wavelet Transform. The developed algorithm has been validated by comparing the practical results with the theoretical instantaneous amplitude and instantaneous phase of the signal model. With an appropriate width, the complex bandpass filters show the same behaviour as human perception, which cannot discriminate interacting tones that fall inside the same critical band of the ear.
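For the first signal type, a sum of two cosines, the theoretical instantaneous amplitude and phase follow from a short derivation. It assumes an idealized complex bandpass filter that passes both positive-frequency components with unit gain and rejects the negative-frequency ones, so it illustrates the expected intermodulation rather than the paper's wavelet-based algorithm. Here Δω = ω2 - ω1.

```latex
\begin{aligned}
x(t) &= a_1 \cos(\omega_1 t) + a_2 \cos(\omega_2 t) \\
z(t) &= a_1 e^{j\omega_1 t} + a_2 e^{j\omega_2 t}
      = e^{j\omega_1 t}\left(a_1 + a_2 e^{j\Delta\omega t}\right) \\
|z(t)| &= \sqrt{a_1^2 + a_2^2 + 2 a_1 a_2 \cos(\Delta\omega\, t)} \\
\arg z(t) &= \omega_1 t + \operatorname{atan2}\!\left(a_2 \sin(\Delta\omega\, t),\; a_1 + a_2 \cos(\Delta\omega\, t)\right)
\end{aligned}
```

The envelope beats at the difference frequency |Δω|, and the phase oscillates around the linear term ω1 t at the same rate; these are the intermodulation terms expected when both tones fall inside one analysis band.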