A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues
In this paper we propose a complete computational system for Auditory Scene Analysis. This time-frequency system localizes, separates, and spatializes an arbitrary number of audio sources given only binaural signals. The localization is based on recent research frameworks in which interaural level and time differences are combined to derive a confident direction of arrival (azimuth) at each frequency bin. Here, the power-weighted histogram constructed in the azimuth space is modeled as a Gaussian Mixture Model, whose parameters are estimated through a weighted Expectation-Maximization algorithm. A bank of Gaussian spatial filters is then configured automatically to extract the sources with significant energy according to their posterior probabilities. Within this frequency-domain framework, we also invert a geometric and physical head model to derive an algorithm that simulates a source as originating from any azimuth angle.
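To make the classification stage concrete, here is a minimal numpy sketch of one plausible reading: per-bin azimuth estimates, weighted by their power, are fitted with a one-dimensional Gaussian mixture via weighted EM, and each source's Gaussian spatial filter is its posterior probability at every bin. The initialization, iteration count, and function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def weighted_em_gmm(theta, w, K, n_iter=50):
    """Fit a K-component 1-D GMM to azimuth samples `theta`,
    each weighted by its time-frequency power `w` (weighted EM)."""
    w = w / w.sum()
    mu = np.quantile(theta, np.linspace(0.1, 0.9, K))   # spread initial means
    var = np.full(K, theta.var() / K + 1e-8)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each azimuth sample
        d = theta[:, None] - mu[None, :]
        logp = -0.5 * (d**2 / var + np.log(2 * np.pi * var)) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: power-weighted parameter updates
        rw = r * w[:, None]
        nk = rw.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (rw * theta[:, None]).sum(axis=0) / nk
        d = theta[:, None] - mu[None, :]
        var = (rw * d**2).sum(axis=0) / nk + 1e-8
    return pi, mu, var

def posterior_masks(theta, pi, mu, var):
    """Bank of Gaussian spatial filters: one soft mask column per source,
    given by the posterior probability of each component at each bin."""
    d = theta[:, None] - mu[None, :]
    p = pi * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
    return p / p.sum(axis=1, keepdims=True)
```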
Synthetic Transaural Audio Rendering (STAR): a Perceptive Approach for Sound Spatialization
The principles of Synthetic Transaural Audio Rendering (STAR) were first introduced at DAFx-06. This is a perceptive approach to sound spatialization, whereas state-of-the-art methods are rather physical. With our STAR method, we focus neither on the wave field (as HOA does) nor on the sound wave (as VBAP does), but rather on the acoustic paths traveled by the sound to the listener's ears. The STAR method consists in canceling the cross-talk between two loudspeakers and the ears of the listener (in a transaural way), with acoustic paths that are not measured but computed by a model (thus synthetic). Our model is based on perceptive cues used by the human auditory system for sound localization. The aim is to give the listener the sensation of the position of each source, not to reconstruct the corresponding acoustic wave or field. This should work with various loudspeaker configurations and a large sweet spot, since the model is neither specialized for a specific configuration nor individualized for a specific listener. Experimental tests were conducted in 2015 and 2019 in different rooms and with different audiences, for static, moving, and polyphonic musical sounds. They show that the proposed method is competitive with state-of-the-art methods. However, this is a work in progress and further work is needed to improve the quality.
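As an illustration of the transaural principle, the sketch below inverts a 2x2 matrix of modeled speaker-to-ear transfer functions per frequency bin, so that the speaker feeds deliver the desired binaural signals at the ears. The delay and gain values of the toy path model are invented placeholders; STAR's actual perceptive path model is not reproduced here.

```python
import numpy as np

def path(delay_s, gain, freqs):
    """Toy acoustic path: a pure delay plus a broadband attenuation."""
    return gain * np.exp(-2j * np.pi * freqs * delay_s)

def crosstalk_canceller(freqs, d_ipsi=0.0029, d_contra=0.0032,
                        g_ipsi=1.0, g_contra=0.7):
    """Per-bin 2x2 matrices inverting the speaker-to-ear paths.
    Delays (s) and gains are illustrative placeholders."""
    C = np.empty((len(freqs), 2, 2), dtype=complex)
    C[:, 0, 0] = path(d_ipsi, g_ipsi, freqs)      # left speaker -> left ear
    C[:, 0, 1] = path(d_contra, g_contra, freqs)  # right speaker -> left ear
    C[:, 1, 0] = path(d_contra, g_contra, freqs)  # left speaker -> right ear
    C[:, 1, 1] = path(d_ipsi, g_ipsi, freqs)      # right speaker -> right ear
    # Speaker feeds per bin = inv(C) @ desired binaural (ear) signals
    return np.linalg.inv(C)
```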
Musical Sound Effects in the SAS Model
Spectral models provide general representations of sound in which many audio effects can be performed in a very natural and musically expressive way. Based on additive synthesis, these models control many sinusoidal oscillators via a huge number of model parameters that are only remotely related to musical parameters as perceived by a listener. The Structured Additive Synthesis (SAS) sound model has the flexibility of additive synthesis while addressing this problem. It consists of a complete abstraction of sounds according to only four parameters: amplitude, frequency, color, and warping. Since there is a close correspondence between the SAS model parameters and perception, the control of audio effects is simplified; many effects thus become accessible not only to engineers, but also to musicians and composers. Some effects, however, are impossible to achieve in the SAS model. In fact, structuring the sound representation imposes limitations not only on the sounds that can be represented, but also on the effects that can be performed on these sounds. We demonstrate these relations between models and effects for a variety of models, from temporal models to SAS, by way of well-known spectral models.
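For readers unfamiliar with the underlying machinery, the following is a bare additive-synthesis kernel of the kind SAS structures: a bank of sinusoidal oscillators driven by time-varying amplitude and frequency envelopes. SAS's color and warping parameters are deliberately left out of this sketch.

```python
import numpy as np

def additive(amps, freqs, sr=44100):
    """amps, freqs: (n_partials, n_samples) envelopes -> mono signal.
    Each oscillator's phase is the running integral of its frequency."""
    phase = 2 * np.pi * np.cumsum(freqs, axis=1) / sr
    return (amps * np.sin(phase)).sum(axis=0)
```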
Additive Synthesis Of Sound By Taking Advantage Of Psychoacoustics
In this paper we present an original technique designed to speed up additive synthesis. The technique takes psychoacoustic phenomena (the threshold of hearing and masking) into account in order to ignore inaudible partials during the synthesis process, thus saving a great deal of computation time. Our algorithm relies on a specific data structure called a “skip list” and has proven very efficient in practice. As a consequence, we are now able to synthesize an impressive number of spectral sounds in real time without overloading the processor.
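A hedged sketch of the pruning idea: keep only the partials that rise above the absolute threshold of hearing and are not masked by louder ones. The skip-list bookkeeping of the paper is replaced by a plain sort, and the threshold and masking formulas are textbook approximations (Terhardt's threshold curve, a triangular masking pattern), not the authors' exact values.

```python
import numpy as np

def ath_db(f):
    """Absolute threshold of hearing (Terhardt's approximation), dB SPL;
    f in Hz."""
    f = f / 1000.0
    return 3.64 * f**-0.8 - 6.5 * np.exp(-0.6 * (f - 3.3)**2) + 1e-3 * f**4

def audible_partials(freqs, levels_db, mask_slope_db=15.0):
    """Indices of partials worth synthesizing: above the ATH and not masked
    by a louder kept partial (toy triangular masking pattern, with slope
    `mask_slope_db` per octave of frequency distance)."""
    keep = []
    order = np.argsort(levels_db)[::-1]          # consider loudest first
    for i in order:
        if levels_db[i] < ath_db(freqs[i]):
            continue                             # below hearing threshold
        masked = any(levels_db[j] - mask_slope_db *
                     abs(np.log2(freqs[i] / freqs[j])) > levels_db[i]
                     for j in keep)
        if not masked:
            keep.append(i)
    return sorted(keep)
```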
An Efficient Pitch-Tracking Algorithm Using A Combination Of Fourier Transforms
In this paper we present a technique for detecting the pitch of a sound using a cascade of two forward Fourier transforms. We use an enhanced version of the Fourier transform for better accuracy, as well as a tracking strategy among pitch candidates for increased robustness. This efficient technique allows us to precisely determine the pitches of harmonic sounds such as the voice or classical musical instruments, but also of more complex sounds such as rippled noises.
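A bare-bones version of the two-transform idea is sketched below: a second transform applied to the log-magnitude spectrum (a cepstrum) whose dominant peak reveals the harmonic spacing. The enhanced transform and the candidate-tracking stage of the paper are omitted; an inverse FFT is used for the second transform, which for the Hermitian-symmetric log spectrum differs from a forward transform only in normalization, and the frame is assumed long enough to cover the lowest pitch.

```python
import numpy as np

def cepstral_pitch(x, sr, fmin=50.0, fmax=1000.0):
    """Estimate the pitch (Hz) of frame `x` at sample rate `sr`."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    ceps = np.fft.irfft(np.log(spec + 1e-12))       # second transform
    qmin, qmax = int(sr / fmax), int(sr / fmin)     # quefrency search range
    q = qmin + np.argmax(ceps[qmin:qmax])           # dominant periodicity
    return sr / q
```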
Informed Source Separation for Stereo Unmixing — An Open Source Implementation
Active listening consists in interacting with the music as it plays, and has numerous potential applications, from pedagogy to gaming through creation. In the context of the music industry, using existing musical recordings (e.g. studio stems), the listener could generate new versions of a given musical piece (i.e. an artistic mix). But imagine one could do this from the original mix itself. In a previous research project, we proposed a coder/decoder scheme for what we called informed source separation: the coder determines the information necessary to recover the tracks and embeds it inaudibly (using watermarking) in the mix; the decoder enhances the source separation with this information. We proposed and patented several methods, using various types of embedded information and separation techniques, hoping that the music industry was ready to give the listener this freedom of active listening. Fortunately, numerous other applications are possible, such as the manipulation of musical archives, for example in the context of ethnomusicology. But the patents remain in force for many years, which is problematic. In this article, we present an open-source implementation of a patent-free algorithm addressing the audio mixing and unmixing problem for any type of music.
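The toy example below illustrates the coder/decoder split without the watermarking stage (the side information is simply passed alongside the mix instead of being embedded in it): the coder stores which source dominates each STFT bin, and the decoder uses this index map as a binary mask to unmix. This is a generic informed-separation sketch, not the patented or patent-free algorithms of the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def coder(sources, sr, nperseg=1024):
    """sources: (n_src, n_samples). Returns the mix plus side information:
    the index of the dominant source in every time-frequency bin."""
    mix = sources.sum(axis=0)
    specs = np.array([stft(s, sr, nperseg=nperseg)[2] for s in sources])
    side_info = np.abs(specs).argmax(axis=0)
    return mix, side_info

def decoder(mix, side_info, n_src, sr, nperseg=1024):
    """Unmix by applying each source's binary mask to the mix spectrogram."""
    _, _, X = stft(mix, sr, nperseg=nperseg)
    out = []
    for k in range(n_src):
        _, s = istft(X * (side_info == k), sr, nperseg=nperseg)
        out.append(s)
    return np.array(out)
```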
The DESAM Toolbox: Spectral Analysis of Musical Audio
This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although these models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application; rather, the toolbox aims to provide a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different “mid-level” representations. After motivating the need for such a toolbox, the paper gives an overview of its overall organization and describes all available functionalities.
Breaking the Bounds: Introducing Informed Spectral Analysis
Sound applications based on sinusoidal modeling depend heavily on the efficiency and precision of the estimators in the analysis stage. In previous work, theoretical bounds on the best achievable precision were established; these bounds are reached by efficient estimators such as the reassignment or derivative methods. We show that it is possible to break these theoretical bounds with just a few additional bits of information about the original content, introducing the concept of “informed analysis”. This paper shows that existing estimators combined with some additional information can reach any desired level of precision, even in very low signal-to-noise ratio conditions, thus enabling high-quality sound effects without the typical but unwanted musical noise.
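One way to read "a few additional bits" is sketched below: the coder, which has access to the true parameter value, transmits a coarsely quantized correction to the decoder-side estimate, pushing the final precision beyond what the unaided estimator can achieve. The bit budget, quantizer range, and function names are illustrative assumptions, not the paper's scheme.

```python
import numpy as np

def encode_correction(true_freq, estimated_freq, n_bits=4, half_range=2.0):
    """Quantize the residual true - estimate (Hz) on n_bits."""
    err = np.clip(true_freq - estimated_freq, -half_range, half_range)
    step = 2.0 * half_range / 2**n_bits
    return int(np.clip(np.floor((err + half_range) / step), 0, 2**n_bits - 1))

def decode_correction(estimated_freq, code, n_bits=4, half_range=2.0):
    """Apply the transmitted correction (mid-point reconstruction)."""
    step = 2.0 * half_range / 2**n_bits
    return estimated_freq + (code + 0.5) * step - half_range
```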
First-Order Ambisonic Coding with PCA Matrixing and Quaternion-Based Interpolation
We present a spatial audio coding method which can extend existing speech/audio codecs, such as EVS or Opus, to represent first-order ambisonic (FOA) signals at low bit rates. The proposed method is based on principal component analysis (PCA) to decorrelate ambisonic components prior to multi-mono coding. The PCA rotation matrices are quantized in the generalized Euler angle domain and interpolated in the quaternion domain to avoid discontinuities between successive signal blocks. We also describe an adaptive bit allocation algorithm for optimized multi-mono coding of the principal components. A subjective evaluation using the MUSHRA methodology compares the performance of the proposed method with naive multi-mono coding using a fixed bit allocation. Results show significant quality improvements at bit rates ranging from 52.8 kbit/s (4 × 13.2) to 97.6 kbit/s (4 × 24.4) using the EVS codec.
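The per-block decorrelation step can be sketched as an eigendecomposition of the FOA channel covariance, whose eigenvector matrix is the rotation applied before multi-mono coding. The generalized-Euler-angle quantization and the quaternion-domain interpolation of successive 4x4 rotations are beyond this sketch, and determinant-sign conventions are glossed over.

```python
import numpy as np

def pca_rotation(foa_block):
    """foa_block: (4, n_samples) ambisonic channels (W, X, Y, Z).
    Returns the rotation mapping channels to principal components."""
    cov = foa_block @ foa_block.T / foa_block.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1]     # strongest component first
    V = eigvec[:, order]
    return V.T                           # y = R @ x; sign/det fixes omitted

def decorrelate(foa_block):
    """Principal components to feed the multi-mono codec, plus the
    rotation matrix needed to invert the transform after decoding."""
    R = pca_rotation(foa_block)
    return R @ foa_block, R
```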