A Generalized Polynomial and Sinusoidal Model for Partial Tracking and Time Stretching
In this article, we introduce a new generalized model based on polynomials and sinusoids for partial tracking and time stretching. Most current partial tracking algorithms are based on the McAulay-Quatieri approach and use polynomials for the phase, frequency, and amplitude tracks. Some sinusoidal approaches have also been shown to work under certain conditions. We present a unified model that uses both approaches, allowing more flexible partial tracking and time stretching.
A High-Rate Data Hiding Technique for Audio Signals based on INTMDCT Quantization
Data hiding consists of embedding binary information within a signal in an imperceptible way. In this study, we propose a high-rate data hiding technique suitable for uncompressed audio signals (PCM, as used in Audio-CD and the .wav format). This technique is appropriate for non-security applications, such as enriched-content applications, that require a large bitrate but no particular robustness to attacks. The proposed system is based on Quantization Index Modulation (QIM), applied to the Integer Modified Discrete Cosine Transform (IntMDCT) coefficients of the signal and guided by a psychoacoustic model (PAM). This technique enables embedding bitrates of up to 300 kbps per channel, outperforming a previous version based on the regular MDCT.
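To make the quantization step concrete, here is a minimal sketch of binary scalar QIM on integer coefficients. It is an illustration under stated assumptions, not the system described in the abstract: the function names `qim_embed` and `qim_extract` are hypothetical, the step size `delta` is a single fixed even integer (in the paper it would presumably be set per coefficient by the psychoacoustic model), and the input array simply stands in for IntMDCT coefficients.

```python
import numpy as np


def qim_embed(coeffs, bits, delta=4):
    """Embed one bit per integer coefficient using binary scalar QIM.

    Bit 0 snaps the coefficient to the lattice delta*Z, bit 1 to the
    lattice delta*Z + delta/2. `delta` is assumed to be a positive even
    integer so that marked coefficients stay integers.
    """
    coeffs = np.asarray(coeffs, dtype=np.int64)
    bits = np.asarray(bits, dtype=np.int64)
    offset = bits * (delta // 2)
    # Integer-only rounding of (coeffs - offset) / delta to the nearest integer.
    index = np.floor_divide(2 * (coeffs - offset) + delta, 2 * delta)
    return delta * index + offset


def qim_extract(marked, delta=4):
    """Recover the bits by choosing the nearer of the two lattices."""
    marked = np.asarray(marked, dtype=np.int64)
    r = np.mod(marked, delta)            # residue in [0, delta)
    dist0 = np.minimum(r, delta - r)     # distance to the bit-0 lattice
    dist1 = np.abs(r - delta // 2)       # distance to the bit-1 lattice
    return (dist1 < dist0).astype(np.int64)


# Hypothetical stand-in for a few IntMDCT coefficients and a payload.
coeffs = np.array([13, -7, 0, 22, -15])
bits = np.array([1, 0, 1, 1, 0])
marked = qim_embed(coeffs, bits, delta=4)
recovered = qim_extract(marked, delta=4)
```

In this sketch the embedding distortion per coefficient is at most `delta / 2`, so a larger step trades imperceptibility for tolerance to small perturbations; carrying several bits per coefficient would need a finer set of interleaved lattices.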
Phase-based informed source separation for active listening of music
This paper presents an informed source separation technique for monophonic mixtures. Although the vast majority of separation methods are based on the time-frequency energy of each source, we introduce a new approach that uses solely phase information to perform the separation. The sources are iteratively reconstructed using an adaptation of the Multiple Input Spectrogram Inversion (MISI) algorithm of Gunawan and Sen. The proposed method is then tested against conventional MISI and Wiener filtering on monophonic signals under oracle conditions. Results show that, at the cost of a longer computation time, our method outperforms both MISI and Wiener filtering in oracle conditions, with much higher objective quality even when the phase is quantized.
Notes on the use of Variational Autoencoders for Speech and Audio Spectrogram Modeling
Variational autoencoders (VAEs) are powerful deep generative neural networks. They have recently been used in several papers on speech and audio processing, in particular for modeling speech/audio spectrograms. These papers give little theoretical support for the chosen data representation and decoder likelihood function, or for the corresponding cost function used to train the VAE. Yet a well-established statistical framework exists and has been extensively presented and discussed in papers dealing with nonnegative matrix factorization (NMF) of audio spectrograms and its application to audio source separation. In the present paper, we show how this statistical framework applies to VAE-based speech/audio spectrogram modeling. This provides insights into the choice and interpretability of the data representation and model parameterization in VAE-based modeling.
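As an illustration of the kind of link this framework provides, the following is a well-known result from the NMF literature, given here as a sketch and not necessarily in the form used in the paper. The notation is an assumption: $x_{fn}$ is an STFT coefficient at frequency bin $f$ and frame $n$, $v_{fn} > 0$ is the model variance (a product of NMF factors, or, in a VAE, a decoder output), and $d_{\mathrm{IS}}$ is the Itakura-Saito divergence. If $x_{fn}$ follows a zero-mean circular complex Gaussian distribution with variance $v_{fn}$, its negative log-likelihood equals the Itakura-Saito divergence between the observed power $|x_{fn}|^2$ and $v_{fn}$, up to terms that do not depend on $v_{fn}$.

```latex
\begin{align}
  p(x_{fn}; v_{fn}) &= \frac{1}{\pi v_{fn}}
      \exp\!\left(-\frac{|x_{fn}|^2}{v_{fn}}\right), \\
  -\log p(x_{fn}; v_{fn}) &= \frac{|x_{fn}|^2}{v_{fn}} + \log v_{fn} + \log \pi \\
      &= d_{\mathrm{IS}}\!\left(|x_{fn}|^2 \,\middle|\, v_{fn}\right)
         + \log\!\left(\pi |x_{fn}|^2\right) + 1, \\
  d_{\mathrm{IS}}(a \mid b) &= \frac{a}{b} - \log\frac{a}{b} - 1 .
\end{align}
```

Under this assumption, maximizing the likelihood with respect to $v_{fn}$ and minimizing the Itakura-Saito divergence on the power spectrogram are the same problem, which is one way a choice of data representation (power spectrogram) and a choice of likelihood (complex Gaussian) can be tied together.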