Separation of Unvoiced Fricatives in Singing Voice Mixtures with Semi-Supervised NMF
Separating the singing voice from a musical mixture is a widely addressed problem due to its many applications. However, most approaches do not tackle the separation of unvoiced consonant sounds, which causes a loss of quality in any vocal source separation algorithm and is especially noticeable for unvoiced fricatives (e.g. /T/ in thing) because of their energy level and duration. Fricatives are consonants produced by forcing air through a narrow channel formed by placing two articulators close together. We propose a method to model and separate unvoiced fricative consonants based on semi-supervised Non-negative Matrix Factorization, in which a set of spectral basis components is learnt from a training excerpt. We implemented this method as an extension of SIMM, an existing, well-known factorization approach for singing voice. An objective evaluation shows a small improvement in the separation results, and informal listening tests show a significant increase in the intelligibility of the isolated vocals.
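The following is a minimal NumPy sketch of the semi-supervised NMF scheme described above: fricative spectral bases are learnt from a training excerpt, kept fixed while free components absorb the rest of the mixture, and the fricative estimate is recovered with a Wiener-style soft mask. The function names, component counts, the Euclidean cost and the mask-based resynthesis are illustrative assumptions, not the actual SIMM extension.

    import numpy as np

    rng = np.random.default_rng(0)

    def nmf_mu(V, W, H, n_iter=200, fixed_cols=0, eps=1e-9):
        """Multiplicative-update NMF (Euclidean cost). The first `fixed_cols`
        columns of W are kept fixed (semi-supervised setting)."""
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ (W @ H) + eps)
            W_new = W * (V @ H.T) / ((W @ H) @ H.T + eps)
            W_new[:, :fixed_cols] = W[:, :fixed_cols]  # keep the learnt bases untouched
            W = W_new
        return W, H

    def separate_fricatives(V_train, V_mix, X_mix, K_fric=10, K_free=40):
        """Sketch of semi-supervised separation: learn fricative bases from the
        magnitude spectrogram V_train of an unvoiced-fricative excerpt, factorize
        the mixture magnitude spectrogram V_mix with those bases fixed, and mask
        the complex mixture STFT X_mix."""
        F = V_train.shape[0]
        W_fric, _ = nmf_mu(V_train,
                           rng.random((F, K_fric)) + 1e-3,
                           rng.random((K_fric, V_train.shape[1])) + 1e-3)
        # Fricative bases fixed; free bases model accompaniment and voiced singing.
        W_init = np.hstack([W_fric, rng.random((F, K_free)) + 1e-3])
        H_init = rng.random((K_fric + K_free, V_mix.shape[1])) + 1e-3
        W, H = nmf_mu(V_mix, W_init, H_init, fixed_cols=K_fric)
        fric_mag = W[:, :K_fric] @ H[:K_fric]   # fricative part of the model
        mask = fric_mag / (W @ H + 1e-9)        # Wiener-style soft mask
        return mask * X_mix                     # masked complex spectrogram (invert with ISTFT)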
Study of Regularizations and Constraints in NMF-Based Drums Monaural Separation
Drum modelling is of special interest in musical source separation because of the widespread presence of drums in Western popular music. Current research has often focused on drum separation without specifically modelling the other sources present in the signal. This paper presents an extensive study of the use of regularizations and constraints to drive the factorization towards a separation between the percussive and non-percussive music accompaniment. The proposed regularizations control the frequency smoothness of the basis components and the temporal sparseness of the gains. We also evaluated the use of temporal constraints on the gains to perform the separation, using both ground-truth manual annotations (made publicly available) and automatically extracted transients. An objective evaluation of the results shows that, while the optimal regularizations are highly dependent on the signal, drum event positions contain enough information to achieve a high-quality separation.
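As a rough illustration of the kind of regularized factorization the abstract refers to, the NumPy sketch below adds a frequency-smoothness penalty on the bases and an L1 sparseness penalty on the gains to plain Euclidean NMF, using the common heuristic of splitting the penalty gradients into positive and negative parts of the multiplicative updates. The penalty weights, component count and the optional hard temporal constraint are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    def neighbor_sum(W):
        """Sum of spectrally adjacent basis values (edges replicated)."""
        up = np.vstack([W[:1], W[:-1]])
        down = np.vstack([W[1:], W[-1:]])
        return up + down

    def regularized_nmf(V, K=30, n_iter=200, lam_smooth=0.1, lam_sparse=0.1, eps=1e-9):
        """Euclidean NMF with a frequency-smoothness penalty on the bases W and an
        L1 (temporal sparseness) penalty on the gains H, via multiplicative updates."""
        F, T = V.shape
        rng = np.random.default_rng(0)
        W = rng.random((F, K)) + eps
        H = rng.random((K, T)) + eps
        for _ in range(n_iter):
            # Sparse gains: the L1 weight enters the denominator of the H update.
            H *= (W.T @ V) / (W.T @ (W @ H) + lam_sparse + eps)
            # Smooth bases: neighbour attraction in the numerator, self-repulsion below.
            num = V @ H.T + 2.0 * lam_smooth * neighbor_sum(W)
            den = (W @ H) @ H.T + 4.0 * lam_smooth * W + eps
            W *= num / den
        return W, H

    # Optional hard temporal constraint, in the spirit of the annotated drum events:
    # zero the gains of the percussive components outside event frames, e.g.
    #     H[:K_drum] *= event_mask
    # where `event_mask` is a hypothetical boolean (K_drum x T) array built from
    # the annotations or from detected transients.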
Low-Latency Bass Separation Using Harmonic-Percussion Decomposition
Many recent approaches to musical source separation rely on model-based inference methods that take the signal's harmonic structure into account. To address the particular case of low-latency bass separation, we propose a method that combines harmonic decomposition, using a Tikhonov regularization-based algorithm, with peak-contrast analysis of the pitch likelihood function. Our experiment compares the separation performance of this method to a naive low-pass filter, a state-of-the-art NMF-based method and a near-optimal binary mask. The proposed low-latency method achieves results similar to the NMF-based high-latency approach at a lower computational cost, making it suitable for real-time implementation.
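To make the two ingredients concrete, the NumPy sketch below pairs a Tikhonov-regularized (ridge) harmonic decomposition of a single spectral frame with a simple peak-contrast test on the resulting pitch likelihood. The Gaussian harmonic templates, the ridge weight, the contrast threshold and the frame-wise usage are assumptions for illustration, not the published algorithm.

    import numpy as np

    def harmonic_dictionary(freqs, f0_candidates, n_harmonics=8, width=10.0):
        """Gaussian-shaped harmonic templates, one column per f0 candidate (Hz)."""
        B = np.zeros((len(freqs), len(f0_candidates)))
        for j, f0 in enumerate(f0_candidates):
            for h in range(1, n_harmonics + 1):
                B[:, j] += np.exp(-0.5 * ((freqs - h * f0) / width) ** 2) / h
        return B

    def tikhonov_decompose(x, B, lam=1e-2):
        """Ridge (Tikhonov-regularized) solution of x ≈ B a for one magnitude frame."""
        K = B.shape[1]
        a = np.linalg.solve(B.T @ B + lam * np.eye(K), B.T @ x)
        return np.maximum(a, 0.0)  # keep only non-negative activations

    def peak_contrast(likelihood, eps=1e-9):
        """Contrast of the strongest pitch-likelihood peak against the mean level."""
        return likelihood.max() / (likelihood.mean() + eps)

    # Per frame (causal, hence low latency): decompose, treat the candidate
    # activations as a pitch likelihood, and only keep a bass estimate when the
    # contrast is high enough. `x_frame`, `freqs`, `f0_candidates` and the 3.0
    # threshold are hypothetical.
    # B = harmonic_dictionary(freqs, f0_candidates)
    # a = tikhonov_decompose(np.abs(x_frame), B)
    # if peak_contrast(a) > 3.0:
    #     bass_mag = B[:, a.argmax()] * a.max()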
Modelling and Separation of Singing Voice Breathiness in Polyphonic Mixtures
Most current source separation methods only target the voiced component of the singing voice. Besides the unvoiced consonant phonemes, the remaining breathiness is very noticeable to listeners and retains much of the singer's phonetic and timbral information. We propose a low-latency method for estimating the spectrum of the breathiness component, which is then taken into account when isolating the singing voice from the mixture. The breathiness component is derived from the harmonic envelope detected in pitched vocal sounds, and the separation of the voiced components is used in conjunction with an existing iterative approach based on spectrum factorization. Finally, we conduct an objective evaluation that demonstrates the separation improvement, supported by a number of audio examples.
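The NumPy sketch below illustrates one plausible reading of the idea: estimate a spectral envelope from the magnitudes at the detected harmonics, scale it into a frequency-shaped breathiness (noise-floor) estimate, and fold that into a Wiener-style vocal mask. The envelope interpolation, the fixed gain, and the variables `voiced_mag`, `f0_detected` and `X_frame` are hypothetical assumptions, not the paper's breathiness model.

    import numpy as np

    def harmonic_envelope(mag_frame, freqs, f0, n_harmonics=30):
        """Spectral envelope interpolated from the magnitudes near the harmonics
        of f0 (nearest STFT bins, approximate). Assumes f0 < freqs[-1]."""
        harm_freqs = f0 * np.arange(1, n_harmonics + 1)
        harm_freqs = harm_freqs[harm_freqs < freqs[-1]]
        idx = np.clip(np.searchsorted(freqs, harm_freqs), 0, len(freqs) - 1)
        harm_mags = mag_frame[idx]
        return np.interp(freqs, harm_freqs, harm_mags)

    def breathiness_estimate(mag_frame, freqs, f0, gain=0.2):
        """Crude breathiness spectrum: a scaled copy of the harmonic envelope,
        acting as a shaped noise floor beneath the voiced partials."""
        return gain * harmonic_envelope(mag_frame, freqs, f0)

    # Wiener-style combination with an already estimated voiced spectrum
    # `voiced_mag` for one frame of the complex mixture STFT `X_frame`:
    # breath_mag = breathiness_estimate(np.abs(X_frame), freqs, f0_detected)
    # vocal_mag  = voiced_mag + breath_mag
    # mask = np.clip(vocal_mag / (np.abs(X_frame) + 1e-9), 0.0, 1.0)
    # vocals_frame = mask * X_frame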