Transforming Singing Voice Expression - The Sweetness Effect
We propose a real-time system targeted at music production in the context of vocal recordings. The aim is to transform the singer's voice characteristics in order to achieve a sweet-sounding voice. The system combines three transformations: SubHarmonic Component Reduction (attenuating the sub-harmonics found in voices with vocal disorders), Vocal Tract Excitation Modification (to achieve a change in loudness) and Intonation Modification (to achieve smoother pitch transitions). The transformations are performed in the frequency domain based on an enhanced phase-locked vocoder. An Expression Adaptive Control estimates the degree of vocal disorder present in the singer's voice; this estimate automatically controls the amount of SubHarmonic Component Reduction to ensure a natural-sounding transformation.
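A minimal sketch of the sub-harmonic attenuation idea is given below. It assumes a plain STFT rather than the paper's enhanced phase-locked vocoder, a precomputed f0 track, and illustrative values for the attenuation factor alpha and the bandwidth around each sub-harmonic.

```python
# Crude sub-harmonic attenuation in the STFT domain (sketch, not the
# paper's method): sub-harmonics sit at odd multiples of f0/2, between
# the harmonics, and are scaled down by 'alpha'.
import numpy as np
import librosa

def reduce_subharmonics(y, f0_track, sr=44100, n_fft=2048, hop=512, alpha=0.2):
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    for t, f0 in enumerate(f0_track[:S.shape[1]]):
        if f0 <= 0:
            continue  # unvoiced frame: leave untouched
        k = np.arange(1, int(freqs[-1] / (f0 / 2)), 2)  # odd multiples
        for f in k * f0 / 2:
            lo = np.searchsorted(freqs, f - f0 / 4)
            hi = np.searchsorted(freqs, f + f0 / 4)
            S[lo:hi, t] *= alpha  # scale the band between two harmonics
    return librosa.istft(S, hop_length=hop)
```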
Morphing techniques for enhanced scat singing
In jazz, scat singing is a phonetic improvisation that imitates instrumental sounds. In this paper, we propose a system that aims to transform the singing voice into real instrument sounds, extending the possibilities for scat singers. Analysis algorithms in the spectral domain extract voice parameters, which drive the resulting instrument sound. A small database contains real instrument samples that have been spectrally analyzed offline. Two prototypes are introduced, producing the sounds of a trumpet and a bass guitar, respectively.
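The control flow of such an analysis-driven morphing can be sketched as follows. The sample database db, the grain-per-hop resynthesis and all parameter values are hypothetical stand-ins for the spectral-domain processing the paper actually performs.

```python
# Sketch: frame-wise pitch and energy of the scat voice drive a lookup
# into a hypothetical sample database {midi_note: waveform}.
import numpy as np
import librosa

def scat_to_instrument(voice, db, sr=44100, hop=512):
    f0, voiced, _ = librosa.pyin(voice, fmin=80, fmax=800, sr=sr, hop_length=hop)
    rms = librosa.feature.rms(y=voice, hop_length=hop)[0]
    out = np.zeros_like(voice)
    notes = np.array(sorted(db.keys()))
    for t, (f, v, e) in enumerate(zip(f0, voiced, rms)):
        if not v or np.isnan(f):
            continue  # leave unvoiced frames silent
        midi = librosa.hz_to_midi(f)
        note = notes[np.argmin(np.abs(notes - midi))]  # nearest sample
        grain = db[note][:hop]                          # one hop of it
        start = t * hop
        end = min(start + len(grain), len(out))
        out[start:end] += e * grain[:end - start]       # energy-driven gain
    return out
```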
Performance-Driven Control for Sample-Based Singing Voice Synthesis
In this paper we address the expressive control of singing voice synthesis. Singing Voice Synthesizers (SVS) traditionally require two types of input: a musical score and lyrics. Musical expression is then typically either generated automatically, by applying a model of a certain type of expression to the high-level score, or achieved by manually editing low-level synthesizer parameters. We propose an alternative method in which the expression control is derived from a singing performance. First, an analysis module extracts expressive information from the input voice signal; this information is then adapted and mapped to the internal synthesizer controls. The presented implementation works off-line, processing the user's input voice signal and lyrics with a phonetic segmentation module. The main contribution of this approach is a direct way of controlling the expression of an SVS. A natural next step is to run the system in real time, and the last section of the paper discusses a possible strategy for real-time operation.
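The core idea, mapping expression features extracted from the performance onto synthesizer controls, can be illustrated with a toy mapping. The control names pitch_bend_cents and dynamics are invented for the example and do not correspond to any real SVS API.

```python
# Toy mapping from analysed expression features to control curves.
import numpy as np

def derive_controls(f0_hz, energy, score_note_hz):
    """Map an input pitch/energy track onto note-relative control curves."""
    f0 = np.asarray(f0_hz, dtype=float)
    # deviation from the score note in cents; unvoiced frames (f0 <= 0) -> 0
    pitch_bend = 1200.0 * np.log2(
        np.where(f0 > 0, f0, score_note_hz) / score_note_hz)
    e = np.asarray(energy, dtype=float)
    dynamics = (e - e.min()) / (np.ptp(e) + 1e-12)  # normalize to [0, 1]
    return {"pitch_bend_cents": pitch_bend, "dynamics": dynamics}
```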
Separation of Unvoiced Fricatives in Singing Voice Mixtures with Semi-Supervised NMF
Separating the singing voice from a musical mixture is a widely addressed problem due to its many applications. However, most approaches do not tackle the separation of unvoiced consonant sounds, which degrades the quality of any vocal source separation algorithm and is especially noticeable for unvoiced fricatives (e.g. /T/ in "thing") because of their energy level and duration. Fricatives are consonants produced by forcing air through a narrow channel formed by placing two articulators close together. We propose a method to model and separate unvoiced fricative consonants based on semi-supervised Non-negative Matrix Factorization, in which a set of spectral basis components is learned from a training excerpt. We implemented this method as an extension of SIMM, an existing well-known factorization approach for singing voice. An objective evaluation shows a small improvement in the separation results, and informal listening tests show a significant increase in the intelligibility of the isolated vocals.
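The semi-supervised factorization can be sketched as follows: basis vectors learned from a fricative training excerpt are held fixed while the remaining dictionary adapts to the mixture. Euclidean multiplicative updates are used for brevity, and the ranks and iteration counts are illustrative; this sketch does not reproduce SIMM itself.

```python
# Semi-supervised NMF: V ~ W @ H, with the first columns of W pinned
# to bases pre-trained on a fricative excerpt.
import numpy as np

def nmf(V, rank, n_iter=200, W_fixed=None, eps=1e-9):
    F, T = V.shape
    rng = np.random.default_rng(0)
    k_fix = 0 if W_fixed is None else W_fixed.shape[1]
    W = rng.random((F, rank))
    if W_fixed is not None:
        W[:, :k_fix] = W_fixed
    H = rng.random((rank, T))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
        W_new[:, :k_fix] = W[:, :k_fix]  # keep the trained fricative bases
        W = W_new
    return W, H

# Usage sketch: learn fricative bases on a training magnitude spectrogram,
# then factorize the mixture with those bases held fixed.
# W_f, _ = nmf(V_train, rank=10)
# W, H = nmf(V_mix, rank=40, W_fixed=W_f)
# V_fric = W[:, :10] @ H[:10]  # Wiener-style masking would follow
```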
Study of Regularizations and Constraints in NMF-Based Drums Monaural Separation
Drum modelling is of special interest in musical source separation because of the drums' widespread presence in Western popular music. Current research has often focused on drum separation without specifically modelling the other sources present in the signal. This paper presents an extensive study of the use of regularizations and constraints to drive the factorization towards a separation between the percussive and non-percussive music accompaniment. The proposed regularizations control the frequency smoothness of the basis components and the temporal sparseness of the gains. We also evaluate the use of temporal constraints on the gains, using both ground-truth manual annotations (made publicly available) and automatically extracted transients. Objective evaluation shows that, while the optimal regularizations are highly signal-dependent, drum event positions alone carry enough information to achieve a high-quality separation.
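A compact sketch of how these two regularizers can enter standard multiplicative updates, assuming a Euclidean cost, simplified boundary handling and illustrative lambda values:

```python
# NMF with frequency smoothness on the basis W (neighbouring bins pulled
# together) and L1 temporal sparseness on the gains H.
import numpy as np

def nmf_drums(V, rank, n_iter=200, lam_smooth=0.1, lam_sparse=0.1, eps=1e-9):
    F, T = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, rank))
    H = rng.random((rank, T))
    for _ in range(n_iter):
        # L1 sparsity adds a constant to the gain-update denominator.
        H *= (W.T @ V) / (W.T @ W @ H + lam_sparse + eps)
        # Smoothness: each frequency bin is pulled toward its neighbours.
        N = np.zeros_like(W)
        N[1:] += W[:-1]
        N[:-1] += W[1:]
        W *= (V @ H.T + lam_smooth * N) / (W @ H @ H.T + 2 * lam_smooth * W + eps)
    return W, H
```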
Low-Latency Bass Separation Using Harmonic-Percussion Decomposition
Many recent approaches to musical source separation rely on model-based inference methods that take the signal's harmonic structure into account. To address the particular case of low-latency bass separation, we propose a method that combines harmonic decomposition, using a Tikhonov regularization-based algorithm, with peak-contrast analysis of the pitch likelihood function. Our experiment compares the separation performance of this method against a naive low-pass filter, a state-of-the-art NMF-based method and a near-optimal binary mask. The proposed low-latency method achieves results similar to the high-latency NMF-based approach at a lower computational cost, making it suitable for real-time implementation.
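The decomposition step can be illustrated as a ridge regression of each spectral frame onto a dictionary of harmonic combs, with the peak contrast of the resulting activations serving as a pitch-reliability cue. The Gaussian-comb dictionary below is a deliberately naive stand-in for the paper's actual model.

```python
# Tikhonov-regularized harmonic decomposition of one spectral frame:
# g = (B^T B + lam * I)^-1 B^T x, one comb column per f0 candidate.
import numpy as np

def harmonic_dictionary(freqs, f0_candidates, n_harm=8, width=20.0):
    """One Gaussian comb per candidate f0 (illustrative shape)."""
    B = np.zeros((len(freqs), len(f0_candidates)))
    for j, f0 in enumerate(f0_candidates):
        for h in range(1, n_harm + 1):
            B[:, j] += np.exp(-0.5 * ((freqs - h * f0) / width) ** 2)
    return B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-12)

def pitch_likelihood(x, B, lam=0.1):
    G = B.T @ B + lam * np.eye(B.shape[1])
    g = np.linalg.solve(G, B.T @ x)
    g = np.maximum(g, 0.0)                    # keep plausible activations
    contrast = g.max() / (g.mean() + 1e-12)   # simple peak-contrast measure
    return g, contrast
```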
Modelling and Separation of Singing Voice Breathiness in Polyphonic Mixtures
Most current source separation methods target only the voiced component of the singing voice. Yet besides the unvoiced consonant phonemes, the remaining breathiness is very noticeable to human listeners, and it retains much of the singer's phonetic and timbral information. We propose a low-latency method for estimating the spectrum of the breathiness component, which is then taken into account when isolating the singing voice from the mixture. The breathiness component is derived from the harmonic envelope detected in pitched vocal sounds. This separation of the voiced components is used in conjunction with an existing iterative approach based on spectrum factorization. Finally, we conduct an objective evaluation that demonstrates the improvement in separation, supported also by a number of audio examples.
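The envelope-based estimate can be sketched per spectral frame as follows, with an illustrative breathiness gain beta that is not a value from the paper.

```python
# Sketch: sample the magnitude spectrum at the harmonic peaks, interpolate
# a smooth envelope, and scale it to estimate the aspiration-noise floor.
import numpy as np

def breathiness_spectrum(mag_frame, freqs, f0, beta=0.3):
    harm = np.arange(1, int(freqs[-1] / f0) + 1) * f0
    # magnitude at (the bin nearest to) each harmonic
    idx = np.clip(np.searchsorted(freqs, harm), 0, len(freqs) - 1)
    env = np.interp(freqs, freqs[idx], mag_frame[idx])  # harmonic envelope
    return beta * env  # estimated magnitude of the breathy (noise) component
```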