A Complex Wavelet Based Fundamental Frequency Estimator in Single-Channel Polyphonic Signals
In this work, a new estimator of the fundamental frequencies (F0) present in a polyphonic single-channel signal is developed. The signal is modeled in terms of a set of discrete partials obtained by the Complex Continuous Wavelet Transform (CCWT). The fundamental frequency estimation is based on the energy distribution of the detected partials of the input signal, followed by a spectral smoothness technique. The proposed algorithm is designed to work with suppressed fundamentals, inharmonic partials and harmonically related sounds. The technique has been tested on a set of input signals with polyphony from 2 to 6, with high-precision results that show the strength of the algorithm. The results are promising enough to make the algorithm a suitable basis for blind sound source separation or automatic score transcription techniques.
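The partial-detection front end can be pictured with a short sketch: a complex Morlet CWT implemented by FFT-domain filtering, followed by naive peak picking over time-averaged magnitudes. The wavelet parameter, frequency grid and threshold below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w=6.0):
    """Complex Morlet CWT of x at the analysis frequencies `freqs` (Hz)."""
    n = len(x)
    X = np.fft.fft(x)
    omega = 2 * np.pi * np.fft.fftfreq(n)          # rad/sample
    out = np.empty((len(freqs), n), dtype=complex)
    for i, f0 in enumerate(freqs):
        s = w * fs / (2 * np.pi * f0)              # scale (samples) centring f0
        # frequency response of the analytic Morlet wavelet at scale s
        psi = np.exp(-0.5 * (s * omega - w) ** 2) * (omega > 0)
        out[i] = np.fft.ifft(X * psi)
    return out

def detect_partials(x, fs, fmin=50.0, fmax=4000.0, n_bins=200, thresh=0.05):
    """Pick local maxima of time-averaged CWT magnitude as candidate partials."""
    freqs = np.geomspace(fmin, fmax, n_bins)
    mag = np.abs(morlet_cwt(x, fs, freqs)).mean(axis=1)
    keep = [i for i in range(1, n_bins - 1)
            if mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]
            and mag[i] > thresh * mag.max()]
    return freqs[keep], mag[keep]                  # frequencies and energies
```

The detected frequencies and energies would then feed the energy-distribution and spectral-smoothness stages the abstract describes.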
Maximum Filter Vibrato Suppression for Onset Detection
We present SuperFlux, a new onset detection algorithm with vibrato suppression. It is an enhanced version of the universal spectral flux onset detection algorithm, and reduces the number of false positive detections considerably by tracking spectral trajectories with a maximum filter. Especially for music with heavy use of vibrato (e.g., sung operas or string performances), the number of false positive detections can be reduced by up to 60% without missing any additional events. Algorithm performance was evaluated and compared to state-of-the-art methods on the basis of three different datasets comprising mixed audio material (25,927 onsets), violin recordings (7,677 onsets) and operatic solo voice recordings (1,448 onsets). Due to its causal nature, the algorithm is applicable in both offline and online real-time scenarios.
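The core idea translates to a few lines of code: take the frame-wise difference of a log-magnitude spectrogram, but against a maximum-filtered version of the previous frame, so that slow frequency drift (vibrato) stays inside the widened band and produces no positive flux. The frame, hop and filter sizes below are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d
from scipy.signal import stft

def superflux_odf(x, fs, n_fft=2048, hop=441, max_width=3):
    """Onset detection function: positive spectral flux computed against
    a frequency-wise maximum-filtered version of the previous frame."""
    _, _, Z = stft(x, fs, nperseg=n_fft, noverlap=n_fft - hop)
    S = np.log1p(np.abs(Z))                      # log-magnitude spectrogram
    # widen each frame along the frequency axis with a maximum filter
    ref = maximum_filter1d(S, size=max_width, axis=0)
    diff = S[:, 1:] - ref[:, :-1]                # compare to previous frame
    return np.maximum(diff, 0.0).sum(axis=0)     # half-wave rectify and sum
```

Peaks of the returned function above an adaptive threshold would then be reported as onsets.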
Generating Musical Accompaniment Using Finite State Transducers
The finite state transducer (FST), a type of finite state machine that maps an input string to an output string, is a common tool in the fields of natural language processing and speech recognition. FSTs have also been applied to music-related tasks such as audio fingerprinting and the generation of musical accompaniment. In this paper, we describe a system that uses an FST to generate harmonic accompaniment to a melody. We provide details of the methods employed to quantize a music signal and of the topology of the transducer, and discuss our approach to evaluating the system. We argue for an evaluation metric that takes into account the quality of the generated accompaniment, rather than one that returns a binary value indicating whether the accompaniment is correct or incorrect.
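To make the mechanics concrete, here is a toy transducer that maps melody pitch classes to chord symbols while moving between harmonic-function states, with a simple Viterbi-style search for the cheapest output path. The states, alphabet and weights are invented for illustration; they are not the paper's learned transducer.

```python
# Each transition maps an input melody pitch class to an output chord,
# moving between harmonic "function" states (T = tonic, D = dominant).
TRANSITIONS = {
    # (state, input pitch class): [(output chord, next state, cost), ...]
    ("T", 0): [("C", "T", 0.0), ("Am", "T", 0.5)],
    ("T", 7): [("G", "D", 0.2), ("C", "T", 0.4)],
    ("D", 7): [("G", "D", 0.1)],
    ("D", 0): [("C", "T", 0.0)],
}

def best_accompaniment(melody, start="T"):
    """Cheapest-cost path through the transducer (simple Viterbi search)."""
    beams = {start: (0.0, [])}                   # state -> (cost, chords)
    for pc in melody:
        nxt = {}
        for state, (cost, chords) in beams.items():
            for out, to, w in TRANSITIONS.get((state, pc), []):
                cand = (cost + w, chords + [out])
                if to not in nxt or cand[0] < nxt[to][0]:
                    nxt[to] = cand
        beams = nxt
    return min(beams.values())[1] if beams else []

print(best_accompaniment([0, 7, 7, 0]))          # ['C', 'G', 'G', 'C']
```

A weighted-path notion of "best" accompaniment also fits the paper's argument for graded rather than binary evaluation.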
Re-Thinking Sound Separation: Prior Information and Additivity Constraint in Separation Algorithms
In this paper, we study the effect of prior information on the quality of informed source separation algorithms. We present results with our system for solo and accompaniment separation and contrast our findings with two other state-of-the-art approaches. The results suggest that it is the current separation techniques, rather than the process of extracting the prior information, that limit performance. Furthermore, we present an alternative view of the separation process in which the additivity constraint of the algorithm is removed in an attempt to maximize the obtained quality. Plausible future directions in sound separation research are discussed.
Study of Regularizations and Constraints in NMF-Based Drums Monaural Separation
Drum modelling is of special interest in musical source separation because of the widespread presence of drums in western popular music. Current research has often focused on drum separation without specifically modelling the other sources present in the signal. This paper presents an extensive study of the use of regularizations and constraints to drive the factorization towards a separation between the percussive part and the non-percussive musical accompaniment. The proposed regularizations control the frequency smoothness of the basis components and the temporal sparseness of the gains. We also evaluated the use of temporal constraints on the gains to perform the separation, using both ground-truth manual annotations (made publicly available) and automatically extracted transients. Objective evaluation of the results shows that, while the optimal regularizations are highly dependent on the signal, drum event positions contain enough information to achieve a high-quality separation.
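The two regularization families can be illustrated with multiplicative NMF updates: an L1 penalty on the gains H enforcing temporal sparseness, and a neighbour-difference penalty on the bases W enforcing frequency smoothness. The sketch below assumes a Euclidean cost; the penalty weights and update rules are illustrative, not the paper's exact formulation.

```python
import numpy as np

def nmf_drums(V, n_components=8, lam=0.1, mu=0.1, n_iter=200, eps=1e-9):
    """Factorize magnitude spectrogram V (freq x time) as W @ H."""
    F, T = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, n_components)) + eps      # spectral bases
    H = rng.random((n_components, T)) + eps      # time-varying gains
    for _ in range(n_iter):
        # L1 sparseness on H enters the denominator as a constant
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        # neighbour-difference smoothness on W: split its gradient into
        # positive (denominator) and negative (numerator) parts
        up = np.vstack([W[:1], W[:-1]])          # W shifted down (f-1)
        dn = np.vstack([W[1:], W[-1:]])          # W shifted up   (f+1)
        W *= (V @ H.T + 2 * mu * (up + dn)) / (W @ H @ H.T + 4 * mu * W + eps)
    return W, H
```

Components with smooth bases and sparse gains would be assigned to the percussive part; the temporal constraints studied in the paper would additionally clamp H outside annotated or detected drum events.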
Reverse Engineering Stereo Music Recordings Pursuing an Informed Two-Stage Approach
A cascaded reverse engineering approach is presented which uses an explicit model of the music production chain. The model considers both the mixing and the mastering stages and incorporates a parametric signal model. The approach is pursued in an informed scenario: the model parameters are attached in the form of auxiliary data to the mastered mix, and are later used to undo the mastering and the mixing. The validity of the approach is demonstrated on a stereo mixture.
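As a toy illustration of the informed idea, suppose the auxiliary data carried only a master gain and a 2x2 pan matrix for two sources; undoing the chain then reduces to a scalar division followed by a matrix inversion. The real system's parametric model is far richer; this sketch only conveys the two-stage inversion, and all values are invented.

```python
import numpy as np

def unmix_stereo(mix, pan_matrix, master_gain):
    """mix: (2, n) mastered stereo signal; pan_matrix: 2x2 [[aL, bL],
    [aR, bR]] mapping two sources to L/R; master_gain: scalar."""
    demastered = mix / master_gain               # stage 1: undo mastering
    A_inv = np.linalg.inv(pan_matrix)            # stage 2: undo mixing
    return A_inv @ demastered                    # (2, n) estimated sources

# usage with an invented mix of two test tones
fs = 44100
t = np.arange(fs) / fs
s = np.vstack([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 220 * t)])
A = np.array([[0.8, 0.3], [0.2, 0.7]])
rec = unmix_stereo(1.5 * (A @ s), A, 1.5)
print(np.max(np.abs(rec - s)))                   # ~0: sources recovered
```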
Source Separation and Analysis of Piano Music Signals Using Instrument-Specific Sinusoidal Model
Many existing monaural source separation systems use sinusoidal modeling to represent pitched musical sounds during the separation process. In these systems, a musical sound is represented by a sum of time-varying sinusoidal components, and the goal of source separation is to estimate the parameters of each component. Here, we propose an instrument-specific sinusoidal model tailored to the piano tone. Based on this Piano Model, we develop a monaural source separation system that extracts each individual tone from mixtures of piano tones and, at the same time, identifies the intensity and adjusts the onset of each tone, characterizing the nuances of the music performance. The major difficulty of the source separation problem is resolving overlapping partials. Our solution trains the Piano Model on isolated tones, so that it captures properties that recur across occurrences of each pitch and thereby helps to separate the mixtures. This approach enables high separation quality even for octaves, in which the partials of the upper tone completely overlap with those of the lower tone. The results show that our system separates mixtures of piano tones (including octaves) robustly and accurately, with quality significantly better than that reported in previous work.
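A baseline for the sinusoidal view is frame-wise least squares: with the partial frequencies of each tone known, the per-partial amplitudes of a mixture can be solved jointly and each source resynthesized from its own partials. The sketch below shows only this generic step; on its own it cannot resolve fully overlapping partials (octaves), which is where the trained Piano Model comes in. All names and values are illustrative.

```python
import numpy as np

def _design(n_samples, fs, freqs):
    n = np.arange(n_samples)[:, None]
    ph = 2 * np.pi * n * np.asarray(freqs) / fs
    return np.hstack([np.cos(ph), np.sin(ph)])   # cos/sin column pairs

def estimate_partials(frame, fs, freqs):
    """Jointly solve all partial amplitudes by linear least squares."""
    E = _design(len(frame), fs, freqs)
    amps, *_ = np.linalg.lstsq(E, frame, rcond=None)
    return amps                                   # [cos amps..., sin amps...]

def resynthesize(amps, fs, freqs, length, which):
    """Rebuild only the partials whose indices appear in `which`."""
    E = _design(length, fs, freqs)
    k = len(freqs)
    sel = np.zeros_like(amps)
    for i in which:
        sel[i], sel[i + k] = amps[i], amps[i + k]
    return E @ sel

# usage on an invented two-tone mixture with known partial frequencies
fs, N = 44100, 4096
t = np.arange(N)
mix = np.cos(2 * np.pi * 440 * t / fs) + 0.5 * np.sin(2 * np.pi * 660 * t / fs)
amps = estimate_partials(mix, fs, [440.0, 660.0])
tone_low = resynthesize(amps, fs, [440.0, 660.0], N, which=[0])  # ~440 Hz tone
```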
Low-Latency Bass Separation Using Harmonic-Percussion Decomposition
Many recent approaches to musical source separation rely on model-based inference methods that take the signal's harmonic structure into account. To address the particular case of low-latency bass separation, we propose a method that combines harmonic decomposition, using a Tikhonov regularization-based algorithm, with peak-contrast analysis of the pitch likelihood function. Our experiment compares the separation performance of this method against a naive low-pass filter, a state-of-the-art NMF-based method and a near-optimal binary mask. The proposed low-latency method achieves results similar to those of the NMF-based high-latency approach at a lower computational cost, and is therefore suitable for real-time implementation.
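The Tikhonov step admits a compact closed form: project one magnitude-spectrum frame onto harmonic comb templates for candidate bass pitches, a = (B'B + λI)⁻¹B'x, and read a pitch likelihood from the activations. The template shape, λ and the peak-contrast rule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def harmonic_templates(f0s, n_bins, fs, n_fft, n_harm=8, width=2.0):
    """Gaussian-blurred harmonic combs, one column per candidate f0 (Hz)."""
    bins = np.arange(n_bins)
    B = np.zeros((n_bins, len(f0s)))
    for j, f0 in enumerate(f0s):
        for h in range(1, n_harm + 1):
            b = h * f0 * n_fft / fs              # FFT bin of h-th harmonic
            B[:, j] += np.exp(-0.5 * ((bins - b) / width) ** 2) / h
    return B / np.linalg.norm(B, axis=0)

def pitch_likelihood(frame_mag, B, lam=0.1):
    """Ridge (Tikhonov) solution a = (B'B + lam I)^-1 B' x for one frame."""
    k = B.shape[1]
    a = np.linalg.solve(B.T @ B + lam * np.eye(k), B.T @ frame_mag)
    return np.maximum(a, 0.0)

def has_confident_bass(a, contrast=2.0):
    """Crude peak-contrast test: best candidate vs. mean activation."""
    return a.max() > contrast * (a.mean() + 1e-12)
```

Because each frame is processed independently with a closed-form solve, the per-frame cost is small, which is what makes a low-latency variant plausible.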
A 3D Multi-Plate Environment for Sound Synthesis
In this paper, a physics-based sound synthesis environment is presented which is composed of several plates, under nonlinear conditions, coupled with the surrounding acoustic field. The equations governing the behaviour of the system are implemented numerically using finite difference time domain methods. The number of plates, their positions relative to a 3D computational enclosure and their physical properties can all be specified by the user; simple control parameters allow the musician/composer to play the virtual instrument. Spatialised sound outputs may be sampled from the simulated acoustic field using several channels simultaneously. Implementation details and control strategies for this instrument are discussed; simulation results and sound examples are presented.
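As a scaled-down illustration of the FDTD approach, the sketch below steps a single linear Kirchhoff plate, u_tt = -kappa^2 * Laplacian^2(u), with an explicit scheme and reads the output at one grid point. The paper's environment additionally models nonlinearities, multiple plates and the coupled 3D acoustic field; all parameter values here are illustrative.

```python
import numpy as np

def laplacian(u, h):
    """Five-point discrete Laplacian with fixed (zero) boundaries."""
    L = np.zeros_like(u)
    L[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:]
                     + u[1:-1, :-2] - 4 * u[1:-1, 1:-1]) / h**2
    return L

def plate_fdtd(n_steps, nx=60, ny=60, h=0.01, kappa=1.0, fs=44100):
    k = 1.0 / fs                                  # time step
    mu = kappa * k / h**2
    assert mu <= 0.25, "explicit scheme unstable: reduce kappa or raise h"
    u_prev = np.zeros((nx, ny))
    u = np.zeros((nx, ny))
    u[nx // 3, ny // 3] = 1e-3                    # crude initial 'strike'
    out = np.empty(n_steps)
    for n in range(n_steps):
        biharm = laplacian(laplacian(u, h), h)    # discrete Laplacian^2
        u_next = 2 * u - u_prev - (kappa * k)**2 * biharm
        u_prev, u = u, u_next
        out[n] = u[2 * nx // 3, 2 * ny // 3]      # pick-up point
    return out
```

Multiple read-out points on the simulated field would give the spatialised multi-channel outputs the abstract mentions.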
Pure Data External for Reactive HMM-Based Speech and Singing Synthesis
In this paper, we present recent progress in the MAGE project. MAGE is a library for reactive HMM-based speech and singing synthesis. Here, it is integrated as a Pure Data external, called mage~, which provides reactive voice quality, prosody and identity manipulation combined with contextual control. mage~ brings together the high-quality, natural and expressive speech of HMM-based speech synthesis with high flexibility and reactive control over the speech production level. Such an object provides a basis for further research in gesturally controlled speech synthesis: an object that can "listen" and reactively adjust itself to its environment. Building on mage~, we create different interfaces and controllers to explore the real-time, expressive and interactive nature of speech.