Effective Singing Voice Detection in Popular Music Using ARMA Filtering
Locating singing voice segments is essential for convenient indexing, browsing, and retrieval of large music archives and catalogues. Furthermore, it is beneficial for automatic music transcription and annotation. The approach described in this paper uses Mel-Frequency Cepstral Coefficients in conjunction with Gaussian Mixture Models to discriminate between two classes of data (instrumental music and singing voice with music background). Due to imperfect classification behavior, the categorization without additional post-processing tends to alternate within a very short time span, whereas singing voice tends to be continuous for several frames. Thus, various tests have been performed to identify a suitable decision function and corresponding smoothing methods. Results are reported by comparing the performance of straightforward likelihood-based classification with that of post-processing by an autoregressive moving average (ARMA) filtering method.
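The smoothing idea can be illustrated with a minimal sketch. It assumes each frame already has a decision score, for example the log-likelihood ratio between the singing-voice GMM and the instrumental GMM, and passes that score through a generic ARMA recursion before thresholding. The filter coefficients, the threshold, and the toy scores below are hypothetical and are not taken from the paper.

```python
def arma_smooth(scores, b=(0.2, 0.2), a=(0.6,)):
    """Generic ARMA recursion: y[n] = sum_k b[k]*x[n-k] + sum_j a[j]*y[n-1-j].

    The coefficients are illustrative assumptions (unit DC gain), not the
    values used in the paper.
    """
    out = []
    for n in range(len(scores)):
        acc = 0.0
        for k, bk in enumerate(b):          # moving-average part
            if n - k >= 0:
                acc += bk * scores[n - k]
        for j, aj in enumerate(a):          # autoregressive part
            if n - 1 - j >= 0:
                acc += aj * out[n - 1 - j]
        out.append(acc)
    return out


def classify(scores, threshold=0.0):
    """True = singing voice, False = instrumental (hypothetical threshold)."""
    return [s > threshold for s in scores]


def count_switches(labels):
    """Number of class changes between consecutive frames."""
    return sum(1 for p, q in zip(labels, labels[1:]) if p != q)


# Toy frame scores: ten "voice" frames and ten "instrumental" frames,
# each with one misclassified frame (indices 4 and 15).
frame_scores = [1.0] * 10 + [-1.0] * 10
frame_scores[4] = -0.5
frame_scores[15] = 0.5

raw_labels = classify(frame_scores)
smooth_labels = classify(arma_smooth(frame_scores))

print(count_switches(raw_labels), count_switches(smooth_labels))
```

On this toy input the raw decision flips at every misclassified frame, while the smoothed decision changes class only once, at the cost of a short delay at the true boundary.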
Exploring Phase Information in Sound Source Separation Applications
Separation of instrument sounds from polyphonic music recordings is a desirable signal processing function with a wide variety of applications in music production, video games, and information retrieval. In general, sound source separation algorithms attempt to exploit those characteristics of audio signals that differentiate one source from another. Many algorithms have relied on spectral magnitude as the basis for separation. Here we propose exploring the phase information of musical instrument signals as an alternative dimension for discriminating sound signals originating from different sources. Three cases are presented: (1) phase contours of musical instrument notes as potential separation features, (2) resolving overlapping harmonics using phase coupling properties of musical instruments, and (3) harmonic-percussive decomposition using calculated radian ranges for each frequency bin.
Re-Thinking Sound Separation: Prior Information and Additivity Constraint in Separation Algorithms
In this paper, we study the effect of prior information on the quality of informed source separation algorithms. We present results with our system for solo and accompaniment separation and contrast our findings with two other state-of-the-art approaches. Results suggest that current separation techniques limit performance when compared to the extraction process of prior information. Furthermore, we present an alternative view of the separation process in which the additivity constraint of the algorithm is removed in an attempt to maximize the obtained quality. Plausible future directions in sound separation research are discussed.
Parametric Audio Coding of Bass Guitar Recordings Using a Tuned Physical Modeling Algorithm
In this paper, we propose a parametric audio coding framework that combines the analysis and re-synthesis of electric bass guitar recordings. In particular, an existing synthesis algorithm that incorporates 11 playing techniques is extended by two calibration algorithms. The temporal and spectral decay parameters as well as the inharmonicity coefficient are set according to the fretboard position on the instrument. Listening tests show that there is still a gap in perceptual quality between real-world instrument recordings and the re-synthesized versions. Because of this gap, the perceived improvement from the model calibration is small. Furthermore, the listening tests reveal that plucking styles are more important for realistic synthesis results than expression styles.
Real-Time Transcription and Separation of Drum Recordings Based on NMF Decomposition
This paper proposes a real-time capable method for transcribing and separating occurrences of single drum instruments in polyphonic drum recordings. Both the detection and the decomposition are based on Non-Negative Matrix Factorization and can be implemented with very small systemic delay. We propose a simple modification to the update rules that makes it possible to capture time-dynamic spectral characteristics of the involved drum sounds. The method can be applied in music production and music education software. Performance results with respect to drum transcription are presented and discussed. The evaluation dataset, consisting of annotated drum recordings, is published for use in further studies in the field. Index Terms - drum transcription, source separation, non-negative matrix factorization, spectral processing, audio plug-in, music production, music education
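As background for the decomposition step, the following is a minimal sketch of plain NMF with the standard multiplicative updates for the Euclidean cost. It does not implement the modified update rules proposed in the paper, which the abstract does not specify. The toy spectrogram, the two drum-like templates, and all parameter values are assumptions for illustration only.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix V (bins x frames) as W @ H.

    Uses the standard multiplicative updates for the Euclidean cost;
    this is the textbook baseline, not the paper's modified rule.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + eps    # spectral templates
    H = rng.random((rank, n_frames)) + eps  # time-varying activations
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H)))
    return W, H, errors


# Toy magnitude spectrogram built from two hypothetical drum templates:
# a low-frequency "kick" and a high-frequency "hi-hat".
W_true = np.array([[1.0, 0.0],
                   [0.8, 0.0],
                   [0.1, 0.2],
                   [0.0, 0.9],
                   [0.0, 1.0]])
H_true = np.array([[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
                   [0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0]])
V = W_true @ H_true

W, H, errors = nmf(V, rank=2)
print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

In a transcription setting, onsets would typically be read from peaks in the rows of `H`, and each drum could be separated by reconstructing `W[:, k:k+1] @ H[k:k+1, :]` for its component `k`.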
Automatic Tablature Transcription of Electric Guitar Recordings by Estimation of Score- and Instrument-Related Parameters
In this paper, we present a novel algorithm for automatic analysis, transcription, and parameter extraction from isolated polyphonic guitar recordings. In addition to general score-related information such as note onset, duration, and pitch, instrument-specific information such as the plucked string and the applied plucking and expression styles is retrieved automatically. For this purpose, we adapted several state-of-the-art approaches for onset and offset detection, multipitch estimation, string estimation, feature extraction, and multi-class classification. Furthermore, we investigated a partial tracking algorithm that is robust with respect to inharmonicity, an extensive extraction of novel and known audio features, and the exploitation of instrument-based knowledge in the form of plausibility filtering to obtain more reliable predictions. Our system achieved very high accuracy values of 98 % for onset and offset detection as well as multipitch estimation. For the instrument-related parameters, the proposed algorithm also showed very good performance, with accuracy values of 82 % for the string number, 93 % for the plucking style, and 83 % for the expression style. Index Terms - playing techniques, plucking style, expression style, multiple fundamental frequency estimation, string classification, fretboard position, fingering, electric guitar, inharmonicity coefficient, tablature
Towards Transient Restoration in Score-informed Audio Decomposition
Our goal is to improve the perceptual quality of transient signal components extracted in the context of music source separation. Many state-of-the-art techniques are based on applying a suitable decomposition to the magnitude of the Short-Time Fourier Transform (STFT) of the mixture signal. The phase information required for the reconstruction of individual component signals is usually taken from the mixture, resulting in a complex-valued, modified STFT (MSTFT). There are different methods for reconstructing a time-domain signal whose STFT approximates the target MSTFT. Due to phase inconsistencies, these reconstructed signals are likely to contain artifacts such as pre-echos preceding transient components. In this paper, we propose a simple yet effective extension of the iterative signal reconstruction procedure by Griffin and Lim to remedy this problem. In a first experiment, under laboratory conditions, we show that our method considerably attenuates pre-echos while still showing convergence properties similar to those of the original approach. A second, more realistic experiment involving score-informed audio decomposition shows that the proposed method still yields improvements, although to a lesser extent, under non-idealized conditions.
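For orientation, here is a minimal sketch of the baseline Griffin and Lim iteration that the paper builds on. It alternates between enforcing the target magnitude and re-estimating phase from a least-squares inverse STFT. The transient-preserving extension proposed in the paper is not reproduced, since the abstract does not describe it. The window, hop size, random phase initialization, and test signal are assumptions chosen for illustration.

```python
import numpy as np


def stft(x, win, hop):
    """STFT with one windowed frame per row (no padding)."""
    n = len(win)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)


def istft(X, win, hop, length):
    """Least-squares inverse STFT (weighted overlap-add)."""
    n = len(win)
    y = np.zeros(length)
    norm = np.zeros(length)
    for m, frame in enumerate(np.fft.irfft(X, n=n, axis=1)):
        i = m * hop
        y[i:i + n] += frame * win
        norm[i:i + n] += win ** 2
    return y / np.maximum(norm, 1e-8)


def griffin_lim(mag, win, hop, length, n_iter=50, seed=0):
    """Baseline Griffin-Lim: estimate a signal from an STFT magnitude."""
    rng = np.random.default_rng(seed)
    X = mag * np.exp(2j * np.pi * rng.random(mag.shape))  # random phase
    errs = []
    for _ in range(n_iter):
        x = istft(X, win, hop, length)
        Y = stft(x, win, hop)
        errs.append(float(np.linalg.norm(np.abs(Y) - mag)))
        X = mag * np.exp(1j * np.angle(Y))  # keep magnitude, update phase
    return istft(X, win, hop, length), errs


# Toy target: silence followed by a decaying sinusoid with a sharp onset.
n_win, hop, length = 256, 64, 2048
win = np.hanning(n_win)
t = np.arange(length - 1024)
target = np.zeros(length)
target[1024:] = np.exp(-t / 200.0) * np.sin(2 * np.pi * 0.05 * t)

mag = np.abs(stft(target, win, hop))
estimate, errs = griffin_lim(mag, win, hop, length)
print(f"magnitude inconsistency: {errs[0]:.3f} -> {errs[-1]:.3f}")
```

Inspecting `estimate` just before sample 1024 is one way to look for the pre-echo energy that the abstract describes.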
NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a family of methods widely used for information retrieval across domains including text, images, and audio. Within music processing, NMF has been used for tasks such as transcription, source separation, and structure analysis. Prior work has shown that initialization and constrained update rules can drastically improve the chances of NMF converging to a musically meaningful solution. Along these lines, we present the NMF toolbox, containing MATLAB and Python implementations of conceptually distinct NMF variants. In particular, this paper gives an overview of two algorithms. The first variant, called nonnegative matrix factor deconvolution (NMFD), extends the original NMF algorithm to the convolutive case, enforcing the temporal order of spectral templates. The second variant, called diagonal NMF, supports the development of sparse diagonal structures in the activation matrix. Our toolbox contains several demo applications and code examples to illustrate its potential and functionality. By providing MATLAB and Python code on a documentation website under a GNU-GPL license, as well as including illustrative examples, our aim is to foster research and education in the field of music processing.