Download Hidden Markov Models for spectral similarity of songs
Hidden Markov Models (HMM) are compared to Gaussian Mixture Models (GMM) for describing spectral similarity of songs. Contrary to previous work we make a direct comparison based on the log-likelihood of songs given an HMM or GMM. Whereas the direct comparison of log-likelihoods clearly favors HMMs, this advantage in terms of modeling power does not allow for any gain in genre classification accuracy.
Download A Matlab Toolbox for Musical Feature Extraction from Audio
We present MIRtoolbox, an integrated set of functions written in Matlab, dedicated to the extraction of musical features from audio files. The design is based on a modular framework: the different algorithms are decomposed into stages, formalized using a minimal set of elementary mechanisms, and integrating different variants proposed by alternative approaches – including new strategies we have developed –, that users can select and parametrize. This paper offers an overview of the set of features, related, among others, to timbre, tonality, rhythm or form, that can be extracted with MIRtoolbox. Four particular analyses are provided as examples. The toolbox also includes functions for statistical analysis, segmentation and clustering. Particular attention has been paid to the design of a syntax that offers both simplicity of use and transparent adaptiveness to a multiplicity of possible input types. Each feature extraction method can accept as argument an audio file, or any preliminary result from intermediary stages of the chain of operations. Also the same syntax can be used for analyses of single audio files, batches of files, series of audio segments, multichannel signals, etc. For that purpose, the data and methods of the toolbox are organised in an object-oriented architecture.
Download Source-Filter based Clustering for Monaural Blind Source Separation
In monaural blind audio source separation scenarios, a signal mixture is usually separated into more signals than active sources. Therefore it is necessary to group the separated signals to the final source estimations. Traditionally grouping methods are supervised and thus need a learning step on appropriate training data. In contrast, we discuss unsupervised clustering of the separated channels by Mel frequency cepstrum coefficients (MFCC). We show that replacing the decorrelation step of the MFCC by the non-negative matrix factorization improves the separation quality significantly. The algorithms have been evaluated on a large test set consisting of melodies played with different instruments, vocals, speech, and noise.
Download Time-Dependent Parametric and Harmonic Templates in Non-Negative Matrix Factorization
This paper presents a new method to decompose musical spectrograms derived from Non-negative Matrix Factorization (NMF). This method uses time-varying harmonic templates (atoms) which are parametric: these atoms correspond to musical notes. Templates are synthesized from the values of the parameters which are learnt in an NMF framework. This parameterization permits to accurately model some musical effects (such as vibrato) which are inaccurately modeled by NMF.
Download Onset Time Estimation for the Analysis of Percussive Sounds using Exponentially Damped Sinusoids
Exponentially damped sinusoids (EDS) model-based analysis of sound signals often requires a precise estimation of initial amplitudes and phases of the components found in the sound, on top of a good estimation of their frequencies and damping. This can be of the utmost importance in many applications such as high-quality re-synthesis or identification of structural properties of sound generators (e.g. a physical coupling of vibrating devices). Therefore, in those specific applications, an accurate estimation of the onset time is required. In this paper we present a two-step onset time estimation procedure designed for that purpose. It consists of a “rough" estimation using an STFT-based method followed by a time-domain method to “refine" the previous results. Tests carried out on synthetic signals show that it is possible to estimate onset times with errors as small as 0.2ms. These tests also confirm that operating first in the frequency domain and then in the time domain allows to reach a better resolution vs. speed compromise than using only one frequency-based or one time-based onset detection method. Finally, experiments on real sounds (plucked strings and actual percussions) illustrate how well this method performs in more realistic situations.
Download Modifying Signals in Transform Domain: a Frame-Based Inverse Problem
Within this paper a method for morphing audio signals is presented. The theory is based on general frames and the modification of the signals is done via frame multiplier. Searching this frame multiplier with given input and output signal, an inverse problem occurs and a priori information is added with regularization terms. A closed-form solution is obtained by a diagonal approximation, i.e. using only the diagonal entries in the signal transformations. The proposed solutions for different regularization terms are applied to Gabor frames and to the constant-Q transform, based on non-stationary Gabor frames.
Download Unsupervised Taxonomy of Sound Effects
Sound effect libraries are commonly used by sound designers in a range of industries. Taxonomies exist for the classification of sounds into groups based on subjective similarity, sound source or common environmental context. However, these taxonomies are not standardised, and no taxonomy based purely on the sonic properties of audio exists. We present a method using feature selection, unsupervised learning and hierarchical clustering to develop an unsupervised taxonomy of sound effects based entirely on the sonic properties of the audio within a sound effect library. The unsupervised taxonomy is then related back to the perceived meaning of the relevant audio features.
Download The Mix Evaluation Dataset
Research on perception of music production practices is mainly concerned with the emulation of sound engineering tasks through lab-based experiments and custom software, sometimes with unskilled subjects. This can improve the level of control, but the validity, transferability, and relevance of the results may suffer from this artificial context. This paper presents a dataset consisting of mixes gathered in a real-life, ecologically valid setting, and perceptual evaluation thereof, which can be used to expand knowledge on the mixing process. With 180 mixes including parameter settings, close to 5000 preference ratings and free-form descriptions, and a diverse range of contributors from five different countries, the data offers many opportunities for music production analysis, some of which are explored here. In particular, more experienced subjects were found to be more negative and more specific in their assessments of mixes, and to increasingly agree with each other.
Download Analysis of Musical Dynamics in Vocal Performances Using Loudness Measures
In addition to tone, pitch and rhythm, dynamics is one of the expressive dimensions of the performance of a music piece that has received limited attention. While the usage of dynamics may vary from artist to artist, and also from performance to performance, a systematic methodology to automatically identify the dynamics of a performance in terms of musically meaningful terms like forte, piano may offer valuable feedback in the context of music education and in particular in singing. To this end, we have manually annotated the dynamic markings of commercial recordings of popular rock and pop songs from the Smule Vocal Balanced (SVB) dataset which will be used as reference data. Then as a first step for our research goal, we propose a method to derive and compare singing voice loudness curves in polyphonic mixtures. Towards measuring the similarity and variation of dynamics, we compare the dynamics curves of the SVB renditions with the one derived from the original songs. We perform the same comparison using professionally produced renditions from a karaoke website. We relate high values of Spearman correlation coefficient found in some select student renditions and the professional renditions with accurate dynamics.
Download Power-Balanced Dynamic Modeling of Vactrols: Application to a VTL5C3/2
Vactrols, which consist of a photoresistor and a light-emitting element that are optically coupled, are key components in optical dynamic compressors. Indeed, the photoresistor’s program-dependent dynamic characteristics make it advantageous for automatic gain control in audio applications. Vactrols are becoming more and more difficult to find, while the interest for optical compression in the audio community does not diminish. They are thus good candidates for virtual analog modeling. In this paper, a model of vactrols that is entirely physical, passive, with a program-dependent dynamic behavior, is proposed. The model is based on first principles that govern semi-conductors, as well as the port-Hamiltonian systems formalism, which allows the modeling of nonlinear, multiphysical behaviors. The proposed model is identified with a real vactrol, then connected to other components in order to simulate a simple optical compressor.