Chorus Detection with Combined Use of MFCC and Chroma Features and Image Processing Filters
A computationally efficient method for detecting a chorus section in popular and rock music is presented. The method uses a distance matrix representation obtained by summing two separate distance matrices, calculated from mel-frequency cepstral coefficient (MFCC) features and pitch chroma features respectively. The benefit of computing two separate distance matrices is that different enhancement operations can be applied to each; an enhancement operation was found beneficial only for the chroma distance matrix. This is followed by detection of off-diagonal segments of small distance in the distance matrix. From the detected segments, an initial chorus section is selected using a scoring mechanism based on several heuristics, and is then subjected to further processing. This further processing applies image processing filters in a neighborhood of the distance matrix surrounding the initial chorus section. The final position and length of the chorus are selected based on the filtering results. On a database of 206 popular and rock music pieces, an average F-measure of 86% is obtained. Processing a song of average duration (three to four minutes) takes about ten seconds on a Windows XP computer with a 2.8 GHz Intel Xeon processor.
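The combined distance matrix can be sketched in Python with NumPy. This is a minimal illustration under assumptions the abstract does not state. Frame-to-frame distances are taken to be cosine distances. The chroma enhancement is taken to be a simple moving average along the diagonal direction. The helper names and the `length` parameter are hypothetical, and the paper's actual enhancement operation may differ.

```python
import numpy as np


def distance_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance between frames (rows of `features`)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return 1.0 - unit @ unit.T


def diagonal_smooth(d: np.ndarray, length: int = 4) -> np.ndarray:
    """Hypothetical enhancement: average each entry with the following entries
    along its diagonal. This emphasizes off-diagonal stripes (repeated
    segments) and suppresses isolated low-distance points."""
    n = d.shape[0]
    total = np.zeros_like(d)
    count = np.zeros_like(d)
    for k in range(min(length, n)):
        total[: n - k, : n - k] += d[k:, k:]
        count[: n - k, : n - k] += 1
    return total / count


def combined_distance(mfcc: np.ndarray, chroma: np.ndarray, length: int = 4) -> np.ndarray:
    """MFCC distance matrix plus the enhanced chroma distance matrix.
    Only the chroma matrix is enhanced, as in the abstract."""
    return distance_matrix(mfcc) + diagonal_smooth(distance_matrix(chroma), length)
```

Both inputs are frame-by-feature arrays with the same number of frames, for example 13 MFCCs and 12 chroma bins per frame. The off-diagonal segment detection and the chorus scoring heuristics are not reproduced here.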
A Matlab Toolbox for Musical Feature Extraction from Audio
We present MIRtoolbox, an integrated set of functions written in Matlab and dedicated to the extraction of musical features from audio files. The design is based on a modular framework: the different algorithms are decomposed into stages, formalized using a minimal set of elementary mechanisms, and integrate different variants proposed by alternative approaches (including new strategies we have developed) that users can select and parametrize. This paper offers an overview of the features that can be extracted with MIRtoolbox, related to timbre, tonality, rhythm and form, among others. Four particular analyses are provided as examples. The toolbox also includes functions for statistical analysis, segmentation and clustering. Particular attention has been paid to the design of a syntax that offers both simplicity of use and transparent adaptiveness to a multiplicity of possible input types. Each feature extraction method can accept as argument either an audio file or any preliminary result from intermediate stages of the chain of operations. The same syntax can also be used to analyse single audio files, batches of files, series of audio segments, multichannel signals, etc. For that purpose, the data and methods of the toolbox are organised in an object-oriented architecture.
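MIRtoolbox itself is written in Matlab. The Python sketch below only imitates the design idea described above: a feature function accepts raw audio, an intermediate result from an earlier stage, or a batch of either, and it computes only the missing stages. The class and function names (`Audio`, `Spectrum`, `spectrum`, `centroid`) are hypothetical and are not MIRtoolbox's API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Audio:
    samples: np.ndarray  # mono signal
    sr: float            # sampling rate in Hz


@dataclass
class Spectrum:
    magnitudes: np.ndarray
    freqs: np.ndarray


def spectrum(x):
    """Stage 1: accept raw audio, or pass through an already computed spectrum."""
    if isinstance(x, Spectrum):
        return x
    if isinstance(x, Audio):
        mags = np.abs(np.fft.rfft(x.samples))
        freqs = np.fft.rfftfreq(len(x.samples), d=1.0 / x.sr)
        return Spectrum(mags, freqs)
    raise TypeError(f"unsupported input: {type(x).__name__}")


def centroid(x):
    """Stage 2: spectral centroid from audio, a spectrum, or a list (batch) of either."""
    if isinstance(x, list):
        return [centroid(item) for item in x]
    s = spectrum(x)  # reuses an intermediate result when one is given
    total = s.magnitudes.sum()
    return float((s.freqs * s.magnitudes).sum() / total) if total > 0 else 0.0
```

Dispatching on the input type keeps the call syntax identical for every kind of input. This is the property the abstract attributes to the toolbox's object-oriented architecture.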
Real-Time Visualisation of Loudness Along Different Time Scales
We propose a set of design criteria for visualising loudness features of an audio signal measured along different time scales. A novel real-time loudness meter based on these criteria is presented. The meter simultaneously shows short-term loudness, long-term loudness and peak level. The short-term loudness is displayed using a circular bar graph. The long-term loudness is displayed by means of a circular envelope graph organized along an absolute time scale, so that it resembles a radar display. Typically, the loudness measured during the past hour is visible. The algorithms underlying the meter's loudness and peak level measurements take into account recent ITU-R recommendations and research into loudness modelling.
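Measuring one signal along two time scales can be illustrated with trailing windows of different lengths. This is a simplified sketch, not the meter's algorithm. It computes unweighted mean-square levels in dB, omitting the K-weighting and gating of ITU-R BS.1770. The 3 s short-term window follows the EBU R128 convention and is an assumption here. The function names are hypothetical.

```python
import numpy as np


def level_track(x, sr, window_s, hop_s=0.1):
    """Mean-square level in dB over a trailing window, evaluated every `hop_s` seconds.
    Simplification: no K-weighting or gating as in ITU-R BS.1770."""
    win = int(round(window_s * sr))
    hop = int(round(hop_s * sr))
    levels = []
    for end in range(win, len(x) + 1, hop):
        ms = np.mean(x[end - win:end] ** 2)
        levels.append(10.0 * np.log10(max(ms, 1e-12)))
    return np.array(levels)


def peak_db(x):
    """Sample peak level in dB relative to full scale (1.0)."""
    return 20.0 * np.log10(max(np.max(np.abs(x)), 1e-12))
```

Calling `level_track(x, sr, 3.0)` gives a short-term track, and a much longer window gives a long-term track. A full-scale sine reads about -3.01 dB on both, with a peak of 0 dB.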
The Origins of DAFx and its Future within the Sound and Music Computing Field
DAFx is an established conference that has become a reference gathering for researchers working on audio signal processing. In this presentation I will go back ten years to the beginning of this conference and to the ideas that motivated it. Then I will jump to the present, to the current context of our research field, which differs from that of ten years ago, and offer some personal reflections on the current situation and the challenges we are facing.
Modal Parameter Tracking for Shape-Changing Geometric Objects
For interactive sound synthesis, we would like to change the shape of a finite element model of an instrument and rapidly hear how the sound changes. With modal synthesis methods, a new modal decomposition would have to be computed after each change in the geometry, making the analysis too slow for interactive use. However, by using modes computed for one geometry to estimate the frequencies of nearby geometries, we can hear much more quickly how changing the instrument shape changes the sound. In this paper, we describe how to estimate the resonant frequencies of an instrument by combining information about the modes of two similar instruments. We also describe the trade-off between computational speed and the accuracy of the computed resonances.
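The abstract does not say how the mode information of two instruments is combined. One standard way to make such an estimate is a Rayleigh-Ritz projection. The stiffness and mass matrices of the new geometry are projected onto the span of the modes already computed for two nearby geometries, and the small reduced eigenproblem is solved instead of the full one. This is a swapped-in technique, not necessarily the paper's method. The fixed-fixed mass-spring chain is a hypothetical stand-in for a finite element model.

```python
import numpy as np


def chain_stiffness(k):
    """Stiffness matrix of a fixed-fixed mass-spring chain with n+1 springs `k`."""
    n = len(k) - 1
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] = k[i] + k[i + 1]
        if i + 1 < n:
            K[i, i + 1] = K[i + 1, i] = -k[i + 1]
    return K


def generalized_eigh(K, M):
    """Solve K v = lam M v (K symmetric, M SPD) by Cholesky reduction; ascending lam."""
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    lam, W = np.linalg.eigh(Linv @ K @ Linv.T)
    return lam, Linv.T @ W


def ritz_frequencies(K, M, basis, count):
    """Estimate the lowest `count` angular frequencies of (K, M) from a mode basis.
    Ritz values are upper bounds on the true eigenvalues."""
    Q, _ = np.linalg.qr(basis)  # orthonormalize the combined mode basis
    lam, _ = generalized_eigh(Q.T @ K @ Q, Q.T @ M @ Q)
    return np.sqrt(lam[:count])


# Usage: reuse 5 modes from each of two "geometries" to estimate an intermediate one.
n = 20
M = np.eye(n)
pos = np.arange(n + 1) / n
k_a, k_b = np.ones(n + 1), 1.0 + pos
_, modes_a = generalized_eigh(chain_stiffness(k_a), M)
_, modes_b = generalized_eigh(chain_stiffness(k_b), M)
basis = np.hstack([modes_a[:, :5], modes_b[:, :5]])
K_mid = chain_stiffness(0.5 * (k_a + k_b))
est = ritz_frequencies(K_mid, M, basis, 3)
exact = np.sqrt(generalized_eigh(K_mid, M)[0][:3])
```

Only the 10×10 reduced problem has to be solved for each new geometry. A larger basis trades speed for accuracy, which mirrors the trade-off mentioned in the abstract.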
Musical Signal Analysis Using Fractional-Delay Inverse Comb Filters
A novel filter configuration for the analysis of harmonic musical signals is proposed. The method is based on inverse comb filtering, which allows the extraction of selected harmonic components or of the background noise component between the harmonic spectral components. The highly accurate delay required in the inverse comb filter is implemented with a high-order allpass filter. The paper shows that the filter is easy to design and efficient to implement, and that it enables accurate low-level feature analysis of musical tones. Several case studies demonstrate the effectiveness of the proposed approach: isolating a single partial from a synthetic signal, analyzing the even-to-odd ratio of harmonics in a clarinet tone, and extracting the residual from a bowed string tone.
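The inverse comb y[n] = x[n] - x[n - L] with L = fs/f0 places zeros at every multiple of f0. It therefore cancels the harmonics and leaves the noise between them. L is usually not an integer, which is why a fractional delay is needed. The sketch below is an assumption-laden simplification: it uses a first-order Thiran allpass for the fractional part, whereas the paper uses a high-order allpass. A first-order allpass is accurate mainly at low frequencies. The function names are hypothetical.

```python
import numpy as np


def fractional_delay(x, delay):
    """Delay `x` by a non-integer number of samples: an integer delay line plus a
    first-order Thiran allpass whose fractional part d lies in [0.5, 1.5)."""
    if delay < 0.5:
        raise ValueError("delay must be at least 0.5 samples")
    m = int(np.floor(delay - 0.5))
    d = delay - m
    a = (1.0 - d) / (1.0 + d)  # first-order Thiran coefficient
    shifted = np.concatenate([np.zeros(m), x[: len(x) - m]])
    y = np.zeros_like(shifted)
    x_prev = y_prev = 0.0
    for n, xn in enumerate(shifted):
        # H(z) = (a + z^-1) / (1 + a z^-1)
        y[n] = a * xn + x_prev - a * y_prev
        x_prev, y_prev = xn, y[n]
    return y


def inverse_comb(x, sr, f0):
    """y[n] = x[n] - x[n - sr/f0]: zeros at multiples of f0 cancel the harmonics,
    leaving the noise component between them."""
    return x - fractional_delay(np.asarray(x, dtype=float), sr / f0)
```

At sr = 8000 Hz and f0 = 300 Hz the period is 26.67 samples, so the integer part is 26 and the allpass supplies the remaining 0.67 sample.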
Real-Time and Efficient Algorithms for Frequency Warping Based on Local Approximations of Warping Operators
Frequency warping is a transformation that acts on sound signals by remapping the frequency axis, so that the spectral content of the original sound is displaced to other frequencies. At the same time, the phase relationships among the signal components are altered nonlinearly with respect to frequency. While this effect is interesting and has several applications, including synthesis by physical models, its use has so far been limited by the lack of an accurate and flexible real-time algorithm. In this paper we present methods for frequency warping that are based on local approximations of the warping operators and allow real-time implementation. Filter bank structures are derived for efficient realization of the approximate technique. An analysis of the error is also presented, which shows that both numerical and perceptual errors are within acceptable limits. Furthermore, the approximate implementation allows a larger variety of warping maps than the classical (non-causal) first-order allpass cascade implementation.
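The classical first-order allpass cascade mentioned above implements one specific family of warping maps. It substitutes each unit delay with z^-1 → (z^-1 - λ)/(1 - λz^-1). The resulting frequency map is sketched below. Under this sign convention, λ > 0 moves low frequencies upward. The paper's local approximations and filter banks are not reproduced. The function names are hypothetical.

```python
import numpy as np


def allpass_warp(omega, lam):
    """Frequency map of the first-order allpass substitution
    z^-1 -> (z^-1 - lam) / (1 - lam z^-1), for |lam| < 1 and omega in [0, pi]."""
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(lam * np.sin(omega), 1.0 - lam * np.cos(omega))


def warp_hz(freq_hz, sr, lam):
    """Apply the same map to frequencies given in Hz."""
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float) / sr
    return allpass_warp(omega, lam) * sr / (2.0 * np.pi)
```

The map fixes 0 and π and is monotonic. Its inverse is the same map with -λ. Because the map is nonlinear, harmonic partials generally become inharmonic after warping.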
Short-Time Wavelet Analysis of Analytic Residuals for Real-Time Spectral Modelling
This paper describes an approach that uses compactly supported spline wavelets to model the residual signal in a real-time (frame-by-frame) spectral modelling system. The outputs of the model are time-varying parameters (gain, centre frequency and bandwidth) for filters that can be used in a subtractive resynthesis system.
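To show how per-frame (gain, centre frequency, bandwidth) parameters can drive subtractive resynthesis, here is a sketch that filters white noise through a two-pole resonator. The wavelet analysis stage is not shown. Three choices are assumptions and not from the paper: the resonator filter type, the pole radius r = exp(-πB/fs), and the omission of gain normalization. The function names are hypothetical.

```python
import numpy as np


def resonator_coeffs(centre_hz, bandwidth_hz, sr):
    """Two-pole resonator: pole radius from bandwidth, pole angle from centre frequency."""
    r = np.exp(-np.pi * bandwidth_hz / sr)
    theta = 2.0 * np.pi * centre_hz / sr
    return 2.0 * r * np.cos(theta), r * r  # feedback coefficients c1, c2


def subtractive_resynthesis(frames, sr, frame_len, seed=0):
    """Filter white noise frame by frame; `frames` holds (gain, centre_hz, bandwidth_hz).
    Filter state carries across frame boundaries, so parameter changes do not reset it."""
    rng = np.random.default_rng(seed)
    out = np.empty(len(frames) * frame_len)
    y1 = y2 = 0.0
    i = 0
    for gain, centre_hz, bandwidth_hz in frames:
        c1, c2 = resonator_coeffs(centre_hz, bandwidth_hz, sr)
        for e in rng.standard_normal(frame_len):
            y = gain * e + c1 * y1 - c2 * y2  # y[n] = g e[n] + c1 y[n-1] - c2 y[n-2]
            y2, y1 = y1, y
            out[i] = y
            i += 1
    return out
```

A real system would typically use several such filters in parallel, one per modelled band, and would interpolate parameters within each frame.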
A Complex Envelope Sinusoidal Model for Audio Coding
A modification to the hybrid sinusoidal model is proposed for high-quality audio coding. In our proposal, the amplitude envelope of each harmonic partial is modeled by a narrowband complex signal. Such a representation captures most of the signal energy associated with sinusoidal components, including the energy related to frequency estimation and quantization errors. It also takes into account the natural width of each spectral line. The advantages of this extension are a more straightforward and robust representation of the deterministic component and a clean stochastic residual without ghost sinusoids. The reconstructed signal is virtually free from harmonic artifacts and sounds more natural. We propose to encode the complex envelopes by means of MCLT coefficients, interleaved across partials, within an MPEG-like coding scheme. Experimental results show that high compression efficiency is achieved.
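The idea of a narrowband complex envelope that absorbs frequency estimation error can be illustrated by demodulation. The partial is shifted to baseband at its estimated frequency and lowpass filtered. A small frequency error then shows up as a slow rotation of the envelope instead of a residual ghost sinusoid. The moving-average lowpass and the function names are assumptions for illustration. The paper's MCLT-based coding is not shown.

```python
import numpy as np


def complex_envelope(x, sr, f_est, smooth_len):
    """Shift the partial at f_est to 0 Hz, then lowpass with a moving average.
    For x = A cos(2 pi f n / sr + phi), the result is about (A/2) exp(j(2 pi (f - f_est) n / sr + phi))."""
    n = np.arange(len(x))
    baseband = x * np.exp(-2j * np.pi * f_est * n / sr)
    kernel = np.ones(smooth_len) / smooth_len
    return np.convolve(baseband, kernel, mode="same")


def resynthesize_partial(env, sr, f_est):
    """Remodulate the complex envelope onto the estimated carrier frequency."""
    n = np.arange(len(env))
    return 2.0 * np.real(env * np.exp(2j * np.pi * f_est * n / sr))
```

Consider a partial at 440.5 Hz analysed with an estimate of 440 Hz. Its envelope magnitude stays close to A/2 while its phase rotates at 0.5 Hz, and remodulation reconstructs the partial away from the signal edges.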
Warped Linear Prediction for Improved Perceptual Quality in the SCELP Low Delay Audio Codec
The SCELP (Spherical Code Excited Linear Prediction) audio codec, recently proposed for low delay audio coding [5], is based on linear prediction (LP). It applies closed-loop vector quantization employing a spherical code based on the Apple Peeling code construction rule. Frequency-warped signal processing is known to be beneficial, especially in wideband audio coding based on warped linear prediction (WLP). In this contribution, WLP is incorporated into the SCELP low delay audio codec. The overall audio quality of the resulting W-SCELP codec benefits from improved perceptual masking of the quantization noise. Compared with existing standardized audio codecs that have an algorithmic delay below 10 ms, the W-SCELP codec at 48 kbit/s outperforms the ITU-T G.722 codec at 56 kbit/s in terms of achievable audio quality.
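Warped linear prediction can be sketched by replacing the unit delays in the autocorrelation computation with first-order allpass sections, D(z) = (z^-1 - λ)/(1 - λz^-1). The resulting warped autocorrelation is then solved with the usual Levinson-Durbin recursion. This follows the commonly used warped-autocorrelation formulation of WLP. It does not describe the internals of the SCELP or W-SCELP codecs, and the warping factor λ is an assumed free parameter. With λ = 0 the method reduces to ordinary LP.

```python
import numpy as np


def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam z^-1); lam = 0 is a unit delay."""
    y = np.zeros(len(x))
    x_prev = y_prev = 0.0
    for n, xn in enumerate(x):
        y[n] = -lam * xn + x_prev + lam * y_prev
        x_prev, y_prev = xn, y[n]
    return y


def warped_autocorrelation(x, lam, order):
    """r[k] = sum_n x[n] * (D^k x)[n]: autocorrelation with delays replaced by allpasses."""
    x = np.asarray(x, dtype=float)
    r = np.zeros(order + 1)
    r[0] = np.dot(x, x)
    xk = x
    for k in range(1, order + 1):
        xk = allpass(xk, lam)
        r[k] = np.dot(x, xk)
    return r


def levinson(r, order):
    """Levinson-Durbin recursion: returns A(z) coefficients [1, a1, ..., ap] and the
    final prediction error energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i + 1):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k
    return a, err
```

In a warped codec, the resulting A(z) is evaluated with the same allpass substitution. This spends spectral resolution on low frequencies, roughly following auditory frequency resolution, which is the source of the perceptual benefit described above.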