On the evaluation of perceptual similarity measures for music
Several applications in the field of content-based interaction with music repositories rely on measures which estimate the perceived similarity of music. These applications include automatic genre recognition, playlist generation, and recommender systems. In this paper we study methods to evaluate the performance of such measures. We compare five measures which use only information extracted from the audio signal, and discuss how these measures can be evaluated qualitatively and quantitatively without resorting to large-scale listening tests.
An Open Source Tool for Semi-Automatic Rhythmic Annotation
We present a plugin implementation for the multi-platform WaveSurfer sound editor. The added functionalities are the semi-automatic extraction of beats at diverse levels of the metrical hierarchy, as well as upload and download connections to a music metadata database. The plugin is built upon existing open source (GPL-licensed) audio processing tools, namely WaveSurfer, BeatRoot and CLAM, with the intent of expanding the scope of these tools. It is therefore also provided as GPL code, with the explicit goal that researchers in the audio processing community can freely use and improve it. We provide technical details of the implementation as well as practical use cases. We also motivate the use of rhythmic metadata in Music Information Retrieval scenarios.
Live Tracking of Musical Performances using On-Line Time Warping
Dynamic time warping finds the optimal alignment of two time series, but it is not suitable for on-line applications because it requires complete knowledge of both series before the alignment can be computed. Further, the quadratic time and space requirements are limiting factors even for off-line systems. We present a novel on-line time warping algorithm which has linear time and space costs, and performs incremental alignment of two series as one is received in real time. This algorithm is applied to the alignment of audio signals in order to follow musical performances of arbitrary length. Each frame of audio is represented by a positive spectral difference vector, emphasising note onsets. The system was tested on various test sets, including recordings of 22 pianists playing music by Chopin, where the average alignment error was 59 ms (median 20 ms). We demonstrate one application of the system: the analysis and visualisation of musical expression in real time.
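The positive spectral difference representation described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function name, frame layout and toy signal are all assumptions. The idea is simply to half-wave rectify the frame-to-frame change in the magnitude spectrum, so that increases in energy (note onsets) are kept and decreases are discarded.

```python
import numpy as np

def positive_spectral_difference(frames, n_fft=256):
    """Half-wave rectified magnitude difference between consecutive
    windowed STFT frames; energy increases (onsets) are emphasised,
    energy decreases are discarded."""
    window = np.hanning(n_fft)
    mags = np.abs(np.fft.rfft(frames * window, axis=1))
    diff = np.diff(mags, axis=0)
    return np.maximum(diff, 0.0)  # keep only positive changes

# Toy signal: silence followed by a sine burst (a crude "onset")
n_fft, n_frames = 256, 8
frames = np.zeros((n_frames, n_fft))
t = np.arange(n_fft)
frames[4:] = np.sin(2 * np.pi * 8 * t / n_fft)  # energy appears at frame 4

d = positive_spectral_difference(frames)
onset_frame = int(np.argmax(d.sum(axis=1)))  # diff index 3 = change from frame 3 to 4
```

In the alignment system, one such vector per frame would feed the on-line time warping cost computation.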
Onset Detection Revisited
Various methods have been proposed for detecting the onset times of musical notes in audio signals. We examine recent work on onset detection using spectral features such as the magnitude, phase and complex domain representations, and propose improvements to these methods: a weighted phase deviation function and a half-wave rectified complex difference. These new algorithms are compared with several state-of-the-art algorithms from the literature, all tested using a standard data set of short excerpts from a range of instruments (1,060 onsets), plus a much larger data set of piano music (106,054 onsets). Some of the results contradict previously published results and suggest that a similarly high level of performance can be obtained with a magnitude-based (spectral flux), a phase-based (weighted phase deviation) or a complex-domain (complex difference) onset detection function.
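To make the two families of detection function concrete, the following is a minimal sketch of spectral flux and a magnitude-weighted phase deviation, computed from a complex STFT. The exact windowing, normalisation and the paper's precise definitions are not reproduced; function and variable names are assumptions for illustration.

```python
import numpy as np

def detection_functions(stft):
    """From a complex STFT (one frame per row), compute:
    - spectral flux: sum of half-wave rectified magnitude changes;
    - weighted phase deviation: second phase difference ("phase
      acceleration"), weighted by magnitude and averaged over bins.
    Peaks in either function indicate likely note onsets."""
    mag = np.abs(stft)
    phase = np.unwrap(np.angle(stft), axis=0)

    flux = np.maximum(np.diff(mag, axis=0), 0.0).sum(axis=1)

    phase_dev = np.diff(phase, n=2, axis=0)                 # phase acceleration
    wpd = (mag[2:] * np.abs(phase_dev)).sum(axis=1) / mag.shape[1]
    return flux, wpd

# Toy STFT: a steady spectrum whose amplitude doubles at frame 5
n_frames, n_bins = 10, 64
stft = np.ones((n_frames, n_bins), dtype=complex)
stft[5:] *= 2.0
flux, wpd = detection_functions(stft)
peak = int(np.argmax(flux))  # diff index 4 = change from frame 4 to 5
```

On this toy input the phase is constant, so the weighted phase deviation stays at zero while spectral flux peaks at the amplitude jump; real signals excite both functions.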
The Wablet: Scanned Synthesis on a Multi-Touch Interface
This paper presents research into scanned synthesis on a multi-touch screen device. This synthesis technique involves scanning a wavetable that is dynamically evolving in the manner of a mass-spring network. It is argued that scanned synthesis can provide a good solution to some of the issues in digital musical instrument design, and is particularly well suited to multi-touch screens. In this implementation, vibrating mass-spring networks with a variety of configurations can be created. These can be manipulated by touching, dragging and altering the orientation of the tablet. Arbitrary scanning paths can be drawn onto the structure. Several extensions to the original scanned synthesis technique are proposed, the most important of which for multi-touch implementations is the freedom of the masses to move in two dimensions. An analysis of the scanned output in the case of a 1D ideal string model is given, and scanned synthesis is discussed as a generalisation of a number of other synthesis methods.
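The core of scanned synthesis, as described above, can be sketched in a few lines: a slowly evolving mass-spring network is updated at a haptic rate, and its displacement profile is read out (scanned) as a wavetable at audio rate. This is a minimal 1-D sketch under assumed parameters (spring constant, damping, fixed ends), not the Wablet's 2-D implementation.

```python
import numpy as np

def step(pos, vel, k=0.5, damping=0.999, dt=1.0):
    """One explicit update of a 1-D mass-spring chain with fixed ends.
    Neighbouring masses pull on each other; the displacement profile
    slowly evolves and serves as the wavetable to be scanned."""
    force = np.zeros_like(pos)
    force[1:-1] = k * (pos[:-2] - 2 * pos[1:-1] + pos[2:])
    vel = (vel + dt * force) * damping
    pos = pos + dt * vel
    pos[0] = pos[-1] = 0.0          # fixed boundary masses
    return pos, vel

n = 64
pos = np.sin(np.pi * np.arange(n) / (n - 1))   # "pluck" the chain into a half sine
vel = np.zeros(n)

output = []
for _ in range(4):                  # four scan passes
    pos, vel = step(pos, vel)
    output.extend(pos)              # scan the whole table along its path
output = np.asarray(output)
```

Drawing an arbitrary scanning path, as in the paper, amounts to reading the masses in a user-defined order rather than left to right.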
Characterisation of Acoustic Scenes Using a Temporally-constrained Shift-invariant Model
In this paper, we propose a method for modeling and classifying acoustic scenes using temporally-constrained shift-invariant probabilistic latent component analysis (SIPLCA). SIPLCA can be used to extract time-frequency patches from spectrograms in an unsupervised manner. Component-wise hidden Markov models are incorporated into the SIPLCA formulation to enforce temporal constraints on the activation of each acoustic component. The time-frequency patches are converted to cepstral coefficients in order to provide a compact representation of acoustic events within a scene. Experiments are conducted on a corpus of train station recordings, classified into 6 scene classes. Results show that the proposed model is able to capture salient events within a scene and outperforms the non-negative matrix factorization algorithm on the same task. In addition, it is demonstrated that the use of temporal constraints can lead to improved performance.
A Comparison of Extended Source-Filter Models for Musical Signal Reconstruction
Recently, we have witnessed an increasing use of the source-filter model in music analysis, achieved by integrating the source-filter model into a non-negative matrix factorisation (NMF) framework or into statistical models. The combination of the source-filter model and the NMF framework reduces the number of free parameters needed and makes the model easier to extend. This paper compares four extended source-filter models: the source-filter-decay (SFD) model, the NMF with time-frequency activations (NMF-ARMA) model, the multi-excitation (ME) model and the source-filter model based on β-divergence (SFbeta model). The first two models represent time-varying spectra by adding a loss filter and a time-varying filter, respectively. The latter two are extended by using multiple excitations and by including a scale factor, respectively. The models are tested using sounds of 15 instruments from the RWC Music Database. Performance is evaluated based on the relative reconstruction error. The results show that the NMF-ARMA model outperforms the other models, but uses the largest set of parameters.
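As a point of reference for the factorisation frameworks and the evaluation metric discussed above, the following is a plain multiplicative-update NMF (Euclidean cost) together with the relative reconstruction error ||V − WH|| / ||V||. It is a baseline sketch only; none of the four extended source-filter models is reproduced here, and all names and parameters are illustrative.

```python
import numpy as np

def nmf(V, rank=2, n_iter=200, seed=0):
    """Multiplicative-update NMF (Euclidean cost): V ≈ W @ H with
    non-negative factors. Returns W, H and the relative
    reconstruction error used as the evaluation metric."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)   # update spectral templates
    rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    return W, H, rel_err

# Toy "spectrogram" built from two spectral templates and activations
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 20))
W, H, rel_err = nmf(V, rank=2)
```

The extended models in the paper constrain W (or its analogue) via source and filter terms, which is what reduces the free parameter count relative to this unconstrained baseline.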
Digitally Moving An Electric Guitar Pickup
This paper describes a technique to transform the sound of an arbitrarily selected magnetic pickup into another pickup selection on the same electric guitar. This is a first step towards replicating an arbitrary electric guitar timbre in an audio recording using the signal from another guitar as input. We record 1458 individual notes from the pickups of a single guitar, varying the string, fret, plucking position, and dynamics of the tones in order to create a controlled dataset for training and testing our approach. Given an input signal and a target signal, a least squares estimator is used to obtain the coefficients of a finite impulse response (FIR) filter to match the desired magnetic pickup position. We use spectral difference to measure the error of the emulation, and test the effects of the independent variables (fret, dynamics, plucking position and repetition) on the accuracy. A small reduction in accuracy was observed for different repetitions; moderate errors arose when the playing style (plucking position and dynamics) was varied; and there were large differences between output and target when the training and test data comprised different notes (fret positions). We explain the results in terms of the acoustics of the vibrating strings.
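The least-squares FIR estimation step can be sketched as follows: stack delayed copies of the input signal into a design matrix and solve for the filter coefficients that best reproduce the target pickup signal. This is a generic illustration under assumed names and a toy filter, not the paper's exact training procedure.

```python
import numpy as np

def fit_fir(x, y, order=16):
    """Least-squares estimate of FIR coefficients h such that
    (x * h) approximates y. Column k of X holds x delayed by k
    samples, so X @ h is the convolution truncated to len(x)."""
    n = len(x)
    X = np.column_stack([np.concatenate([np.zeros(k), x[:n - k]])
                         for k in range(order)])
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h, X @ h

# Toy example: the "target pickup" is the input through a known FIR filter
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
true_h = np.array([0.6, 0.3, -0.2, 0.1])
y = np.convolve(x, true_h)[:len(x)]
h, y_hat = fit_fir(x, y, order=8)   # recovers true_h in its first 4 taps
```

In practice the input and target are recordings from two pickups of the same performance, and the estimated filter approximates the transfer between the two pickup positions.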
Estimating Pickup and Plucking Positions of Guitar Tones and Chords with Audio Effects
In this paper, we introduce an approach to estimate the pickup position and plucking point on an electric guitar for both single notes and chords recorded through an effects chain. We evaluate the accuracy of the method on direct input signals along with 7 different combinations of guitar amplifier, effects, loudspeaker cabinet and microphone. The autocorrelation of the spectral peaks of the electric guitar signal is calculated and the two minima that correspond to the locations of the pickup and plucking event are detected. In order to model the frequency response of the effects chain, we flatten the spectrum using polynomial regression. The errors decrease after applying the spectral flattening method. The median absolute error for each preset ranges from 2.10 mm to 7.26 mm for pickup position and 2.91 mm to 21.72 mm for plucking position estimates. For strummed chords, faster strums are more difficult to estimate but still yield accurate results, with median absolute errors for pickup position estimates of less than 10 mm.
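The physical effect being exploited above is comb filtering: both the plucking point and the pickup position impose a sin(nπp) scaling on harmonic n, notching out harmonics at multiples of 1/p. The toy sketch below illustrates that comb model for the plucking point alone and reads the position off the first spectral notch; the paper's actual estimator (autocorrelation of spectral peaks plus polynomial spectral flattening) is not reproduced, and all names and the decay term are assumptions.

```python
import numpy as np

def harmonic_amplitudes(p, n_harm=6):
    """Idealised plucked-string spectrum: harmonic n is scaled by
    |sin(n*pi*p)|, where p is the relative plucking position, so
    harmonics at multiples of 1/p vanish (spectral notches). A 1/n
    decay stands in for the true envelope. The pickup position
    applies the same comb effect independently."""
    n = np.arange(1, n_harm + 1)
    return np.abs(np.sin(n * np.pi * p)) / n

p_true = 0.25                       # pluck at a quarter of the string length
amps = harmonic_amplitudes(p_true)
notch = int(np.argmin(amps)) + 1    # harmonic index of the deepest notch
p_est = 1.0 / notch                 # first notch lies at harmonic 1/p
```

With both combs present, two families of notches appear, which is why the method above must locate two minima, one for the pickup and one for the plucking event.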