Physics-Based and Spike-Guided Tools for Sound Design
In this paper we present graphical tools and parameter search algorithms for timbre space exploration and the design of complex sounds generated by physical modeling synthesis. The tools are built around a sparse representation of sounds based on Gammatone functions and provide the designer with both graphical and auditory insight. The auditory representation of a number of reference sounds, located as landmarks in a 2D sound design space, gives the designer an effective aid for directing the search for new sounds. The sonic landmarks can either be synthetic sounds chosen by the user or be derived automatically using parameter search and clustering algorithms. The proposed probabilistic method uses the sparse representations to model the distance between sparsely represented sounds. A subsequent optimization model minimizes those distances to estimate the optimal parameters, which generate the landmark sounds on the given auditory landscape.
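To make the idea of a sparse, Gammatone-based representation concrete, the following minimal sketch builds a sound as a sum of shifted and scaled Gammatone kernels. The kernel order (4), the bandwidth (125 Hz), the 25 ms kernel duration and the `(onset_sample, fc, gain)` spike format are illustrative assumptions; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def gammatone(t, fc, order=4, bandwidth=125.0, amplitude=1.0, phase=0.0):
    """Gammatone function value at time t (seconds); zero for t < 0.

    g(t) = a * t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phase)
    The default order and bandwidth are assumptions for illustration.
    """
    if t < 0:
        return 0.0
    envelope = amplitude * t ** (order - 1) * math.exp(-2.0 * math.pi * bandwidth * t)
    return envelope * math.cos(2.0 * math.pi * fc * t + phase)

def gammatone_kernel(fc, sample_rate=16000, duration=0.025, **kwargs):
    """Sample one Gammatone kernel centred on frequency fc (Hz)."""
    n = int(round(duration * sample_rate))
    return [gammatone(i / sample_rate, fc, **kwargs) for i in range(n)]

def synthesize(spikes, length, sample_rate=16000):
    """Rebuild a signal from a sparse code.

    spikes is a list of (onset_sample, fc, gain) tuples (hypothetical format);
    each spike adds one scaled kernel starting at its onset.
    """
    out = [0.0] * length
    for onset, fc, gain in spikes:
        kernel = gammatone_kernel(fc, sample_rate)
        for i, k in enumerate(kernel):
            if 0 <= onset + i < length:
                out[onset + i] += gain * k
    return out

# Two spikes: a 440 Hz kernel at sample 0 and a 1 kHz kernel at sample 200.
signal = synthesize([(0, 440.0, 1.0), (200, 1000.0, 0.5)], length=800)
```

A sound is then described by a short list of spikes rather than by every sample, which is what makes distances between sounds cheap to model.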
Template-Based Estimation of Tempo: Using Unsupervised or Supervised Learning to Create Better Spectral Templates
In this paper, we study tempo estimation using spectral templates obtained by unsupervised or supervised learning from a database annotated with tempo. More precisely, we study the inclusion of these templates in our tempo estimation algorithm of [1]. As the periodicity observation, we use a 48-dimensional vector obtained by sampling the amplitude of the DFT at tempo-related frequencies; we call it a spectral template. A set of reference spectral templates is then learned in an unsupervised or supervised way from an annotated database. These reference spectral templates, combined with all possible tempo assumptions, constitute the hidden states, which we decode using a Viterbi algorithm. Experiments on the “ballroom dancer” test set show an improvement over the state of the art. In particular, we discuss the use of prior tempo probabilities. It should be noted, however, that these results are only indicative, since the training and test sets are the same in this preliminary experiment.
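The spectral template described above can be sketched as follows: for a given tempo hypothesis, the DFT magnitude of an onset-energy function is sampled at frequencies that are fixed ratios of the beat frequency. The choice of 48 ratios `k/4` for `k = 1..48`, the function names and the use of an onset-energy input are assumptions made for illustration; the abstract only states that the vector has 48 dimensions.

```python
import cmath
import math

def dft_magnitude_at(signal, freq_hz, frame_rate):
    """Normalized DFT magnitude of `signal` at an arbitrary frequency in Hz."""
    acc = 0j
    for n, x in enumerate(signal):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / frame_rate)
    return abs(acc) / len(signal)

def spectral_template(onset_energy, tempo_bpm, frame_rate, ratios=None):
    """Sample the DFT magnitude at tempo-related frequencies.

    ratios are multiples of the beat frequency; the default set of 48
    ratios is a hypothetical choice, not the one used in the paper.
    """
    if ratios is None:
        ratios = [k / 4.0 for k in range(1, 49)]
    beat_hz = tempo_bpm / 60.0
    return [dft_magnitude_at(onset_energy, r * beat_hz, frame_rate) for r in ratios]

# Toy onset-energy function pulsing at 2 Hz (120 BPM), 10 s at 100 frames/s.
frame_rate = 100.0
onset_energy = [1.0 + math.cos(2.0 * math.pi * 2.0 * n / frame_rate) for n in range(1000)]
template = spectral_template(onset_energy, tempo_bpm=120.0, frame_rate=frame_rate)
```

In this toy case the template peaks at the ratio 1.0 (index 3), because the only periodicity in the signal is at the beat frequency itself; real music spreads energy over several ratios, which is what the learned reference templates would capture.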
Automatic Detection of Multiple, Cascaded Audio Effects in Guitar Recordings
This paper presents a method to detect and distinguish single and multiple audio effects in monophonic electric guitar recordings. It is based on spectral analysis of audio segments located in the sustain part of guitar tones. Overall, 541 spectral, cepstral and harmonic features are extracted from short-time spectra of the audio segments. Support Vector Machines are used in combination with feature selection and transform techniques for automatic classification based on the extracted feature vectors. A novel database consisting of approximately 50,000 guitar tones was assembled for the purpose of evaluation. Classification accuracy reached 99.2% for the detection and distinction of arbitrary combinations of six frequently used audio effects.
A Segmental Spectro-Temporal Model of Musical Timbre
We propose a new statistical model of musical timbre that handles the different segments of the temporal envelope (attack, sustain and release) separately in order to account for their different spectral and temporal behaviors. The model is based on a reduced-dimensionality representation of the spectro-temporal envelope. Temporal coefficients corresponding to the attack and release segments are subjected to explicit trajectory modeling based on a non-stationary Gaussian Process. Coefficients corresponding to the sustain phase are modeled as a multivariate Gaussian. A compound similarity measure associated with the segmental model is proposed and successfully tested in instrument classification experiments. Apart from its use in a statistical framework, the modeling method allows intuitive and informative visualizations of the characteristics of musical timbre.
The Restoration of Single Channel Audio Recordings Based on Non-Negative Matrix Factorization and Perceptual Suppression Rule
In this paper, we focus on signal-to-noise ratio (SNR) improvement in single channel audio recordings. Many approaches have been reported in the literature. The most popular method, with many variants, is Short Time Spectral Attenuation (STSA). Although this method reduces the noise and improves the SNR, it tends to introduce signal distortion and a perceptually annoying residual noise usually called musical noise. We investigate the use of Non-negative Matrix Factorization (NMF) as an alternative to STSA for the digital curation of musical heritage. NMF is an emerging technique for the blind extraction of signals recorded in a variety of fields, and its application to the analysis of monaural recordings is relatively recent. We show that NMF is a suitable technique for extracting the clean audio signal from undesired non-stationary noise in a monaural recording of ethnic music. More specifically, we introduce a perceptual suppression rule to assess how competitive the perceptual domain is relative to the acoustic domain. Moreover, we carry out a listening test using the EBU MUSHRA test method in order to compare NMF with the state-of-the-art audio restoration framework. The encouraging results obtained with this methodology in the presented case study support its wider applicability in audio separation.
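As a rough illustration of the factorization step, the sketch below approximates a non-negative matrix V (for example a magnitude spectrogram) as a product W·H using the standard Lee and Seung multiplicative updates for the Euclidean cost. The abstract does not say which NMF variant or cost function the authors use, so this choice, the helper names and the toy matrix are assumptions; the perceptual suppression rule is not modeled here.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative v (rows x cols) into w (rows x rank) and h (rank x cols).

    Uses multiplicative updates for the Euclidean cost (an assumed variant).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h

def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction error V - W H."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx) for a, b in zip(ra, rb)) ** 0.5

# Toy "spectrogram": 4 frequency bins by 3 time frames.
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
w, h = nmf(v, rank=2)
```

In a restoration setting, the columns of W would act as spectral bases and the rows of H as their activations over time; separating signal from noise then amounts to deciding which components to keep when resynthesizing.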
Fusing Block-level Features for Music Similarity Estimation
In this paper we present a novel approach to computing music similarity based on block-level features. We first introduce three novel block-level features: the Variance Delta Spectral Pattern (VDSP), the Correlation Pattern (CP) and the Spectral Contrast Pattern (SCP). Then we describe how to combine the extracted features into a single similarity function. A comprehensive evaluation based on genre classification experiments shows that the combined block-level similarity measure (BLS) is comparable, in terms of quality, to the best current method from the literature. BLS has the important advantage of being based on a vector space representation, which directly facilitates a number of useful operations, such as PCA, k-means clustering and visualization. We also show that there is still potential for further improvement of music similarity measures by combining BLS with another state-of-the-art algorithm; the combined algorithm then outperforms all other algorithms in our evaluation. Additionally, we discuss the problem of album and artist effects in the context of similarity-based recommendation and show that one can detect the presence of such effects in a given dataset by analyzing the nearest neighbor classification results.
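One plausible way to combine several per-feature distances into a single similarity function is sketched below: each feature's distance matrix is z-score normalized so that the features are on a comparable scale, and the normalized matrices are summed with weights. The Manhattan distance, the z-score normalization, the weights and all function names are assumptions for illustration; the abstract does not describe the actual fusion rule.

```python
from statistics import mean, pstdev

def manhattan(a, b):
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(vectors):
    return [[manhattan(a, b) for b in vectors] for a in vectors]

def zscore_matrix(d):
    """Normalize a distance matrix by the mean and std of its off-diagonal entries."""
    n = len(d)
    off_diag = [d[i][j] for i in range(n) for j in range(n) if i != j]
    mu, sigma = mean(off_diag), pstdev(off_diag)
    sigma = sigma or 1.0  # guard against identical distances
    return [[(d[i][j] - mu) / sigma for j in range(n)] for i in range(n)]

def fuse(feature_sets, weights):
    """Weighted sum of normalized distance matrices, one per feature type."""
    n = len(feature_sets[0])
    fused = [[0.0] * n for _ in range(n)]
    for vectors, weight in zip(feature_sets, weights):
        z = zscore_matrix(distance_matrix(vectors))
        for i in range(n):
            for j in range(n):
                fused[i][j] += weight * z[i][j]
    return fused

def nearest_neighbor(fused, i):
    """Index of the track closest to track i, excluding i itself."""
    return min((j for j in range(len(fused)) if j != i), key=lambda j: fused[i][j])

# Three tracks described by two hypothetical feature types.
feature_a = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
feature_b = [[1.0], [1.2], [9.0]]
fused = fuse([feature_a, feature_b], weights=[1.0, 1.0])
```

The `nearest_neighbor` helper mirrors the kind of nearest neighbor analysis mentioned in the abstract: if a track's nearest neighbors are dominated by the same album or artist, that hints at an album or artist effect in the dataset.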
Physically Based Sound Synthesis and Control of Footsteps Sounds
We describe a system that synthesizes footstep sounds in real time. The sound engine is based on physical models and physically inspired models that reproduce the act of walking on several surfaces. Three solutions are proposed to control the real-time engine: the first two are based on floor microphones, while the third is based on shoes enhanced with sensors. The different solutions are discussed in the paper.