Enhanced Beat Tracking with Context-Aware Neural Networks
We present two new beat tracking algorithms based on autocorrelation analysis, which showed state-of-the-art performance in the MIREX 2010 beat tracking contest. Unlike the traditional approach of processing a list of onsets, we propose to use a bidirectional Long Short-Term Memory recurrent neural network to perform a frame-by-frame beat classification of the signal. The spectral features of the audio signal and their relative differences are used as inputs to the network, which transforms the signal directly into a beat activation function. An autocorrelation function is then used to determine the predominant tempo, to eliminate erroneously detected beats and to complement missing ones. The first algorithm is tuned for music with constant tempo, whereas the second is also capable of following changes in tempo and time signature.
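As a rough illustration of the tempo-estimation stage only (not the authors' implementation), the Python sketch below picks the predominant tempo of a beat activation function by autocorrelation. The function name estimate_tempo, the frame rate fps and the BPM search range are assumptions made for this example.

import numpy as np

def estimate_tempo(activation, fps=100.0, bpm_range=(40.0, 220.0)):
    """Return the predominant tempo (BPM) of a beat activation function."""
    a = activation - activation.mean()
    # Full autocorrelation, keep non-negative lags only.
    acf = np.correlate(a, a, mode="full")[len(a) - 1:]
    # Convert the admissible BPM range into lag indices (frames per beat).
    min_lag = int(fps * 60.0 / bpm_range[1])
    max_lag = int(fps * 60.0 / bpm_range[0])
    lag = min_lag + np.argmax(acf[min_lag:max_lag])
    return 60.0 * fps / lag

# Toy usage: a synthetic activation with a peak every 0.5 s (120 BPM).
act = np.zeros(1000)
act[::50] = 1.0
print(estimate_tempo(act))  # ~120 BPM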
Combining classifications based on local and global features: application to singer identification
In this paper we investigate the problem of singer identification on a cappella recordings of isolated notes. Most studies on singer identification describe the content of singing voice signals with features related to timbre (such as MFCC or LPC). These features aim to describe the behavior of frequencies at a given instant of time (local features). In this paper, we propose to describe the sung tone through the temporal variations of the fundamental frequency (and its harmonics) of the note. The periodic and continuous variations of the frequency trajectories are analyzed over the whole note, and the resulting features reflect expressive and intonative elements of singing such as vibrato, tremolo and portamento. The experiments, conducted on two distinct data sets (lyric and pop-rock singers), show that the new set of features captures part of the singer identity. However, these features are less accurate than timbre-based features. We propose to increase the recognition rate of singer identification by combining the information conveyed by local and global descriptions of notes. The proposed method, which shows good results, can be adapted to classification problems involving a large number of classes, or to combining classifications with different levels of performance.
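For intuition, here is a minimal sketch of one possible global descriptor of this kind: measuring vibrato rate and extent from the F0 trajectory of a sustained note. The feature definitions, the 4-8 Hz search band and the frame rate fps are illustrative assumptions, not the paper's exact feature set.

import numpy as np

def vibrato_features(f0, fps=100.0):
    """f0: fundamental-frequency trajectory in Hz, sampled at fps frames per second."""
    cents = 1200.0 * np.log2(f0 / np.mean(f0))   # deviation from mean pitch, in cents
    cents -= cents.mean()
    spectrum = np.abs(np.fft.rfft(cents * np.hanning(len(cents))))
    freqs = np.fft.rfftfreq(len(cents), d=1.0 / fps)
    # Look for the strongest modulation in a typical vibrato range (4-8 Hz).
    band = (freqs >= 4.0) & (freqs <= 8.0)
    rate = freqs[band][np.argmax(spectrum[band])]
    extent = cents.std() * np.sqrt(2.0)          # rough peak deviation for a sinusoidal vibrato
    return rate, extent

# Toy usage: a 2 s note at 440 Hz with 6 Hz vibrato of +/-50 cents.
t = np.arange(0, 2, 0.01)
f0 = 440.0 * 2 ** (50.0 / 1200.0 * np.sin(2 * np.pi * 6.0 * t))
print(vibrato_features(f0))  # approximately (6.0, 50)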
Production Effect: Audio Features for Recording Techniques Description and Decade Prediction
In this paper we address the problem of describing music production techniques from the audio signal. Over the past decades, sound engineering techniques have changed drastically. New recording technologies, the extensive use of compressors and limiters, and new stereo techniques have deeply modified the sound of records. We propose three features to describe these evolutions in music production. They are based on the dynamic range of the signal, the energy difference between channels and the phase spread between channels. We measure the relevance of these features on a task of automatic classification of Pop/Rock songs into decades. In the context of Music Information Retrieval, this kind of description could be very useful to better describe the content of a song or to assess the similarity between songs.
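A minimal sketch of how descriptors of this kind could be computed from a stereo signal is given below. The concrete definitions used here (a percentile-based dynamic range, an overall left/right energy ratio, and an inter-channel correlation used as a phase-spread proxy) are assumptions for illustration and may differ from the features proposed in the paper.

import numpy as np

def production_features(stereo, frame=4096):
    """stereo: (n_samples, 2) float array."""
    left, right = stereo[:, 0], stereo[:, 1]
    n_frames = len(left) // frame
    rms = np.array([
        np.sqrt(np.mean(stereo[i * frame:(i + 1) * frame] ** 2))
        for i in range(n_frames)
    ])
    rms_db = 20 * np.log10(rms + 1e-12)
    # Spread of the frame-level loudness distribution as a dynamic-range proxy.
    dynamic_range = np.percentile(rms_db, 95) - np.percentile(rms_db, 10)
    # Overall energy imbalance between channels, in dB.
    energy_diff = 10 * np.log10((np.sum(left ** 2) + 1e-12) /
                                (np.sum(right ** 2) + 1e-12))
    # Correlation close to 1 means a narrow (mono-like) image; close to 0 or
    # negative means a wide or out-of-phase image.
    phase_spread = 1.0 - np.corrcoef(left, right)[0, 1]
    return dynamic_range, energy_diff, phase_spread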
GMM supervector for Content Based Music Similarity
Timbral modeling is fundamental in content-based music similarity systems. It is usually achieved by modeling the short-term features with a Gaussian Model (GM) or Gaussian Mixture Models (GMM). In this article we propose to achieve this goal using the GMM-supervector approach. This method allows complex statistical models to be represented by a Euclidean vector. Experiments performed on the music similarity task showed that this model outperforms state-of-the-art approaches. Moreover, it reduces the similarity search time by a factor of ≈ 100 compared to state-of-the-art GM modeling. Furthermore, we propose a new supervector normalization which makes the GMM-supervector approach perform better on the music similarity task. The proposed normalization can be applied to other Euclidean models.
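The sketch below illustrates the general GMM-supervector idea under simplifying assumptions: diagonal-covariance mixtures fitted independently per track (in practice the mixtures are usually adapted from a shared background model so that components correspond), with a weight/standard-deviation scaling borrowed from speaker verification rather than the normalization proposed in the paper.

import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(features, n_components=16):
    """features: (n_frames, n_dims) array of short-term descriptors (e.g. MFCCs)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0).fit(features)
    # Scale each component mean by its weight and standard deviation, then
    # stack all means into one Euclidean vector.
    scaled = np.sqrt(gmm.weights_)[:, None] * gmm.means_ / np.sqrt(gmm.covariances_)
    return scaled.ravel()

# Similarity between two tracks then reduces to a Euclidean distance.
def similarity(feats_a, feats_b):
    return -np.linalg.norm(gmm_supervector(feats_a) - gmm_supervector(feats_b))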
Implementing Real-Time Partitioned Convolution Algorithms on Conventional Operating Systems
We describe techniques for implementing real-time partitioned convolution algorithms on conventional operating systems using two different scheduling paradigms: time-distributed (cooperative) and multi-threaded (preemptive). We discuss the optimizations applied to both implementations and present measurements of their performance for a range of impulse response lengths on a recent high-end desktop machine. We find that while the time-distributed implementation is better suited for use as a plugin within a host audio application, the preemptive version is easier to implement and significantly outperforms the time-distributed version despite the overhead of frequent context switches.
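As a reference point for the underlying algorithm (not the paper's optimized implementations, and with none of the real-time scheduling machinery), here is a sketch of uniformly partitioned convolution using a frequency-domain delay line; the block size is an arbitrary choice.

import numpy as np

def partitioned_convolve(x, h, block=256):
    fft_size = 2 * block
    # Split the impulse response into zero-padded, pre-transformed partitions.
    n_parts = int(np.ceil(len(h) / block))
    H = [np.fft.rfft(h[i * block:(i + 1) * block], fft_size) for i in range(n_parts)]
    # Frequency-domain delay line holding the most recent input block spectra.
    fdl = [np.zeros(fft_size // 2 + 1, dtype=complex) for _ in range(n_parts)]
    out = np.zeros(len(x) + len(h) - 1)
    prev = np.zeros(block)                       # previous input block (overlap-save)
    for start in range(0, len(out), block):
        cur = np.zeros(block)
        chunk = x[start:start + block]
        cur[:len(chunk)] = chunk
        # Newest spectrum in front; multiply-accumulate against all partitions.
        fdl.insert(0, np.fft.rfft(np.concatenate([prev, cur])))
        fdl.pop()
        acc = sum(S * Hk for S, Hk in zip(fdl, H))
        y = np.fft.irfft(acc, fft_size)[block:]  # discard the time-aliased first half
        end = min(start + block, len(out))
        out[start:end] += y[:end - start]
        prev = cur
    return out

# Sanity check against direct convolution:
x = np.random.randn(10000); h = np.random.randn(3000)
print(np.allclose(partitioned_convolve(x, h), np.convolve(x, h)))  # True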
Analysis and Trans-synthesis of Acoustic Bowed-String Instrument Recordings: a Case Study using Bach Cello Suites
In this paper, the analysis and trans-synthesis of acoustic bowed-string instrument recordings with a new non-negative matrix factorization (NMF) procedure are presented. This work shows that more than one template may be required to represent a note, owing to the time-varying behavior of timbre, especially for notes played by bowed-string instruments. The proposed method improves on the original NMF without requiring tone models or the number of templates to be known in advance. The resulting NMF information is then converted into the synthesis parameters of a sinusoidal synthesizer. Bach cello suites recorded by Fournier and Starker are used in the experiments. Analysis and trans-synthesis examples of the recordings are also provided. Index Terms: trans-synthesis, non-negative matrix factorization, bowed-string instrument
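For readers unfamiliar with NMF, a bare-bones factorization of a magnitude spectrogram with Lee-Seung multiplicative updates (Euclidean cost) is sketched below; the adaptive selection of the number of templates per note and the sinusoidal resynthesis stage of the proposed method are not shown.

import numpy as np

def nmf(V, n_templates=8, n_iter=200, eps=1e-9):
    """Factorize a non-negative spectrogram V (freq x time) as W @ H."""
    rng = np.random.default_rng(0)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_templates)) + eps   # spectral templates
    H = rng.random((n_templates, n_time)) + eps   # time-varying activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H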
Identification of Time-frequency Maps for Sound Timbre Discrimination
Gabor multipliers are signal operators that are diagonal in a time-frequency representation of signals and can be viewed as time-frequency transfer functions. If we estimate a Gabor mask between a note played by two instruments, we obtain a time-frequency representation of the difference of timbre between these two notes. By averaging the energy contained in the Gabor mask, we obtain a measure of this difference. In this context, our goal is to automatically localize the time-frequency regions responsible for such a timbre dissimilarity. This problem is addressed as a feature selection problem over the time-frequency coefficients of a labelled data set of sounds.
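A toy approximation of the idea (not the estimation procedure used in the paper): take a regularized ratio of the spectrogram magnitudes of two aligned notes as a crude mask and average its log-energy as a dissimilarity score. The STFT size nperseg and the regularization lam are arbitrary choices.

import numpy as np
from scipy.signal import stft

def gabor_mask_dissimilarity(x, y, sr=44100, nperseg=1024, lam=1e-3):
    _, _, X = stft(x, fs=sr, nperseg=nperseg)
    _, _, Y = stft(y, fs=sr, nperseg=nperseg)
    n = min(X.shape[1], Y.shape[1])
    # Regularized magnitude ratio in each time-frequency cell.
    mask = (np.abs(Y[:, :n]) + lam) / (np.abs(X[:, :n]) + lam)
    # A mask close to 1 everywhere means similar timbre; deviations carry the
    # dissimilarity, localized in specific time-frequency regions.
    deviation = np.log(mask) ** 2
    return mask, deviation.mean()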
Sparse Atomic Modeling of Audio: a Review
Research into sparse atomic models has recently intensified in the image and audio processing communities. While other reviews exist, we believe this paper provides a good starting point for the uninitiated reader as it concisely summarizes the state of the art and presents most of the major topics in an accessible manner. We discuss several approaches to the sparse approximation problem including various greedy algorithms, iteratively re-weighted least squares, iterative shrinkage, and Bayesian methods. We provide pseudo-code for several of the algorithms, and have released software which includes fast dictionaries and reference implementations for many of the algorithms. We discuss the relevance of the different approaches for audio applications, and include numerical comparisons. We also illustrate several audio applications of sparse atomic modeling.
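As an example of the greedy family covered by the review, here is a compact matching pursuit sketch for a generic matrix dictionary of unit-norm atoms (illustrative only; the released software mentioned above should be preferred in practice).

import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=50, tol=1e-6):
    """dictionary: (signal_len, n_dict) array whose columns are unit-norm atoms."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        correlations = dictionary.T @ residual
        k = np.argmax(np.abs(correlations))          # best-matching atom
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[:, k]
        if np.linalg.norm(residual) < tol:
            break
    return coeffs, residual

# Usage: approximate a 3-atom signal over a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((256, 1024))
D /= np.linalg.norm(D, axis=0)
x = D[:, [3, 17, 42]] @ np.array([1.0, -0.5, 2.0])
c, r = matching_pursuit(x, D, n_atoms=20)
print(np.linalg.norm(r) / np.linalg.norm(x))  # small residual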
State of the Art in Sound Texture Synthesis
The synthesis of sound textures, such as rain, wind, or crowds, is an important application for cinema, multimedia creation, games and installations. However, despite the clearly defined requirements of naturalness and flexibility, no automatic method has yet found widespread use. After clarifying the definition, terminology, and usages of sound texture synthesis, we give an overview of the many existing methods and approaches, and of the few available software implementations, and classify them by the synthesis model they are based on, such as subtractive or additive synthesis, granular synthesis, corpus-based concatenative synthesis, wavelets, or physical modeling. Additionally, an overview is given of the analysis methods used for sound texture synthesis, such as segmentation, statistical modeling, timbral analysis, and modeling of transitions.
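To make the granular family concrete, here is a minimal granular-synthesis sketch that draws random grains from a recorded texture and overlap-adds them under a Hann window; the grain size, hop and the absence of any statistical control are simplifications for illustration.

import numpy as np

def granular_texture(source, duration_samples, grain=4096, hop=2048, seed=0):
    """Resynthesize a texture of given length from random grains of `source`."""
    rng = np.random.default_rng(seed)
    window = np.hanning(grain)
    out = np.zeros(duration_samples + grain)
    for start in range(0, duration_samples, hop):
        pos = rng.integers(0, len(source) - grain)   # random grain position
        out[start:start + grain] += window * source[pos:pos + grain]
    return out[:duration_samples]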
Automatic Alignment of Audio Occurrences: Application to the Verification and Synchronization of Audio Fingerprinting Annotation
We propose an original method for the automatic alignment of temporally distorted occurrences of audio items. The method is based on a so-called item-restricted fingerprinting process and a segment detection scheme. The high-precision estimation of the temporal distortions makes it possible to compensate for these alterations and obtain a precise synchronization between the original item and the altered occurrence. Among the applications of this process, we focus on the verification and alignment of audio fingerprinting annotations. Perceptual evaluation confirms the efficiency of the method in detecting wrong annotations and the high precision of the synchronization on the occurrences.
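A very reduced sketch of the alignment idea is given below: it estimates only a constant time offset between an original item and an occurrence by cross-correlating coarse onset-strength envelopes, whereas the proposed item-restricted fingerprinting additionally handles time-varying distortions. The function names and STFT parameters are assumptions.

import numpy as np
from scipy.signal import stft

def onset_envelope(x, sr=22050, nperseg=1024):
    _, _, X = stft(x, fs=sr, nperseg=nperseg)
    mag = np.abs(X)
    flux = np.maximum(np.diff(mag, axis=1), 0.0).sum(axis=0)   # spectral flux
    return flux - flux.mean()

def estimate_offset(original, occurrence, sr=22050, nperseg=1024):
    a = onset_envelope(original, sr, nperseg)
    b = onset_envelope(occurrence, sr, nperseg)
    xc = np.correlate(b, a, mode="full")
    lag_frames = np.argmax(xc) - (len(a) - 1)
    # scipy's stft uses a hop of nperseg // 2 by default.
    return lag_frames * (nperseg // 2) / sr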