Polyphonic music analysis by signal processing and support vector machines
This paper introduces an original system for the analysis of harmony in polyphonic music, based on signal processing and machine learning. A new fast, multi-resolution analysis method extracts the time-frequency energy spectrum at the signal processing stage, while a support vector machine serves as the machine learning component. Aiming at the analysis of fairly general audio content, experiments are performed on a large set of recorded samples covering 19 musical instruments, played alone or in combination at different degrees of polyphony. Experimental results show that fundamental frequencies are detected with a remarkable success ratio and that the method performs well in the general case.
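The paper's fast multi-resolution method is not spelled out in the abstract, but the underlying idea can be illustrated: analysing the same signal with both short and long windows trades time resolution against frequency resolution. The sketch below is only a conceptual stand-in (naive DFT, non-overlapping frames, made-up window sizes), not the authors' algorithm.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive DFT magnitude spectrum of one frame (O(N^2), for illustration only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]

def multires_energy(signal, window_sizes=(32, 128)):
    """Toy multi-resolution time-frequency analysis: short windows localise
    onsets in time, long windows resolve close partials in frequency."""
    out = {}
    for n in window_sizes:
        frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
        out[n] = [dft_magnitudes(f) for f in frames]
    return out

# A toy two-note mixture (440 Hz + 660 Hz) sampled at 8 kHz.
sr = 8000
sig = [math.sin(2 * math.pi * 440 * t / sr) + 0.5 * math.sin(2 * math.pi * 660 * t / sr)
       for t in range(512)]
spectra = multires_energy(sig)
```

A fundamental-frequency detector would then look for harmonic peak patterns in these spectra before passing features to the SVM stage.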
Adaptive Network-Based Fuzzy Inference System for Automatic Speech/Music Discrimination
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents an effective approach based on an Adaptive Network-Based Fuzzy Inference System (ANFIS) for the classification stage required in a speech/music discrimination system. A new simple feature, called Warped LPC-based Spectral Centroid (WLPC-SC), is also proposed. A comparison between WLPC-SC and some of the classical features proposed in [11] is performed to assess the discriminatory power of the proposed feature. The vector describing the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance, and skewness). To evaluate the performance of the ANFIS system for speech/music discrimination, a comparison with other commonly used classifiers is reported. The classification results for different types of music and speech confirm the discriminating power of the proposed approach.
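The WLPC-SC feature itself is defined in the paper; as a rough illustration of the pipeline it feeds, the sketch below computes an ordinary (unwarped) spectral centroid per frame and reduces the trajectory to the three statistics the abstract names. The warping step and exact definitions are the paper's, not reproduced here.

```python
import math

def spectral_centroid(mags, sr):
    """Magnitude-weighted mean frequency of one spectrum. Note this is the
    plain spectral centroid; the paper warps the frequency axis toward a
    psychoacoustic scale before computing it."""
    n = len(mags)
    freqs = [k * sr / (2 * n) for k in range(n)]
    total = sum(mags) or 1e-12
    return sum(f * m for f, m in zip(freqs, mags)) / total

def centroid_stats(centroids):
    """Reduce a centroid trajectory to the three statistics used as the
    feature vector: mean, variance and skewness."""
    n = len(centroids)
    mean = sum(centroids) / n
    var = sum((c - mean) ** 2 for c in centroids) / n
    std = math.sqrt(var) or 1e-12
    skew = sum(((c - mean) / std) ** 3 for c in centroids) / n
    return mean, var, skew
```

Speech tends to produce a more variable centroid trajectory than music, which is what makes such low-dimensional statistics discriminative.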
Generalised Prior Subspace Analysis for Polyphonic Pitch Transcription
A reformulation of Prior Subspace Analysis (PSA) is presented, which restates the problem as that of fitting an undercomplete signal dictionary to a spectrogram. Further, a generalisation of PSA is derived which allows the transcription of polyphonic pitched instruments. This involves translating the frequency prior subspace of a single note to approximate other notes, removing the need for a separate basis function for each note an instrument plays. Examples are then given which demonstrate the utility of the generalised PSA algorithm for polyphonic pitch transcription.
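The key trick — approximating other notes by translating one note's frequency prior — relies on the fact that, on a log-frequency axis, a pitch change is approximately a shift of the spectrum. The toy sketch below (template values, shift amounts and the projection-based fit are all illustrative assumptions, not the paper's fitting procedure) shows the idea.

```python
def shift_template(template, bins):
    """On a log-frequency axis a pitch change is roughly a translation, so one
    note's prior can be slid along the axis to model other notes."""
    n = len(template)
    out = [0.0] * n
    for i, v in enumerate(template):
        j = i + bins
        if 0 <= j < n:
            out[j] = v
    return out

def fit_amplitude(template, frame):
    """Least-squares amplitude of one template against one spectrogram frame,
    clipped to be nonnegative (a crude stand-in for dictionary fitting)."""
    num = sum(t * x for t, x in zip(template, frame))
    den = sum(t * t for t in template) or 1e-12
    return max(0.0, num / den)

# Toy harmonic-like prior for one note, observed three log-frequency bins higher.
prior = [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
frame = shift_template(prior, 3)
amp = fit_amplitude(shift_template(prior, 3), frame)
```

Fitting every shifted copy of the prior against each frame yields per-note activations, which is the transcription output.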
Blind Source Separation Using Repetitive Structure
Blind source separation algorithms typically decorrelate time-aligned mixture signals, under the usual assumption that all sources are active at all times. We show that, when this assumption fails, the distinctive pattern of source activity and inactivity itself aids separation. Music, being carefully constructed, is the clearest example of sources exhibiting repetitive structure. We present a novel source separation algorithm based on spatial time-time distributions, which capture the repetitive structure in audio. Our method outperforms time-frequency source separation when the source spectra overlap heavily.
Segregation of Two Simultaneously Arriving Narrowband Noise Signals as a Function of Spatial and Frequency Separation
The present paper details a set of subjective measurements carried out to investigate the perceptual fusion and segregation of two simultaneously presented ERB-bandlimited noise samples as a function of their frequency separation and difference in direction of arrival. This research was motivated by the desire to gain insight into virtual source technology in multichannel listening and virtual acoustics applications. The segregation threshold was measured in three spatial configurations: 0°, 22.5°, and 45° of azimuth separation between the two noise signals. The tests were arranged so that the subjects adjusted the frequency gap between the two noise bands until, in their opinion, they were at the threshold of hearing two separate sounds. The results indicate that the frequency separation threshold increases above approximately 1.5 kHz. The effect of the angular separation between the ERB bands was less significant. It is therefore assumed that the results can be accounted for by a loss of accuracy in the neural analysis of the fine structure of the complex stimulus waveform. The results also diverge considerably between subjects, which is believed to indicate that sound fusion is an individual concept that partly relies on higher-level processing.
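For readers unfamiliar with the ERB scale used to set the noise bandwidths, the standard Glasberg–Moore approximation (a well-known formula, not taken from this paper) gives the equivalent rectangular bandwidth of the auditory filter at a given centre frequency:

```python
def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore approximation)
    of the auditory filter centred at f_hz, in Hz:
    ERB(f) = 24.7 * (4.37 * f/1000 + 1)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

# Bandwidths of ERB-wide noise bands at two example centre frequencies.
bw_500 = erb_bandwidth(500.0)    # roughly 79 Hz
bw_1500 = erb_bandwidth(1500.0)  # roughly 187 Hz
```

The widening of the ERB with frequency is one reason the frequency-gap threshold in such experiments cannot be compared across centre frequencies without a scale like this.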
Speech/music discrimination based on a new warped LPC-based feature and linear discriminant analysis
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a low-complexity but effective approach which exploits only one simple feature, called Warped LPC-based Spectral Centroid (WLPC-SC). A comparison between WLPC-SC and the classical features proposed in [9] is performed to assess the discriminatory power of the proposed feature. The vector describing the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance, and skewness), which are then transformed to a new feature space by applying LDA with the aim of increasing the classification accuracy. The classification task is performed by applying an SVM to the features in the transformed space. The classification results for different types of music and speech show the good discriminating power of the proposed approach.
A Framework for Sonification of Vicon Motion Capture Data
This paper describes experiments on sonifying data obtained using the VICON motion capture system. The main goal is to build the infrastructure needed to map motion parameters of the human body to sound. Three software frameworks were used for sonification: Marsyas, traditionally used for music information retrieval with audio analysis and synthesis; ChucK, an on-the-fly real-time synthesis language; and the Synthesis Toolkit (STK), a toolkit for sound synthesis that includes many physical models of instruments and sounds. An interesting possibility is the use of motion capture data to control parameters of digital audio effects. To experiment with the system, different types of motion data were collected, including traditional performance on musical instruments, acting out emotions, and data from individuals with impairments in sensorimotor coordination. Rhythmic motion (e.g. walking), although complex, can be highly periodic and maps quite naturally to sound. We hope that this work will eventually assist patients in identifying and correcting problems related to motor coordination through sound.
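The core of any such framework is the mapping layer between motion parameters and synthesis parameters. The toy function below is a hypothetical minimal example of that idea (the range limits and linear mapping are illustrative choices, not the paper's); the actual system routes such mappings through Marsyas, ChucK and STK rather than a single function.

```python
def sonify_parameter(values, f_min=200.0, f_max=2000.0):
    """Normalise one motion-capture channel (e.g. a joint's speed) to [0, 1]
    and map it linearly to an oscillator frequency in Hz."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1e-12
    return [f_min + (v - lo) / span * (f_max - f_min) for v in values]

# One captured channel, sampled at the mocap frame rate.
freqs = sonify_parameter([0.0, 0.5, 1.0, 0.25])
```

For periodic motion such as walking, a linear parameter-to-pitch mapping like this already produces an audibly periodic result, which is why rhythmic motion is described as mapping naturally to sound.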
An Efficient Algorithm for Real-Time Spectrogram Inversion
We present a computationally efficient real-time algorithm for constructing audio signals from spectrograms. Spectrograms consist of a time sequence of short-time Fourier transform magnitude (STFTM) spectra. During the audio signal construction process, phases are derived for the individual frequency components so that the spectrogram of the constructed signal is as close as possible to the target spectrogram given real-time constraints. The algorithm is a variation of the classic Griffin and Lim [1] technique, modified to be computable in real time. We discuss the application of the algorithm to time-scale modification of audio signals such as speech and music, and compare its performance with other methods. The new algorithm generates comparable or better results with significantly less computation. The phase consistency between adjacent frames produces excellent subjective sound quality with minimal frame transition artifacts.
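The paper's contribution is a real-time variant; the classic batch Griffin–Lim loop it modifies can be sketched as follows. This is a deliberately tiny implementation (naive DFT, Hann window, hop of half a window) to show the alternating magnitude/phase projections, not the authors' real-time algorithm.

```python
import cmath
import math

def _dft(x, inverse=False):
    """Naive DFT / inverse DFT (O(N^2)); a real system would use an FFT."""
    n = len(x)
    s = 1j if inverse else -1j
    out = [sum(x[t] * cmath.exp(s * 2 * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def stft(sig, n=16, hop=8):
    win = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    return [_dft([sig[i + t] * win[t] for t in range(n)])
            for i in range(0, len(sig) - n + 1, hop)]

def istft(frames, n=16, hop=8):
    """Windowed overlap-add synthesis with per-sample normalisation."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    length = hop * (len(frames) - 1) + n
    out, norm = [0.0] * length, [1e-12] * length
    for f, spec in enumerate(frames):
        frame = [v.real for v in _dft(spec, inverse=True)]
        for t in range(n):
            out[f * hop + t] += frame[t] * win[t]
            norm[f * hop + t] += win[t] * win[t]
    return [o / w for o, w in zip(out, norm)]

def griffin_lim(target_mag, n=16, hop=8, iters=20):
    """Batch Griffin-Lim: alternately re-derive phases from a resynthesised
    signal and restore the target STFT magnitudes."""
    frames = [[m + 0j for m in mags] for mags in target_mag]  # zero-phase start
    for _ in range(iters):
        sig = istft(frames, n, hop)
        frames = stft(sig, n, hop)
        frames = [[mag * cmath.exp(1j * cmath.phase(v)) if abs(v) > 1e-12 else mag + 0j
                   for mag, v in zip(mags, spec)]
                  for mags, spec in zip(target_mag, frames)]
    return istft(frames, n, hop)
```

Each iteration requires a full ISTFT/STFT pass over all frames, which is exactly the cost the real-time variant has to avoid by committing phases frame by frame.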
Hidden Markov Models for spectral similarity of songs
Hidden Markov Models (HMMs) are compared to Gaussian Mixture Models (GMMs) for describing the spectral similarity of songs. In contrast to previous work, we make a direct comparison based on the log-likelihood of songs given an HMM or a GMM. Although the direct comparison of log-likelihoods clearly favours HMMs, this advantage in modelling power does not yield any gain in genre classification accuracy.
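The quantity being compared is the per-frame log-likelihood of a song's features under each model. As a minimal sketch of the GMM side (one-dimensional features and hand-set parameters, purely for illustration; an HMM would add transition probabilities on top of such emission densities):

```python
import math

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of 1-D feature frames under a Gaussian
    mixture, computed with the log-sum-exp trick for numerical stability."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                 for w, m, v in zip(weights, means, variances)]
        mx = max(comps)
        total += mx + math.log(sum(math.exp(c - mx) for c in comps))
    return total / len(frames)
```

Comparing a song's average log-likelihood under an HMM versus a GMM trained on the same data is exactly the direct comparison the abstract describes; the classification result shows that winning this comparison need not improve genre accuracy.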
Acoustic Signal Processing for Next-Generation Human/Machine Interfaces
In this paper, we first define the scenario of a generic acoustic human/machine interface and then formulate the corresponding fundamental signal processing problems. For signal reproduction, the requirements for ideal solutions are stated and some examples of the state of the technology are briefly reviewed. For signal acquisition, the fundamental problems call for acoustic echo cancellation, desired source extraction, and source localization. After illustrating the extent to which acoustic echo cancellation is already a solved problem, we present recent results for the separation, dereverberation and localization of multiple source signals. As an underlying motivation for this synoptic treatment, we demonstrate that the considered subproblems (except localization) can be directly interpreted as signal separation or system identification problems with varying degrees of difficulty, which in turn determines the effectiveness of the known solutions.
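The system-identification view of acoustic echo cancellation can be made concrete with the standard NLMS adaptive filter (a textbook baseline, not this paper's specific algorithms; the filter order and step size below are arbitrary illustrative choices): the loudspeaker-to-microphone echo path is identified online, and the predicted echo is subtracted from the microphone signal.

```python
def nlms_echo_canceller(far_end, mic, order=8, mu=0.5, eps=1e-8):
    """Normalised LMS echo canceller: adapt an FIR estimate of the echo path
    and return the echo-cancelled signal plus the final filter estimate."""
    w = [0.0] * order
    buf = [0.0] * order  # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # predicted echo
        e = d - y                                    # echo-cancelled output
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w
```

When the far-end signal is persistently exciting, the filter taps converge to the true echo path, which is precisely the sense in which echo cancellation is a system identification problem; source separation replaces the known far-end reference with unknown source signals, which is what makes it harder.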