Polyphonic music analysis by signal processing and support vector machines
In this paper an original system for the analysis of harmony and polyphonic music is introduced. The system is based on signal processing and machine learning. A new fast, multi-resolution analysis method is devised to extract the time-frequency energy spectrum at the signal processing stage, while a support vector machine is used as the machine learning technology. Aiming at the analysis of rather general audio content, experiments are made on a large set of recorded samples, using 19 musical instruments, alone or in combination, at different degrees of polyphony. Experimental results show that fundamental frequencies are detected with a remarkable success ratio and that the method can provide excellent results in general cases.
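The paper's exact multi-resolution method is not reproduced here, but the general idea of extracting magnitude spectra at several time-frequency resolutions and stacking them into one feature vector can be sketched as follows. This is a minimal illustration; the window sizes, the common frequency grid, and the interpolation scheme are all assumptions, not the authors' design:

```python
import numpy as np

def multires_spectrum(x, sr, win_sizes=(512, 1024, 2048)):
    """Compute Hann-windowed magnitude spectra at several window sizes
    and resample each onto a common frequency grid, so that coarse-time
    and coarse-frequency views of the same signal can be stacked."""
    grid = np.linspace(0.0, sr / 2.0, 256)   # shared frequency axis
    bands = []
    for n in win_sizes:
        frame = x[:n] * np.hanning(n)
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(n, 1.0 / sr)
        bands.append(np.interp(grid, freqs, mag))
    return grid, np.stack(bands)             # shape: (len(win_sizes), 256)

sr = 44100
t = np.arange(4096) / sr
x = np.sin(2 * np.pi * 440.0 * t)            # 440 Hz test tone
grid, feats = multires_spectrum(x, sr)
print(feats.shape)                           # (3, 256)
```

A feature matrix of this kind could then be flattened and fed to an SVM classifier, one decision per candidate fundamental frequency.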
A Framework for Sonification of Vicon Motion Capture Data
This paper describes experiments on sonifying data obtained using the VICON motion capture system. The main goal is to build the necessary infrastructure in order to be able to map motion parameters of the human body to sound. For sonification the following three software frameworks were used: Marsyas, traditionally used for music information retrieval with audio analysis and synthesis; ChucK, an on-the-fly real-time synthesis language; and the Synthesis Toolkit (STK), a toolkit for sound synthesis that includes many physical models of instruments and sounds. An interesting possibility is the use of motion capture data to control parameters of digital audio effects. In order to experiment with the system, different types of motion data were collected. These include traditional performance on musical instruments, acting out emotions, as well as data from individuals having impairments in sensorimotor coordination. Rhythmic motion (e.g. walking), although complex, can be highly periodic and maps quite naturally to sound. We hope that this work will eventually assist patients in identifying and correcting problems related to motor coordination through sound.
Acoustic localization of tactile interactions for the development of novel tangible interfaces
Speech/music discrimination based on a new warped LPC-based feature and linear discriminant analysis
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a low-complexity but effective approach, which exploits only one simple feature, called Warped LPC-based Spectral Centroid (WLPC-SC). A comparison between WLPC-SC and the classical features proposed in [9] is performed, aiming to assess the good discriminatory power of the proposed feature. The length of the vector describing the proposed psychoacoustically based feature is reduced to a few statistical values (mean, variance and skewness), which are then transformed to a new feature space by applying LDA with the aim of improving classification accuracy. The classification task is performed by applying an SVM to the features in the transformed space. The classification results for different types of music and speech show the good discriminating power of the proposed approach.
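The pipeline described above, per-track statistics of a spectral-centroid feature, an LDA projection, then an SVM, can be sketched with scikit-learn. The warped-LPC centroid computation itself is not reproduced; the synthetic "speech" and "music" centroid tracks below are stand-ins invented for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def summarize(centroid_track):
    """Reduce a per-frame centroid track to (mean, variance, skewness),
    as the abstract describes for WLPC-SC."""
    c = np.asarray(centroid_track, dtype=float)
    mu, var = c.mean(), c.var()
    skew = ((c - mu) ** 3).mean() / (var ** 1.5 + 1e-12)
    return np.array([mu, var, skew])

rng = np.random.default_rng(0)
# Stand-in data: "speech" centroid tracks fluctuate more than "music" ones.
speech = [rng.normal(0.3, 0.15, 200) for _ in range(50)]
music = [rng.normal(0.5, 0.05, 200) for _ in range(50)]
X = np.array([summarize(t) for t in speech + music])
y = np.array([0] * 50 + [1] * 50)

# LDA projects the 3-D statistics to 1-D; the SVM classifies in that space.
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=1), SVC())
clf.fit(X, y)
print(clf.score(X, y))
```

With two classes LDA can produce at most one discriminant component, which is why `n_components=1` is used here.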
Adaptive Network-Based Fuzzy Inference System for Automatic Speech/Music Discrimination
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents an effective approach based on an Adaptive Network-Based Fuzzy Inference System (ANFIS) for the classification stage required in a speech/music discrimination system. A new simple feature, called Warped LPC-based Spectral Centroid (WLPC-SC), is also proposed. Comparison between WLPC-SC and some of the classical features proposed in [11] is performed, aiming to assess the good discriminatory power of the proposed feature. The length of the vector for describing the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance and skewness). To evaluate the performance of the ANFIS system for speech/music discrimination, comparison to other commonly used classifiers is reported. The classification results for different types of music and speech show the good discriminating power of the proposed approach.
j-DAFx - Digital Audio Effects in Java
This paper describes an attempt to provide an online learning platform for digital audio effects. After a comprehensive study of different technologies for presenting multimedia content that reacts dynamically to user input, we decided to use Java Applets. Further investigations address implementation issues, especially the processing and visualization of audio data, and present a general framework used in our department. Recent and future digital effects implemented in this framework can be found on our web site.
Comparing synthetic and real templates for dynamic time warping to locate partial envelope features
In this paper we compare the performance of a number of different templates for the purposes of split-point identification in various clarinet envelopes. These templates were generated with Attack-Decay-Sustain-Release (ADSR) descriptions commonly used in musical synthesis, along with a set of real templates obtained using k-means clustering of manually prepared test data. The goodness of fit of the templates to the data was evaluated using the Dynamic Time Warping (DTW) cost function, and by evaluating the squared distance between the identified split points and the manually identified split points in the test data. It was found that the best templates for split-point identification were the synthetic templates, followed by the real templates, with a sharp attack and release, as is characteristic of the clarinet envelope.
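The template-matching step can be illustrated with a plain dynamic-programming DTW implementation. This is a sketch only: the four-segment ADSR template, its segment lengths, and the synthetic test envelope are made-up values, not the paper's templates or data.

```python
import numpy as np

def dtw_path(template, envelope):
    """Classic DTW: fill the accumulated-cost matrix, then backtrack.
    Returns the total alignment cost and the warping path as
    (template_index, envelope_index) pairs."""
    n, m = len(template), len(envelope)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(template[i - 1] - envelope[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

# A hypothetical ADSR template with a sharp attack and release.
adsr = np.concatenate([np.linspace(0.0, 1.0, 5),    # attack
                       np.linspace(1.0, 0.7, 5),    # decay
                       np.full(20, 0.7),            # sustain
                       np.linspace(0.7, 0.0, 5)])   # release
# A time-stretched copy stands in for a measured clarinet envelope.
env = np.interp(np.linspace(0, 1, 70), np.linspace(0, 1, len(adsr)), adsr)
cost, path = dtw_path(adsr, env)
# Split-point idea: project a template segment boundary (here the end
# of the attack, template index 4) onto the envelope via the path.
attack_end = max(j for i, j in path if i == 4)
```

Because every template index appears on the warping path, each ADSR segment boundary maps to a position in the envelope, which is how split points can be read off the alignment.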
An auditory 3D file manager designed from interaction patterns
This paper presents the design, implementation and evaluation of an auditory user interface for a file-manager application. The prototype was built to validate concepts developed to support user interface designers with design patterns for creating robust and efficient auditory displays. The paper describes the motivation for introducing a mode-independent meta domain in which the design patterns were defined, to overcome the problem of translating mainly visual concepts to the auditory domain. The prototype was implemented using the IEM Ambisonics libraries for Pure Data to produce high-quality binaural audio rendering, and used head tracking and a joystick as the main interaction devices.
A New Functional Framework for a Sound System for Realtime Flight Simulation
We present a new sound framework and concept for realistic flight simulation. Because the system must deal with a highly complex network of mechanical systems acting as physical sound sources, the main focus is on a fully modular, extensible and scalable design. The prototype we developed is part of a fully functional Full Flight Simulator for pilot training.
Performing Expressive Rhythms with BillaBoop Voice-Driven Drum Generator
In previous work we presented a system for transcribing spoken rhythms into a symbolic score. The system was then extended to process the vocal stream in real time, so that a musician can use it as a voice-driven drum generator. The extensions to this work are the following. First, we studied the system's classification accuracy on typical onomatopoeia used in Western beat boxing, with the aim of building a general supervised model for immediate use. Second, we want the user to be able to generate expressive rhythms beyond the symbolic drum representation. We therefore considered a class-specific mapping of continuous vocal-stream descriptors to either effects or synthesis parameters of the drum generator. The extraction of the symbolic drum stream is implemented in the BillaBoop VST Core plug-in. The class-specific mapping and the sound synthesis are carried out in the Plogue Bidule framework. All these components are integrated into a low-latency application suitable for live performance.
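As a toy illustration of class-specific mapping (not the BillaBoop implementation, which uses a supervised model trained on beat-boxing onomatopoeia), one descriptor of an onset frame can pick the drum class while a second descriptor drives a per-class synthesis parameter. The threshold, the class names, and the parameter names below are all hypothetical:

```python
import numpy as np

def classify_and_map(frame):
    """Toy class-specific mapping: a zero-crossing-rate threshold picks
    kick vs. hi-hat, and a per-class continuous descriptor (RMS) is
    routed to a different hypothetical synthesis parameter per class."""
    frame = np.asarray(frame, dtype=float)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0  # crossings/sample
    rms = float(np.sqrt(np.mean(frame ** 2)))
    if zcr < 0.1:                       # low ZCR -> voiced/pitched -> kick
        return "kick", {"amplitude": rms}
    return "hihat", {"decay": 1.0 - min(rms, 1.0)}

sr = 44100
t = np.arange(1024) / sr
kick_like = 0.8 * np.sin(2 * np.pi * 100.0 * t)               # low, voiced
hat_like = 0.3 * np.random.default_rng(1).standard_normal(1024)  # noisy
print(classify_and_map(kick_like)[0], classify_and_map(hat_like)[0])
```

In a real system of this kind, the classifier output would trigger the drum voice while the continuous descriptor stream modulates its effect or synthesis parameters frame by frame.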