Beat histogram features for rhythm-based musical genre classification using multiple novelty functions
In this paper we present beat histogram features for multiple-level rhythm description and evaluate them in a musical genre classification task. Audio features pertaining to various musical content categories and their related novelty functions are extracted as a basis for the creation of beat histograms. The proposed features capture not only amplitude changes, but also tonal and general spectral changes in the signal, aiming to represent as much rhythmic information as possible. The most and least informative features are identified through feature selection methods and are then tested with Support Vector Machines on five genre datasets, comparing classification accuracy against a baseline feature set. Results show that the presented features provide classification accuracy comparable to other genre classification approaches using periodicity histograms, and perform close to much more elaborate, up-to-date approaches for rhythm description. The use of bar-boundary annotations for the texture frames yields an improvement on the dance-oriented Ballroom dataset. The comparably small number of descriptors and the possibility of evaluating the influence of specific signal components on the overall rhythmic content encourage further use of the method in rhythm description tasks.
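As a rough illustration of the beat histogram idea, a single novelty function (here spectral flux, one of several categories the paper combines) can be autocorrelated and its lags mapped to tempo. The Python sketch below is not the authors' implementation, and its frame sizes and tempo range are assumptions.

```python
import numpy as np
from scipy.signal import stft

def beat_histogram(x, sr, n_fft=1024, hop=512, bpm_range=(40, 200)):
    """Toy beat histogram: spectral-flux novelty -> autocorrelation -> tempo weights.

    The paper builds histograms from several novelty functions (amplitude,
    tonal and general spectral change); this sketch uses spectral flux only.
    """
    _, _, Z = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(Z)

    # Half-wave rectified frame-to-frame spectral increase (spectral flux)
    flux = np.maximum(np.diff(mag, axis=1), 0.0).sum(axis=0)
    flux -= flux.mean()

    # Autocorrelation of the novelty function over positive lags
    ac = np.correlate(flux, flux, mode="full")[len(flux) - 1:]

    # Map frame lags to BPM and keep the plausible tempo range
    frame_rate = sr / hop
    lags = np.arange(1, len(ac))
    bpm = 60.0 * frame_rate / lags
    keep = (bpm >= bpm_range[0]) & (bpm <= bpm_range[1])
    return bpm[keep], ac[1:][keep]

# Demo with a 120 BPM click track; the strongest lags should sit at the beat
# period or its integer multiples (tempo-octave ambiguity is a known caveat).
sr = 22050
clicks = np.zeros(sr * 10)
clicks[::sr // 2] = 1.0
bpm, weight = beat_histogram(clicks, sr)
print("top tempo candidates:", np.round(bpm[np.argsort(weight)[-3:]], 1))
```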
Multi-channel Audio Information Hiding
We consider a method of hiding many audio channels in one host signal. The purpose is to provide a ‘mix’ that incorporates information on all the channels used to produce it, thereby allowing all, or at least some, channels to be stored in the mix for later use (e.g. for re-mixing and/or archiving). After providing an overview of some recently published audio watermarking schemes in the time and transform domains, we present a method based on a four-least-significant-bit scheme that embeds five MP3 files into a single 16-bit host WAV file without incurring any perceptual audio distortions in either the host data or the embedded files. The host WAV file is taken to be the final mix associated with the original multi-channel data before applying minimal MP3 compression (WAV to MP3 conversion), or, alternatively, an arbitrary host WAV file into which other multi-channel data in MP3 format is hidden. The embedded information can be encrypted and/or the embedding locations randomized on a channel-by-channel basis, depending on the security protocol desired by the user. The method is illustrated with example m-code so that interested readers can reproduce the results obtained to date and use them as a basis for further development.
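For readers who do not use MATLAB, the core four-LSB embedding step can be sketched in a few lines of Python. This is a toy round trip on a 16-bit sample array and a short byte string; the paper's m-code additionally covers MP3 payloads, encryption, randomised embedding locations and the necessary bookkeeping (e.g. a length header), none of which is shown here.

```python
import numpy as np

def embed_4lsb(host, payload):
    """Hide payload bytes in the 4 least significant bits of 16-bit PCM samples.
    Each payload byte occupies two host samples (high nibble, then low nibble)."""
    assert host.dtype == np.int16
    samples = host.view(np.uint16).copy()                 # reinterpret bits, keep a copy
    data = np.frombuffer(payload, dtype=np.uint8)
    nibbles = np.column_stack((data >> 4, data & 0x0F)).ravel()
    if nibbles.size > samples.size:
        raise ValueError("payload too large for host")
    samples[:nibbles.size] = (samples[:nibbles.size] & 0xFFF0) | nibbles
    return samples.view(np.int16)

def extract_4lsb(stego, n_bytes):
    """Recover n_bytes from the 4 LSBs of consecutive stego samples."""
    nib = stego.view(np.uint16)[:2 * n_bytes] & 0x0F
    return ((nib[0::2] << 4) | nib[1::2]).astype(np.uint8).tobytes()

# Toy round trip: a random 'host' and a short byte string standing in for MP3 data
host = (np.random.randn(16000) * 3000).astype(np.int16)
secret = b"any MP3 byte stream would go here"
stego = embed_4lsb(host, secret)
assert extract_4lsb(stego, len(secret)) == secret
print("largest sample change:", int(np.abs(stego.astype(np.int32) - host).max()))  # <= 15
```

Since each 16-bit sample carries four payload bits, the payload can occupy at most a quarter of the host's byte size, which is why several heavily compressed MP3 streams can fit into one uncompressed WAV mix.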
Implementing Loudness Models in MATLAB
In the field of psychoacoustic analysis the goal is to construct a transformation that maps a time waveform into a domain that best captures the response of a human perceiving sound. A key element of such transformations is the mapping between the sound intensity in decibels and its actual perceived loudness. A number of different loudness models exist to achieve this mapping. This paper examines implementation strategies for some of the better-known models in the MATLAB software environment.
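A building block that most loudness models share is the mapping from loudness level (phon) to perceived loudness (sone). Since the paper targets MATLAB, the following Python snippet only illustrates that basic mapping, not any specific model discussed.

```python
import numpy as np

def sones_from_phons(phons):
    """Map loudness level (phon) to perceived loudness (sone).

    40 phon corresponds to 1 sone and every additional 10 phon doubles the
    loudness. Below about 40 phon the relation is steeper and model dependent,
    so this toy function simply clips there.
    """
    phons = np.maximum(np.asarray(phons, dtype=float), 40.0)
    return 2.0 ** ((phons - 40.0) / 10.0)

# For a 1 kHz pure tone the loudness level in phon equals the SPL in dB,
# so 60 dB SPL is about 4 sones and 80 dB SPL about 16 sones.
print(sones_from_phons([40, 60, 80]))   # -> [ 1.  4. 16.]
```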
Adaptive Harmonization and Pitch Correction of Polyphonic Audio Using Spectral Clustering
There are several well-known harmonization and pitch correction techniques that can be applied to monophonic sound sources. They are based on automatic pitch detection and frequency shifting without time stretching. In many applications it is desirable to apply such effects to the dominant melodic instrument of a polyphonic audio mixture. However, applying them directly to the mixture results in artifacts, and automatic pitch detection becomes unreliable. In this paper we describe how a dominant melody separation method based on spectral clustering of sinusoidal peaks can be used for adaptive harmonization and pitch correction in single-channel (mono) polyphonic audio mixtures. Motivating examples from a violin tutoring perspective, as well as the modification of the saxophone melody of an old mono jazz recording, are presented.
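Once the melody's fundamental frequency has been estimated, the pitch-correction part of such an effect reduces to a small calculation: snap the detected pitch to the nearest equal-tempered note and derive the frequency-scaling factor. The sketch below shows only that arithmetic, not the spectral-clustering separation itself; the 440 Hz reference is an assumption.

```python
import numpy as np

def correction_factor(f0_hz, a4=440.0):
    """Frequency scaling factor that snaps a detected pitch to the nearest
    equal-tempered semitone; the separated melody would then be frequency
    shifted by this factor and mixed back with the accompaniment."""
    midi = 69.0 + 12.0 * np.log2(f0_hz / a4)                  # continuous MIDI pitch
    target = a4 * 2.0 ** ((np.round(midi) - 69.0) / 12.0)     # nearest semitone in Hz
    return target / f0_hz

print(correction_factor(452.0))   # ~0.973: pull a sharp note down to A4 = 440 Hz
```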
Identification of individual guitar sounds by support vector machines
This paper introduces an automatic classification system for the identification of individual classical guitars from single notes played on them. The classification is performed by Support Vector Machines (SVMs) trained on features of the single notes. The features used for classification were the time series of the partial tones, the time series of the MFCCs (Mel-Frequency Cepstral Coefficients), and the “non-tonal” contributions to the spectrum. The influence of these features on classification success is reported. With this system, 80% of the sounds recorded with three different guitars were classified correctly. A supplementary classification experiment carried out with human listeners resulted in a correct-classification rate of 65%.
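A generic sketch of the classification stage is shown below using scikit-learn; the paper does not state which SVM implementation or kernel was used, and the feature matrix here is random placeholder data standing in for the partial-tone and MFCC time-series features.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: one row per recorded note (e.g. summary statistics of the
# partial-tone and MFCC time series); labels identify one of three guitars.
X = np.random.rand(300, 40)
y = np.random.randint(0, 3, 300)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validated accuracy: %.2f" % scores.mean())
```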
Classification Of Music Signals In The Visual Domain
With the huge increase in the availability of digital music, it has become more important to automate the task of querying a database of musical pieces. At the same time, a computational solution to this task might give us an insight into how humans perceive and classify music. In this paper, we discuss our attempts to classify music into three broad categories: rock, classical and jazz. We discuss the feature extraction process and the particular choice of features that we used: spectrograms and mel-scaled cepstral coefficients (MFCCs). We use texture-of-texture models to generate feature vectors from these. Together, these features are capable of capturing the frequency-power profile of the sound as the song proceeds. Finally, we attempt to classify the generated data using a variety of classifiers, and we discuss our results and the inferences that can be drawn from them.
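The "texture-of-texture" aggregation can be illustrated as follows, assuming an MFCC matrix has already been computed; the window length and the choice of mean/standard-deviation statistics are assumptions, not the paper's exact settings.

```python
import numpy as np

def texture_of_texture(mfcc, win=43):
    """Collapse a frame-level MFCC matrix (n_coeffs x n_frames) into one
    song-level vector: mean/std over short texture windows, then mean/std
    of those window statistics over the whole piece."""
    n_coeffs, n_frames = mfcc.shape
    window_stats = []
    for start in range(0, n_frames - win + 1, win):
        block = mfcc[:, start:start + win]
        window_stats.append(np.concatenate([block.mean(axis=1), block.std(axis=1)]))
    window_stats = np.array(window_stats)              # one row per texture window
    return np.concatenate([window_stats.mean(axis=0), window_stats.std(axis=0)])

# 13 placeholder MFCCs over 1290 frames -> a 52-dimensional song vector
print(texture_of_texture(np.random.rand(13, 1290)).shape)   # (52,)
```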
Sub-Band Independent Subspace Analysis for Drum Transcription
While Independent Subspace Analysis provides a means of separating sound sources from a single-channel signal, making it an effective tool for drum transcription, it does have a number of problems. Not least of these is that the amount of information required to allow separation of sound sources varies from signal to signal. To overcome this indeterminacy and improve the robustness of transcription, an extension of Independent Subspace Analysis to include sub-band processing is proposed. The use of this approach is demonstrated by its application in a simple drum transcription algorithm.
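A minimal sketch of the sub-band idea, assuming fixed band edges and a fixed number of components per band (the paper's point is precisely that a single global component count is hard to choose): split the magnitude spectrogram into frequency bands and run PCA followed by ICA in each band.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import PCA, FastICA

def subband_isa(x, sr, band_edges_hz=(0, 200, 1000, 8000), n_comp=2):
    """Run PCA + ICA separately in each frequency band of the magnitude
    spectrogram and return the per-band component time envelopes."""
    f, _, Z = stft(x, fs=sr, nperseg=1024)
    mag = np.abs(Z)
    envelopes = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        band = mag[(f >= lo) & (f < hi), :]                       # sub-band spectrogram
        reduced = PCA(n_components=n_comp).fit_transform(band.T)  # frames x components
        comps = FastICA(n_components=n_comp).fit_transform(reduced)
        envelopes.append(comps.T)                                 # components x frames
    return envelopes

# Synthetic demo: decaying noise bursts standing in for drum hits
sr = 22050
t = np.arange(sr * 2) / sr
demo = np.random.randn(t.size) * np.exp(-5.0 * (t % 0.5))
for i, env in enumerate(subband_isa(demo, sr)):
    print("band", i, "envelopes:", env.shape)
```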
Real Time Implementation of the HVXC MPEG-4 Speech Coder
In this paper we present the results of code optimization for the HVXC MPEG-4 speech coder. Two bit-rate formats are considered: 2 and 4 kbit/s. After a short description of the main features of HVXC, results of the code optimization are reported: the real-time implementation, on a floating-point DSP, of three parallel 2 kbit/s or two parallel 4 kbit/s HVXC coders is shown to be possible.
Independent subspace analysis using locally linear embedding
While Independent Subspace Analysis provides a means of blindly separating sound sources from a single-channel signal, it does have a number of problems. In particular, the amount of information required for separation of sources varies with the signal. This is a result of the variance-based nature of Principal Component Analysis, which is used for dimensionality reduction in the Independent Subspace Analysis algorithm. In an attempt to overcome this problem, the use of a non-variance-based dimensionality reduction method, Locally Linear Embedding, is proposed. Locally Linear Embedding is a geometry-based dimensionality reduction technique. The use of this approach is demonstrated by its application to single-channel source separation, and its merits are discussed.
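A minimal sketch of the proposed substitution, with scikit-learn standing in for the original implementation: Locally Linear Embedding replaces PCA as the dimensionality-reduction step before ICA. Parameter values and the use of spectrogram frames as samples are assumptions for illustration.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import FastICA
from sklearn.manifold import LocallyLinearEmbedding

def isa_with_lle(x, sr, n_comp=4, n_neighbors=12):
    """ISA-style decomposition where geometry-based LLE replaces PCA for
    dimensionality reduction, followed by ICA as usual."""
    _, _, Z = stft(x, fs=sr, nperseg=1024)
    frames = np.abs(Z).T                                   # one sample per time frame
    embedded = LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=n_comp).fit_transform(frames)
    return FastICA(n_components=n_comp).fit_transform(embedded).T

# Demo on a short noise signal standing in for a drum loop
sr = 22050
print(isa_with_lle(np.random.randn(sr * 2), sr).shape)     # (n_comp, n_frames)
```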
A hierarchical approach to automatic musical genre classification
A system for the automatic classification of audio signals according to audio category is presented. The signals are recognized as speech, background noise or one of 13 musical genres. A large number of audio features are evaluated for their suitability in such a classification task, including well-known physical and perceptual features, audio descriptors defined in the MPEG-7 standard, as well as new features proposed in this work. These are selected with regard to their ability to distinguish between a given set of audio types and their robustness to noise and bandwidth changes. In contrast to previous systems, the feature selection and the classification process itself are carried out in a hierarchical way. This is motivated by the numerous advantages of such a tree-like structure, which include easy expansion capabilities, flexibility in the design of genre-dependent features, and the ability to reduce the probability of costly errors. The resulting application is evaluated with respect to classification accuracy and computational cost.
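The hierarchical principle can be illustrated with a two-level toy classifier: a root stage separates speech, noise and music, and a genre stage is consulted only on the music branch. The use of SVMs, the two-level depth and the placeholder data are assumptions; the paper describes a deeper tree with genre-dependent feature selection at each node.

```python
import numpy as np
from sklearn.svm import SVC

class TwoLevelClassifier:
    """Root SVM decides speech (0) / noise (1) / music (2); a second SVM
    assigns a genre label only to examples routed to the music branch."""

    def __init__(self):
        self.root = SVC()
        self.genre = SVC()

    def fit(self, X, y_top, y_genre):
        self.root.fit(X, y_top)
        music = y_top == 2
        self.genre.fit(X[music], y_genre[music])
        return self

    def predict(self, X):
        top = self.root.predict(X)
        out = np.array(["speech", "noise", "music"], dtype=object)[top]
        music_idx = np.flatnonzero(top == 2)
        if music_idx.size:
            out[music_idx] = self.genre.predict(X[music_idx])
        return out

# Placeholder data: random features, three stand-in genres instead of 13
X = np.random.rand(200, 30)
y_top = np.random.randint(0, 3, 200)
y_genre = np.array(["rock", "jazz", "pop"])[np.random.randint(0, 3, 200)]
print(TwoLevelClassifier().fit(X, y_top, y_genre).predict(X[:5]))
```

Gating the genre decision on the root stage is what gives the tree its claimed advantages: branch-specific features can be selected per node, and an expensive 13-way decision is only attempted for signals already judged to be music.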