Analysis of transient musical sounds by auto-regressive modeling
This paper gives an example of auto-regressive (AR) spectral analysis of transient musical sounds. The attack portion of many musical sounds is too short to be analysed by short-time Fourier analysis, yet it is long enough for several AR methods. The AR spectra obtained from short signal segments containing attack transients have sufficiently high frequency resolution. These spectra contain more information about the evolution of a sound than a fast Fourier transform computed over a small number of samples.
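As an illustration of how an AR spectrum can be estimated from very few samples, here is a minimal sketch. The abstract does not say which AR methods the paper uses; this sketch assumes the autocorrelation (Yule-Walker) method solved with the Levinson-Durbin recursion, and the function names, the model order, and the 64-sample test sinusoid are all hypothetical choices made for illustration.

```python
import cmath
import math


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations.

    Returns (a, err): AR polynomial coefficients with a[0] == 1 and the
    final prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def ar_spectrum(a, err, n_freqs):
    """AR power spectrum err / |A(e^{jw})|^2 at n_freqs points in [0, pi)."""
    spec = []
    for j in range(n_freqs):
        w = math.pi * j / n_freqs
        big_a = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        spec.append(err / abs(big_a) ** 2)
    return spec


# Assumed test signal: 64 samples of a sinusoid at 0.2 cycles per sample.
N_FREQS = 256
signal = [math.sin(2 * math.pi * 0.2 * n) for n in range(64)]
r = autocorr(signal, 2)
a, err = levinson_durbin(r, 2)
spectrum = ar_spectrum(a, err, N_FREQS)
peak_bin = max(range(N_FREQS), key=spectrum.__getitem__)
peak_freq = 0.5 * peak_bin / N_FREQS        # in cycles per sample
```

The spectrum can be evaluated on an arbitrarily fine frequency grid because it comes from the fitted model rather than from the segment length, which is the property the abstract relies on for short attack segments.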
Matching live sources with physical models
This paper investigates the use of a physical model template database as the parameter basis for an MPEG-4 Structured Audio (MP4-SA) codec. During analysis, the codec attempts to find the closest matching instrument in the database. In this paper, we emphasize the mechanism enabling this match. We give an overview of the final front end, including the pitch detection stage, and discuss remaining problems. A draft implementation, written in Python, is described.
A hierarchical approach to automatic musical genre classification
A system for the automatic classification of audio signals according to audio category is presented. The signals are recognized as speech, background noise, or one of 13 musical genres. A large number of audio features are evaluated for their suitability in such a classification task, including well-known physical and perceptual features, audio descriptors defined in the MPEG-7 standard, as well as new features proposed in this work. These are selected with regard to their ability to distinguish between a given set of audio types and their robustness to noise and bandwidth changes. In contrast to previous systems, the feature selection and the classification process itself are carried out hierarchically. This is motivated by the numerous advantages of such a tree-like structure, which include easy expansion capabilities, flexibility in the design of genre-dependent features, and the ability to reduce the probability of costly errors. The resulting application is evaluated with respect to classification accuracy and computational cost.
MOSIEVIUS: Feature driven interactive audio mosaicing
The process of creating an audio mosaic consists of the concatenation of segments of sound. Segments are chosen to correspond best with a description of a target sound specified by the desired features of the final mosaic. Current audio mosaicing techniques take advantage of the description of future target units in order to make more intelligent decisions when choosing individual segments. In this paper, we investigate ways to expand mosaicing techniques in order to use the mosaicing process as an interactive means of musical expression in real time. In our system, the user can interactively choose the specification of the target as well as the source signals from which the mosaic is composed. These means of control are incorporated into MoSievius, a framework intended for the rapid implementation of different interactive mosaicing techniques. Its integral means of control, the Sound Sieve, provides real-time control over the source selection process when creating an audio mosaic. We discuss a number of new real-time effects that can be achieved through use of the Sound Sieve.
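The selection step described above can be sketched as a nearest-neighbour search over segment features. This is only an illustration: the abstract does not specify how the Sound Sieve works, so the sketch assumes it acts as a range filter on feature values, and the feature names, the segment dictionaries, and the functions `sieve`, `pick_segment`, and `build_mosaic` are hypothetical.

```python
def sieve(segments, ranges):
    """Keep segments whose features lie inside every (lo, hi) range.

    Assumption: the sieve is modelled as a simple per-feature range filter.
    """
    return [s for s in segments
            if all(lo <= s["features"][name] <= hi
                   for name, (lo, hi) in ranges.items())]


def pick_segment(target, candidates):
    """Return the candidate closest to the target (squared Euclidean distance).

    Features are compared unscaled here; a real system would normalize them.
    """
    return min(candidates,
               key=lambda s: sum((s["features"][k] - v) ** 2
                                 for k, v in target.items()))


def build_mosaic(targets, segments, ranges):
    """Return the ids of the chosen source segments, one per target unit."""
    pool = sieve(segments, ranges)
    if not pool:                 # the sieve rejected every source segment
        return []
    return [pick_segment(t, pool)["id"] for t in targets]


# Hypothetical source segments and target description.
segments = [
    {"id": "a", "features": {"pitch": 220.0, "loudness": 0.2}},
    {"id": "b", "features": {"pitch": 440.0, "loudness": 0.8}},
    {"id": "c", "features": {"pitch": 450.0, "loudness": 0.5}},
    {"id": "d", "features": {"pitch": 880.0, "loudness": 0.9}},
]
targets = [
    {"pitch": 445.0, "loudness": 0.75},
    {"pitch": 230.0, "loudness": 0.3},
]

mosaic_wide = build_mosaic(targets, segments, {"pitch": (200.0, 500.0)})
mosaic_loud = build_mosaic(targets, segments, {"loudness": (0.4, 1.0)})
```

Changing the ranges between calls changes which source segments are eligible without touching the target description, which is the kind of interactive control over source selection that the abstract attributes to the Sound Sieve.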
Hierarchical Gaussian tree with inertia ratio maximization for the classification of large musical instrument databases
System analysis and performance tuning for broadcast audio fingerprinting
An audio fingerprint is a compact, content-based signature that summarizes an audio recording. Audio fingerprinting technologies have recently attracted attention because they allow audio to be monitored independently of its format and without the need for metadata or watermark embedding. To succeed in real audio broadcasting environments, these technologies must address channel robustness as well as system accuracy and scalability. This paper presents a complete audio fingerprinting system for broadcast audio monitoring that satisfies these requirements. The system performance is enhanced with four proposals that required detailed analysis of the system blocks as well as extensive system tuning experiments.
Frequency-domain techniques for high-quality voice modification
This paper presents new frequency-domain voice modification techniques that combine the high quality usually obtained by time-domain techniques such as TD-PSOLA with the flexibility provided by the frequency-domain representation. The technique only works for monophonic sources (single-speaker), and relies on a (possibly online) pitch detection. Based on the pitch, and according to the desired pitch and formant modifications, individual harmonics are selected and shifted to new locations in the spectrum. The harmonic phases are updated according to a pitch-based method that aims to achieve time-domain shape-invariance, thereby reducing or eliminating the usual artifacts associated with frequency-domain and sinusoidal-based voice modification techniques. The result is a fairly inexpensive, flexible algorithm that matches the quality of time-domain techniques but offers a much wider range of available modifications.
Content-based melodic transformations of audio material for a music processing application
This paper presents an application for performing melodic transformations on monophonic audio phrases. The system first extracts a melodic description from the audio. This description is presented to the user and can be stored and loaded in an MPEG-7-based format. A set of high-level transformations can then be applied to the melodic description. These high-level transformations are mapped onto a set of low-level signal transformations, which are then applied to the audio signal. The algorithms for description extraction and audio transformation are also presented.
An efficient audio time-scale modification algorithm for use in a subband implementation
The PAOLA algorithm is an efficient algorithm for the time-scale modification of speech. It uses a simple peak alignment technique to synchronise synthesis frames and takes waveform properties and the desired time-scale factor into account to determine optimum algorithm parameters. However, PAOLA has difficulties with certain waveform types and can result in poor synchronisation for subband implementations. SOLA is a less efficient algorithm but resolves the issues associated with PAOLA’s implementation. We present an algorithm that combines the two approaches and proves to be efficient and effective for a subband implementation.
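To make the synchronisation step concrete, the sketch below shows the cross-correlation alignment used by SOLA-style overlap-add: the incoming frame is shifted until it best matches the tail of the output, and the two are then crossfaded. It is not the combined algorithm proposed in the paper, and it does not implement PAOLA's peak alignment; the function names, the linear crossfade, and the test signal are assumptions made for illustration.

```python
import math


def best_offset(tail, frame, max_shift):
    """Return the shift k in [0, max_shift] at which frame[k:k + len(tail)]
    has the highest normalized cross-correlation with tail."""
    n = len(tail)
    tail_energy = sum(a * a for a in tail)
    best_k, best_score = 0, -float("inf")
    for k in range(max_shift + 1):
        seg = frame[k:k + n]
        if len(seg) < n:         # frame too short for this shift
            break
        num = sum(a * b for a, b in zip(tail, seg))
        den = math.sqrt(tail_energy * sum(b * b for b in seg))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def crossfade_append(output, frame, overlap, max_shift):
    """Align frame to the last `overlap` samples of output, crossfade
    linearly over the overlap, and append the rest. Returns the shift used."""
    tail = output[-overlap:]
    k = best_offset(tail, frame, max_shift)
    aligned = frame[k:]
    for i in range(overlap):
        w = i / overlap
        output[-overlap + i] = (1.0 - w) * tail[i] + w * aligned[i]
    output.extend(aligned[overlap:])
    return k


# Assumed test signal: a sinusoid with a period of 20 samples.
PERIOD = 20
source = [math.sin(2 * math.pi * n / PERIOD) for n in range(300)]
output = source[:100]            # samples 0..99 already synthesised
frame = source[107:207]          # next frame starts out of phase
shift = crossfade_append(output, frame, overlap=20, max_shift=19)
```

The search over `max_shift + 1` candidate offsets, each requiring a full correlation, is what makes this kind of alignment more expensive than picking a single waveform peak, which is consistent with the abstract's description of SOLA as the less efficient of the two algorithms.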
A new approach to transient processing in the phase vocoder
In this paper we propose a new method to reduce phase vocoder artifacts during attack transients. In contrast to all transient preservation algorithms proposed so far, the new approach does not impose any constraints on the time dilation parameter for processing transient segments. By investigating the spectral properties of attack transients of simple sinusoids, we provide new insights into the causes of phase vocoder artifacts and propose a new method for transient preservation as well as a new criterion and a new algorithm for transient detection. Both the transient detection and the transient processing algorithms are designed to operate at the level of spectral bins, which reduces possible artifacts in stationary signal components that are close to the spectral peaks classified as transient. The transient detection criterion is closely related to the transient position and allows us to find an optimal position for reinitializing the phase spectrum. The evaluation of the transient detector on a hand-labeled database demonstrates its superior performance compared to a previously published algorithm. Attack transients in sound signals transformed with the new algorithm achieve high quality even if strong dilation is applied to polyphonic signals.