Improved method for extraction of partial’s parameters in polyphonic transcription of piano higher octaves
Polyphonic transcription is especially challenging in the higher octaves of the piano because of the complexity of the spectra of individual notes and, therefore, of chords. Besides the fundamental and the second partial, other spectral components appear. The three peaks related to the unison (three strings per note), as well as the second harmonic of the fundamental unison, can be distinguished in most measurements. Furthermore, intermodulation components are also present when the non-linearity is high enough. This paper compares several methods for improving the training process used to synthesize the spectral patterns and masks employed by transcription methods.
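As an illustration of what such a spectral mask can contain, the sketch below places the partials of each string of a unison using the common stiff-string model f_n = n·f0·sqrt(1 + B·n²) and marks the FFT bins around them. This is a minimal sketch under stated assumptions, not the paper's training method: the detuning of the three strings, the inharmonicity coefficient B, the mask half-width and the function names `unison_partials` and `spectral_mask` are all hypothetical, and intermodulation components are not modelled.

```python
import math


def unison_partials(f0, inharmonicity, n_partials, detune_cents=(-1.0, 0.0, 1.0)):
    """Partial frequencies (Hz) for each string of a piano unison.

    Stiff-string model: f_n = n * f0 * sqrt(1 + B * n^2). The strings are
    assumed (hypothetically) to differ only by a small detuning in cents.
    """
    freqs = []
    for cents in detune_cents:
        f_string = f0 * 2.0 ** (cents / 1200.0)
        for n in range(1, n_partials + 1):
            freqs.append(n * f_string * math.sqrt(1.0 + inharmonicity * n * n))
    return freqs


def spectral_mask(freqs, fft_size, sample_rate, half_width_bins=1):
    """Binary mask over the non-negative FFT bins, True near any listed partial."""
    n_bins = fft_size // 2 + 1
    mask = [False] * n_bins
    for f in freqs:
        centre = round(f * fft_size / sample_rate)
        for k in range(centre - half_width_bins, centre + half_width_bins + 1):
            if 0 <= k < n_bins:
                mask[k] = True
    return mask
```

For example, `spectral_mask(unison_partials(2093.0, 5e-4, 3), 4096, 44100)` marks the bins around the first three partials of a C7 unison; with B > 0 each partial lies slightly above its harmonic position.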
Live Tracking of Musical Performances using On-Line Time Warping
Dynamic time warping finds the optimal alignment of two time series, but it is not suitable for on-line applications because it requires complete knowledge of both series before the alignment can be computed. Further, its quadratic time and space requirements are limiting factors even for off-line systems. We present a novel on-line time warping algorithm which has linear time and space costs and performs incremental alignment of two series as one is received in real time. This algorithm is applied to the alignment of audio signals in order to follow musical performances of arbitrary length. Each frame of audio is represented by a positive spectral difference vector, emphasising note onsets. The system was evaluated on several test sets, including recordings of 22 pianists playing music by Chopin, where the average alignment error was 59 ms (median 20 ms). We demonstrate one application of the system: the analysis and visualisation of musical expression in real time.
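The sketch below illustrates the two ingredients named above under simplifying assumptions. The positive spectral difference keeps only per-bin magnitude increases between consecutive frames. The aligner is a hypothetical, simplified banded variant: the reference is known in full, only the live stream arrives on-line, and only one column of the accumulated cost matrix, restricted to a fixed band around the current position, is kept, so each update costs O(band). The paper's algorithm differs (for instance, it decides adaptively which sequence to advance); the class name `BandedOnlineDTW`, the Euclidean local cost and the band width are assumptions.

```python
import math

INF = float("inf")


def positive_spectral_difference(prev_mag, cur_mag):
    """Half-wave rectified frame-to-frame magnitude increase per bin.

    Decreases are clipped to zero, so the vector is large mainly at note onsets.
    """
    return [max(0.0, c - p) for p, c in zip(prev_mag, cur_mag)]


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BandedOnlineDTW:
    """Hypothetical simplified on-line aligner of live frames to a reference."""

    def __init__(self, reference, band=10):
        self.ref = reference
        self.band = band
        self.prev_col = {}   # reference index -> accumulated cost for previous live frame
        self.position = 0    # current estimate of the aligned reference frame
        self.started = False

    def step(self, live_frame):
        lo = max(0, self.position - self.band)
        hi = min(len(self.ref), self.position + self.band + 1)
        col = {}
        for i in range(lo, hi):
            cost = frame_distance(self.ref[i], live_frame)
            if not self.started and i == 0:
                best = 0.0  # every path starts at (0, 0)
            else:
                best = min(self.prev_col.get(i, INF),      # live advances only
                           self.prev_col.get(i - 1, INF),  # both advance
                           col.get(i - 1, INF))            # reference advances only
            col[i] = cost + best
        self.started = True
        self.prev_col = col  # only the newest column is stored
        # Report the reference frame with the lowest accumulated cost.
        self.position = min(col, key=col.get)
        return self.position
```

In use, one would compute a feature vector per incoming audio frame and call `tracker.step(feature)`, reading the returned reference index as the current score position.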
Gestural exploitation of ecological information in continuous sonic feedback – The case of balancing a rolling ball
Continuous sensory–motor loops are rarely addressed in experiments on, and applications of, ecological auditory perception. Experiments with a tangible audio–visual interface built around a physics-based sound synthesis core address this aspect. Although they began as the evaluation of a specific piece of sound and interaction design, these experiments provide new arguments and notions for non-speech auditory display and should also be seen in the wider context of psychoacoustic knowledge and methodology.
Vocal synthesis and graphical representation of the phonetic gestures underlying guitar timbre description
The guitar gives the player great control over timbre. Different plucking techniques involve varying the finger position along the string, the inclination between the finger and the string, the inclination between the hand and the string, and the degree of relaxation of the plucking finger. Guitarists perceive subtle variations of these parameters and have developed a very rich vocabulary to describe the brightness, colour, shape and texture of the sounds they produce on their instrument: dark, bright, chocolatey, transparent, muddy, wooly, glassy, buttery and metallic are just a few of these adjectives. The aim of this research is to design a computer tool that produces both a synthesized vocal imitation and a graphical representation of the phonetic gestures underlying descriptions of classical guitar timbre, as a function of the instrumental gesture parameters (mainly the plucking angle and the distance from the bridge) and based on perceptual analogies between guitar and speech sounds. Just as traditional tabla teaching uses onomatopoeia to designate the different strokes, vocal imitation of guitar timbres could give guitar performers a common language, complementary to the mental imagery they commonly use to communicate about timbre, for example in a pedagogical context.
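To make the effect of one gesture parameter concrete, the sketch below uses the textbook ideal plucked string, in which mode n of the triangular initial shape has amplitude proportional to sin(nπβ)/n², where β is the plucking point as a fraction of the string length from the bridge. It computes a spectral centroid (in harmonic numbers) as a brightness proxy. This is only an illustrative assumption, not the paper's model: it ignores the plucking angle, finger width, string stiffness and the guitar body, and the function names are hypothetical.

```python
import math


def plucked_string_amplitudes(beta, n_harmonics):
    """Relative harmonic amplitudes of an ideal string plucked at fraction beta.

    Mode n of the triangular initial shape is proportional to sin(n*pi*beta) / n**2.
    """
    return [abs(math.sin(n * math.pi * beta)) / n ** 2
            for n in range(1, n_harmonics + 1)]


def spectral_centroid(amplitudes):
    """Amplitude-weighted mean harmonic number, a common brightness proxy."""
    total = sum(amplitudes)
    return sum((n + 1) * a for n, a in enumerate(amplitudes)) / total
```

Plucking near the bridge (β = 0.1) yields a higher centroid than plucking at mid-string (β = 0.5, where even harmonics vanish), which matches the usual "bright" versus "dark" description of these two gestures.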
Improving Sinusoidal Frequency Estimation Using a Trigonometric Approach
Estimating the frequency of sinusoidal components is the first step of the sinusoidal analysis chain. Among the numerous frequency estimators presented in the literature, we study the estimator proposed in [1], known as the derivative algorithm. Thanks to a trigonometric interpretation of this frequency estimator, we are able to propose a new estimator which improves estimation performance for frequencies close to the Nyquist frequency without any additional computational cost.
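The sketch below does not reproduce the paper's new estimator; it only shows why a trigonometric view matters near Nyquist for derivative-based estimation. The first difference x[n] − x[n−1] scales a sinusoid of normalized frequency ω by |1 − e^{−jω}| = 2 sin(ω/2), not by ω. Taking the ratio of windowed spectra at the peak bin and treating it as ω is accurate at low frequencies but badly biased near Nyquist, whereas 2·arcsin(ratio/2) inverts the exact gain. The Hann window, the plain DFT and the function names are assumptions for illustration.

```python
import cmath
import math


def hann(n_samples):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples) for n in range(n_samples)]


def dft_bin(frame, k):
    n_samples = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_samples) for n, x in enumerate(frame))


def derivative_estimates(signal, n_samples):
    """Return (naive, trigonometric) frequency estimates in rad/sample.

    signal must hold n_samples + 1 samples so the first difference is defined.
    """
    w = hann(n_samples)
    x = [w[n] * signal[n + 1] for n in range(n_samples)]
    d = [w[n] * (signal[n + 1] - signal[n]) for n in range(n_samples)]
    # Peak bin of the windowed signal spectrum (non-negative frequencies only).
    mags = [abs(dft_bin(x, k)) for k in range(n_samples // 2 + 1)]
    peak = max(range(len(mags)), key=mags.__getitem__)
    ratio = abs(dft_bin(d, peak)) / mags[peak]
    naive = ratio  # treats the first difference as an exact derivative (gain omega)
    trig = 2.0 * math.asin(min(1.0, ratio / 2.0))  # exact gain is 2 sin(omega / 2)
    return naive, trig
```

For a cosine at ω = 0.9π, the naive value is about 1.98 rad/sample instead of about 2.83, while the arcsine form recovers ω closely at the same cost: one extra DFT bin and one arcsine.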
An Efficient Algorithm for Real-Time Spectrogram Inversion
We present a computationally efficient real-time algorithm for constructing audio signals from spectrograms. A spectrogram is a time sequence of short-time Fourier transform magnitude (STFTM) spectra. During construction, phases are derived for the individual frequency components so that the spectrogram of the constructed signal is as close as possible to the target spectrogram, given real-time constraints. The algorithm is a variation of the classic Griffin and Lim [1] technique, modified so that it can be computed in real time. We discuss the application of the algorithm to time-scale modification of audio signals such as speech and music, and compare its performance with that of other methods. The new algorithm generates comparable or better results with significantly less computation. The phase consistency between adjacent frames produces excellent subjective sound quality with minimal frame-transition artifacts.
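For reference, the sketch below implements the classic Griffin and Lim iteration that the algorithm modifies, not the real-time variant itself: it works on the whole signal at once instead of committing frames one by one. Each iteration keeps the phase of the current estimate, imposes the target magnitudes and resynthesizes with the least-squares overlap-add Σ w·y / Σ w², so the magnitude error cannot increase. A plain O(N²) DFT keeps it dependency-free; the window length, hop and function names are assumptions.

```python
import cmath
import math


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def stft(x, win, hop):
    n = len(win)
    return [dft([win[t] * x[start + t] for t in range(n)])
            for start in range(0, len(x) - n + 1, hop)]


def istft_lsee(frames, win, hop, length):
    """Least-squares overlap-add (Griffin and Lim): sum(w * y) / sum(w^2)."""
    num = [0.0] * length
    den = [0.0] * length
    for m, spec in enumerate(frames):
        y = idft_real(spec)
        for t, w in enumerate(win):
            num[m * hop + t] += w * y[t]
            den[m * hop + t] += w * w
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]


def magnitude_error(x, target_mags, win, hop):
    """Squared distance between the STFT magnitudes of x and the target."""
    return sum((abs(X) - T) ** 2
               for Xf, Tf in zip(stft(x, win, hop), target_mags)
               for X, T in zip(Xf, Tf))


def griffin_lim(target_mags, win, hop, length, iterations=20):
    # Initial estimate: zero phase, i.e. each frame spectrum is the target magnitude.
    x = istft_lsee([[complex(m) for m in f] for f in target_mags], win, hop, length)
    for _ in range(iterations):
        spectra = stft(x, win, hop)
        # Keep the current phase, impose the target magnitude.
        new = [[T * cmath.exp(1j * cmath.phase(X)) for X, T in zip(Xf, Tf)]
               for Xf, Tf in zip(spectra, target_mags)]
        x = istft_lsee(new, win, hop, length)
    return x
```

The real-time variant described above instead reconstructs each new frame from the overlap with frames already committed, iterating only on that frame, which is what bounds its latency and cost.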
GABOR, multi-representation real-time analysis/synthesis
This article describes a set of modules for Max/MSP for real-time sound analysis and synthesis combining various models, representations and timing paradigms. Gabor provides a unified framework for granular synthesis, PSOLA, phase vocoder, additive synthesis and other STFT techniques. Gabor’s processing scheme allows for the treatment of atomic sound particles at arbitrary rates and instants. Gabor is based on FTM, an extension of Max/MSP, introducing complex data structures such as matrices and sequences to the Max data flow programming paradigm. Most of the signal processing operators of the Gabor modules handle vector and matrix representations closely related to SDIF sound description formats.
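Gabor itself is a set of Max/MSP modules, and the Python sketch below is not its API. It only illustrates the granular overlap-add principle behind "atomic sound particles at arbitrary rates and instants": each Hann-windowed grain is read from the source at its own onset and written to the output at an independently chosen onset. The grain size, the (source onset, output onset) list and the function name `granular_ola` are hypothetical.

```python
import math


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def granular_ola(source, grain_size, grains, out_length):
    """Overlap-add Hann-windowed grains.

    grains is a list of (source_onset, output_onset) pairs in samples, so read
    and write positions (and hence rates) are fully decoupled.
    """
    win = hann(grain_size)
    out = [0.0] * out_length
    for src, dst in grains:
        for t in range(grain_size):
            if 0 <= src + t < len(source) and 0 <= dst + t < out_length:
                out[dst + t] += win[t] * source[src + t]
    return out
```

With identical read and write onsets spaced by half a grain, the periodic Hann windows sum to one and the interior of the source is reproduced exactly. Spacing the output onsets twice as far apart as the source onsets, for example `[(16 * k, 32 * k) for k in range(n)]`, gives a crude granular time stretch; a production implementation would also renormalize for the changed window overlap.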
Speech/music discrimination based on a new warped LPC-based feature and linear discriminant analysis
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a low-complexity but effective approach that exploits a single simple feature, called the Warped LPC-based Spectral Centroid (WLPC-SC). WLPC-SC is compared with the classical features proposed in [9] to assess the discriminatory power of the proposed feature. The vector describing the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance and skewness), which are then transformed into a new feature space by linear discriminant analysis (LDA) to increase classification accuracy. Classification is performed by a support vector machine (SVM) operating on the features in the transformed space. The classification results for different types of music and speech show the good discriminating power of the proposed approach.
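The two post-processing steps described above can be sketched as follows, assuming a per-frame feature track (for instance WLPC-SC values) is already available; computing the warped LPC centroid itself is not shown. `summarize` reduces a track to [mean, variance, skewness], and `fisher_direction` computes the two-class Fisher LDA projection w = S_w⁻¹(μ₁ − μ₀). The paper's exact LDA configuration and the subsequent SVM are not reproduced; the function names and the small ridge term are assumptions.

```python
import math
import statistics


def summarize(track):
    """Reduce a per-frame feature track to [mean, variance, skewness]."""
    mean = statistics.fmean(track)
    var = statistics.pvariance(track, mean)
    std = math.sqrt(var)
    if std == 0.0:
        return [mean, var, 0.0]
    skew = sum((v - mean) ** 3 for v in track) / (len(track) * std ** 3)
    return [mean, var, skew]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_direction(class0, class1, ridge=1e-6):
    """Two-class Fisher LDA: w = Sw^-1 (mu1 - mu0), Sw = within-class scatter."""
    dim = len(class0[0])
    mu0 = [statistics.fmean([v[d] for v in class0]) for d in range(dim)]
    mu1 = [statistics.fmean([v[d] for v in class1]) for d in range(dim)]
    sw = [[0.0] * dim for _ in range(dim)]
    for vectors, mu in ((class0, mu0), (class1, mu1)):
        for v in vectors:
            diff = [v[d] - mu[d] for d in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    sw[i][j] += diff[i] * diff[j]
    for i in range(dim):
        sw[i][i] += ridge  # small regulariser in case Sw is singular
    return solve(sw, [mu1[d] - mu0[d] for d in range(dim)])
```

Projecting each clip's [mean, variance, skewness] vector onto w gives a one-dimensional discriminant value; a classifier such as an SVM can then be trained in that transformed space.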
Hidden Markov Models for spectral similarity of songs
Hidden Markov Models (HMMs) are compared with Gaussian Mixture Models (GMMs) for describing the spectral similarity of songs. Unlike previous work, we make a direct comparison based on the log-likelihood of songs given an HMM or a GMM. Although the direct comparison of log-likelihoods clearly favors HMMs, this advantage in modeling power does not translate into any gain in genre classification accuracy.
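The sketch below shows how the two log-likelihoods can be computed on the same observation sequence, using one-dimensional Gaussian components for brevity; real song models would use multivariate spectral features such as MFCC frames. The GMM scores frames independently, while the HMM forward algorithm also scores transitions between states. An HMM whose initial distribution and every transition row equal the mixture weights reduces exactly to the GMM, so a direct log-likelihood comparison measures what the temporal structure adds. Function names and parameters are illustrative.

```python
import math


def gauss_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_loglik(obs, weights, means, variances):
    """Frames are scored independently: sum_t log sum_j w_j N(x_t; mu_j, var_j)."""
    return sum(logsumexp([math.log(w) + gauss_logpdf(x, mu, v)
                          for w, mu, v in zip(weights, means, variances)])
               for x in obs)


def hmm_loglik(obs, initial, transitions, means, variances):
    """Forward algorithm in the log domain with Gaussian emissions."""
    k = len(initial)
    alpha = [math.log(initial[j]) + gauss_logpdf(obs[0], means[j], variances[j])
             for j in range(k)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[i] + math.log(transitions[i][j]) for i in range(k)])
                 + gauss_logpdf(x, means[j], variances[j]) for j in range(k)]
    return logsumexp(alpha)
```

On a sequence that stays in one spectral state for several frames, a "sticky" HMM (large self-transition probabilities) assigns a higher log-likelihood than the GMM with the same components, which is the kind of modeling advantage the comparison above detects.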
Adaptive Network-Based Fuzzy Inference System for Automatic Speech/Music Discrimination
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents an effective approach based on an Adaptive Network-Based Fuzzy Inference System (ANFIS) for the classification stage of a speech/music discrimination system. A new simple feature, called the Warped LPC-based Spectral Centroid (WLPC-SC), is also proposed. WLPC-SC is compared with some of the classical features proposed in [11] to assess the discriminatory power of the proposed feature. The vector describing the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance and skewness). To evaluate the ANFIS system for speech/music discrimination, it is compared with other commonly used classifiers. The classification results for different types of music and speech show the good discriminating power of the proposed approach.
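ANFIS encodes a first-order Takagi-Sugeno fuzzy inference system as a layered network. The sketch below shows only the forward pass: Gaussian membership grades, product firing strengths, normalization, linear rule consequents and a weighted average. Training (ANFIS's hybrid least-squares and gradient scheme) is omitted. The rule parameters, the use of Gaussian memberships and the idea of thresholding the output to decide speech versus music are hypothetical choices for illustration, not the paper's configuration.

```python
import math


def gaussian_mf(x, centre, sigma):
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def sugeno_forward(inputs, rules):
    """First-order Takagi-Sugeno inference, the structure an ANFIS network encodes.

    Each rule is (memberships, consequent): memberships holds one (centre, sigma)
    pair per input; consequent holds linear coefficients [p_1, ..., p_n, r].
    """
    firing = []
    outputs = []
    for memberships, consequent in rules:
        # Membership grades combined with a product T-norm give the firing strength.
        w = 1.0
        for x, (c, s) in zip(inputs, memberships):
            w *= gaussian_mf(x, c, s)
        firing.append(w)
        # Linear consequent f = p . x + r (zip stops before the constant r).
        outputs.append(sum(p * x for p, x in zip(consequent, inputs)) + consequent[-1])
    total = sum(firing)
    # Normalize firing strengths and form the weighted average of rule outputs.
    return sum(w * f for w, f in zip(firing, outputs)) / total
```

With, say, the mean and variance of WLPC-SC as the two inputs and one rule per class producing −1 or +1, the sign of the output could serve as the speech/music decision.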