Maximum Filter Vibrato Suppression for Onset Detection
We present SuperFlux, a new onset detection algorithm with vibrato suppression. It is an enhanced version of the widely used spectral flux onset detection algorithm and reduces the number of false positive detections considerably by tracking spectral trajectories with a maximum filter. Especially for music with heavy use of vibrato (e.g., sung opera or string performances), the number of false positive detections can be reduced by up to 60% without missing any additional events. Algorithm performance was evaluated and compared to state-of-the-art methods on three different datasets comprising mixed audio material (25,927 onsets), violin recordings (7,677 onsets), and operatic solo voice recordings (1,448 onsets). Due to its causal nature, the algorithm is applicable in both offline and online real-time scenarios.
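As a rough illustration of the core idea, the following sketch computes a maximum-filtered spectral flux onset detection function from a magnitude spectrogram; the filter size, lag, and array layout are illustrative assumptions rather than the paper's tuned settings.

```python
# A minimal sketch of maximum-filtered spectral flux (the SuperFlux idea),
# assuming a magnitude spectrogram S of shape (frames, bins).
import numpy as np
from scipy.ndimage import maximum_filter1d

def superflux_odf(S, max_size=3, lag=1):
    """Onset detection function: positive spectral difference against a
    frequency-wise maximum-filtered reference frame."""
    # widen each reference frame along frequency so that vibrato-induced
    # frequency wobble does not register as a spectral change
    ref = maximum_filter1d(S, size=max_size, axis=1)
    diff = S[lag:] - ref[:-lag]          # compare to max-filtered past frame
    diff[diff < 0] = 0.0                 # half-wave rectification
    return diff.sum(axis=1)              # sum positive change over bins
```

Peaks of the returned function would then be picked as onset candidates; because only past frames are referenced, the computation stays causal.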
Source-Filter Model for Expressive Gu-Qin Synthesis and Its iOS App
The Gu-Qin, a venerable Chinese plucked-string instrument, has unique performance techniques and enchanting sounds. One of the oldest Chinese solo instruments, it is inscribed on the UNESCO Representative List of the Intangible Cultural Heritage of Humanity. The variation in Gu-Qin sound is so large that carefully designed controls are necessary for its computer synthesizer. We developed a parametric source-filter model for re-synthesizing expressive Gu-Qin notes, capable of covering as many combinations of the Gu-Qin's performance techniques as possible. In this paper, a brief discussion of Gu-Qin playing and its special tablature notation is given to clarify the relationship between the performance techniques and the resulting sounds. This work comprises a Gu-Qin musical notation system and a source-filter-based synthesizer. In addition, we implement an iOS app to demonstrate the model's low computational complexity and robustness; its friendly user interface makes it easy to improvise with the synthesized sounds.
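To illustrate the general source-filter principle (not the paper's actual Gu-Qin model), the sketch below drives a single all-pole resonator with a short noise burst; the function name, decay parameter, and filter order are hypothetical placeholders.

```python
# A minimal source-filter sketch for a plucked note: excitation (source)
# fed through an all-pole resonator (filter). A real Gu-Qin model would
# fit the filter and excitation to recorded notes per playing technique.
import numpy as np
from scipy.signal import lfilter

def pluck(f0, sr=44100, dur=1.0, decay=5.0):
    n = int(sr * dur)
    exc = np.zeros(n)
    exc[:64] = np.random.randn(64)            # source: short noise burst
    r = np.exp(-decay * 2 * np.pi / sr)       # pole radius sets decay time
    theta = 2 * np.pi * f0 / sr               # pole angle sets pitch
    a = [1.0, -2 * r * np.cos(theta), r * r]  # two-pole resonator
    return lfilter([1.0], a, exc)
```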
Simulation of Textured Audio Harmonics Using Random Fractal Phaselets
We present a method of simulating audio signals using the principles of random fractal geometry which, in the context of this paper, is concerned with the analysis of statistically self-affine ‘phaselets’. The approach is used to generate audio signals that are characterised by texture and timbre through the Fractal Dimension, such as those associated with bowed stringed instruments. The paper provides a short overview of potential simulation methods using Artificial Neural Networks and Evolutionary Computing, and of the problems associated with a deterministic approach based on solutions to the acoustic wave equation. This serves to quantify the origins of the ‘noise’ associated with multiple scattering events that characterise texture and timbre in an audio signal. We then explore a method to compute the phaselet of a phase signal, i.e., the primary phase function of which the phase signal is, to a good approximation, a periodic replica, and show that, by modelling the phaselet as a random fractal signal, it can be characterised by the Fractal Dimension. The Fractal Dimension is then used to synthesise a phaselet, from which the phase function is computed through multiple concatenations of the phaselet. The paper details the principal steps of the method, examines some example results, and provides a URL to m-coded functions so that interested readers can reproduce the results and develop the algorithms further.
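The following sketch illustrates one plausible reading of the synthesis step: a 'phaselet' generated as a random fractal signal by spectral shaping, with the spectral exponent derived from the Fractal Dimension under an fBm-like assumption, then periodically replicated into a phase function. The D-to-beta relation and all parameter values here are assumptions, not the paper's formulation.

```python
# A minimal sketch: synthesise a 'phaselet' as a random fractal signal via
# spectral shaping, then build a longer phase function by concatenation.
import numpy as np

def fractal_phaselet(n, D, rng=np.random.default_rng(0)):
    beta = 5.0 - 2.0 * D                  # fBm-like model: D = (5 - beta) / 2
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                           # avoid division by zero at DC
    mag = f ** (-beta / 2.0)              # power ~ f^-beta, magnitude ~ f^(-beta/2)
    phase = rng.uniform(0, 2 * np.pi, len(f))
    return np.fft.irfft(mag * np.exp(1j * phase), n)

# phase function as multiple concatenations of the phaselet
phase_fn = np.tile(fractal_phaselet(1024, D=1.5), 8)
```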
Re-Thinking Sound Separation: Prior Information and Additivity Constraint in Separation Algorithms
In this paper, we study the effect of prior information on the quality of informed source separation algorithms. We present results with our system for solo and accompaniment separation and contrast our findings with two other state-of-the-art approaches. The results suggest that it is the separation techniques themselves, rather than the extraction of prior information, that currently limit performance. Furthermore, we present an alternative view of the separation process in which the additivity constraint of the algorithm is removed in an attempt to maximize the obtained quality. Plausible future directions in sound separation research are discussed.
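As a toy illustration of the additivity constraint under discussion (not the authors' system), the sketch below contrasts Wiener-style masking, where the source estimates are forced to sum to the mixture, with an unconstrained alternative; the shapes, names, and the particular unconstrained rule are all assumptions.

```python
# A minimal sketch contrasting additive (masking) separation with an
# unconstrained estimate. X: mixture magnitude spectrogram (bins, frames);
# S_hat: (k, bins, frames) rough per-source magnitude models.
import numpy as np

def additive_estimates(X, S_hat):
    # Wiener-style soft masks: by construction the estimates sum to X
    masks = S_hat / (S_hat.sum(axis=0, keepdims=True) + 1e-12)
    return masks * X

def unconstrained_estimates(X, S_hat):
    # drop the additivity constraint: use the source models directly,
    # clipped so no estimate exceeds the observed mixture magnitude
    return np.minimum(S_hat, X)
```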
Generating Musical Accompaniment Using Finite State Transducers
The finite state transducer (FST), a type of finite state machine that maps an input string to an output string, is a common tool in the fields of natural language processing and speech recognition. FSTs have also been applied to music-related tasks such as audio fingerprinting and the generation of musical accompaniment. In this paper, we describe a system that uses an FST to generate harmonic accompaniment to a melody. We provide details of the methods employed to quantize a music signal and of the topology of the transducer, and discuss our approach to evaluating the system. We argue for an evaluation metric that takes into account the quality of the generated accompaniment, rather than one that returns a binary value indicating the correctness or incorrectness of the accompaniment.
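A minimal sketch of the transducer idea, assuming a toy state set and hand-written arcs rather than the paper's learned FST: each (state, melody symbol) pair emits a chord symbol and moves to a new state.

```python
# A toy finite state transducer mapping quantized melody symbols to chord
# symbols; states, alphabet, and arcs are illustrative assumptions.
MELODY_TO_CHORD = {
    # (state, input symbol) -> (output symbol, next state)
    ("start", "C"): ("Cmaj", "tonic"),
    ("tonic", "G"): ("Gmaj", "dominant"),
    ("dominant", "C"): ("Cmaj", "tonic"),
    ("tonic", "A"): ("Amin", "relative"),
    ("relative", "F"): ("Fmaj", "subdominant"),
    ("subdominant", "G"): ("Gmaj", "dominant"),
}

def accompany(melody, state="start"):
    chords = []
    for note in melody:
        # unknown arcs fall back to holding the previous chord
        out, state = MELODY_TO_CHORD.get(
            (state, note), (chords[-1] if chords else "Cmaj", state))
        chords.append(out)
    return chords

print(accompany(["C", "G", "C", "A", "F", "G"]))
# ['Cmaj', 'Gmaj', 'Cmaj', 'Amin', 'Fmaj', 'Gmaj']
```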
TELPC Based Re-Synthesis Method for Isolated Notes of Polyphonic Instrumental Music Recordings
In this paper, we present a flexible analysis/re-synthesis method for smoothly changing the properties of isolated notes in polyphonic instrumental music recordings. The True Envelope Linear Predictive Coding (TELPC) method is employed as the analysis/synthesis model because its accurate spectral envelope estimation preserves the original timbre quality as much as possible. We modify conventional LPC analysis/synthesis by using pitch-synchronous analysis frames to avoid a severe magnitude modulation problem; smaller frames can thus be used to capture more of the local characteristics of the original signals and further improve the sound quality. Within this framework, one can manipulate a sequence of isolated notes from commercially available polyphonic instrumental music recordings, and interesting re-synthesized results are achieved for two such recordings.
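The sketch below illustrates plain pitch-synchronous LPC analysis/resynthesis only; the true-envelope refinement that distinguishes TELPC is omitted, the fundamental frequency is assumed known, and any property modification would be inserted between the inverse-filtering and synthesis steps.

```python
# A minimal pitch-synchronous LPC analysis/resynthesis sketch (no true
# envelope step); f0 is assumed known, order and frame sizes are toy values.
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def lpc(frame, order=20):
    # autocorrelation method: solve the Toeplitz normal equations
    r = np.correlate(frame, frame, "full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))        # A(z) coefficients

def resynth_frame(frame, order=20):
    a = lpc(frame, order)
    residual = lfilter(a, [1.0], frame)       # inverse filter -> excitation
    # (modifications to envelope or residual would go here)
    return lfilter([1.0], a, residual)        # drive the all-pole model

def pitch_sync_frames(x, sr, f0):
    period = int(round(sr / f0))
    # frames aligned to the pitch period avoid the magnitude modulation
    # that fixed-length frames suffer from
    return [x[i:i + 2 * period] for i in range(0, len(x) - 2 * period, period)]
```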
A Preliminary Analysis of the Continuous Axis Value of the Three-Dimensional PAD Speech Emotional State Model
The traditional approach to emotion classification uses Thayer's two-dimensional (2D) emotional model, which identifies emotion by arousal and valence. The 2D model is not fine-grained enough to distinguish among the rich vocabulary of emotions, for example between disgust and fear. Another problem with traditional methods is that they lack a formal definition of the axis values of the emotional model: they either assign the axis values manually or rate them by listening test. We propose to use the PAD (Pleasure, Arousal, Dominance) emotional state model to describe speech emotion on a continuous three-dimensional scale. We suggest an initial definition of the continuous axis values by observing the fluctuation patterns of Log Frequency Power Coefficients (LFPC). We verify the result using a database of German emotional speech. Experiments show an average classification accuracy of 81% over a set of big-six emotions.
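As an illustration of the feature the axis definition is based on, the sketch below computes Log Frequency Power Coefficients as band powers over log-spaced frequency bands; the band count, edges, and sample rate are illustrative assumptions.

```python
# A minimal LFPC sketch: log band powers over log-spaced frequency bands.
import numpy as np

def lfpc(frame, sr=16000, n_bands=12, fmin=100.0):
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    edges = np.geomspace(fmin, sr / 2, n_bands + 1)   # log-spaced band edges
    coeffs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = spec[(freqs >= lo) & (freqs < hi)]
        coeffs.append(10 * np.log10(band.sum() + 1e-12))  # log band power
    return np.array(coeffs)
```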
The Tonalness Spectrum: Feature-Based Estimation of Tonal Components
The tonalness spectrum gives the likelihood of a spectral bin being part of a tonal or non-tonal component. It is a non-binary measure based on a set of established spectral features. An easily extensible framework for the computation, selection, and combination of features is introduced. The results are evaluated and compared in two ways: first with a data set of synthetically generated signals, and second with real music signals in the context of a typical MIR application.
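A minimal sketch of a per-bin, non-binary tonalness measure built from a single feature (bin magnitude relative to a local median noise floor); the paper's framework combines a whole set of such features, so this stands in for one component only.

```python
# A toy per-bin tonalness measure in (0, 1): peaked bins relative to the
# local noise floor score high, flat/noisy bins score low.
import numpy as np
from scipy.ndimage import median_filter

def tonalness(mag_frame, width=31):
    local = median_filter(mag_frame, size=width)   # local noise-floor estimate
    ratio = mag_frame / (local + 1e-12)
    return ratio / (ratio + 1.0)                   # squash to (0, 1), non-binary
```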
Unsupervised Audio Key and Chord Recognition
This paper presents a new methodology for determining the chords of a music piece without using training data. Specifically, we introduce: 1) a wavelet-based audio denoising component that enhances a chroma-based feature extraction framework, 2) an unsupervised key recognition component that extracts a bag of local keys, and 3) a chord recognizer that uses the estimated local keys to adjust the chromagram according to a set of well-known tonal profiles and recognizes chords on a frame-by-frame basis. We aim to recognize five chord classes (major, minor, diminished, augmented, suspended) plus a no-chord/silence class (N). We demonstrate the performance of the proposed approach on 175 Beatles songs, achieving an F-measure of 75% for estimating a bag of local keys and at least 68.2% accuracy on chords, without discarding any audio segments or using other musical elements. The experimental results also show that the wavelet-based denoiser improves the chord recognition rate by approximately 4% over other chroma features.
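To make the frame-by-frame recognition step concrete, the sketch below matches a chroma vector against binary templates for the five chord classes plus N; the paper's recognizer additionally denoises the chroma and conditions on the estimated local keys, which this sketch omits.

```python
# A minimal template-matching chord recognizer over 12-bin chroma vectors.
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
SHAPES = {"maj": [0, 4, 7], "min": [0, 3, 7], "dim": [0, 3, 6],
          "aug": [0, 4, 8], "sus": [0, 5, 7]}

TEMPLATES = {}
for root in range(12):
    for name, intervals in SHAPES.items():
        t = np.zeros(12)
        t[[(root + i) % 12 for i in intervals]] = 1.0
        TEMPLATES[NOTES[root] + ":" + name] = t / np.linalg.norm(t)

def recognize(chroma, silence_thresh=1e-3):
    if chroma.sum() < silence_thresh:
        return "N"                                 # no chord / silence
    c = chroma / np.linalg.norm(chroma)
    # best-correlating template over all 60 root/quality combinations
    return max(TEMPLATES, key=lambda k: TEMPLATES[k] @ c)
```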
Extended Source-Filter Model for Harmonic Instruments for Expressive Control of Sound Synthesis and Transformation
In this paper we present a revised and improved version of a recently proposed extended source-filter model for the sound synthesis, transformation, and hybridization of harmonic instruments. The extension focuses mainly on impulsively excited instruments like the piano or guitar, but also improves synthesis results for continuously driven instruments, including their hybrids. The technique comprises an extensive analysis of an instrument's sound database, followed by the estimation of a generalized instrument model reflecting timbre variations as a function of selected control parameters. Such an instrument model allows for natural-sounding transformations and expressive control of instrument sounds with respect to its control parameters.
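As a toy stand-in for such an instrument model, the sketch below fits a linear map from control parameters (here assumed to be pitch and intensity) to log spectral envelope bins; the actual model is far richer, and all names and shapes here are assumptions.

```python
# A toy 'instrument model': linear regression from control parameters to
# a log spectral envelope, fitted over an analysed sound database.
import numpy as np

def fit_instrument_model(controls, log_envelopes):
    """controls: (n, 2) pitch/intensity pairs; log_envelopes: (n, bins)."""
    X = np.hstack([controls, np.ones((len(controls), 1))])  # affine features
    W, *_ = np.linalg.lstsq(X, log_envelopes, rcond=None)
    return W

def predict_envelope(W, pitch, intensity):
    # evaluate the model at unseen control values for expressive control
    return np.array([pitch, intensity, 1.0]) @ W
```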