Download Fast Sinusoid Synthesis for MPEG-4 HILN Parametric Audio Decoding
Additive sinusoidal synthesis is a popular technique for applications like sound synthesis or very low bit rate parametric audio decoding. In this paper, different algorithms for the efficient synthesis of sinusoids on general purpose CPUs as found in today’s PCs are investigated. Fast algorithms for time domain synthesis of constant and linearly changing frequencies are presented and compared to frequency domain synthesis approaches. Execution time and accuracy (SNR) of the algorithms are reported for different CPU types. Finally, the algorithms are implemented in a fast MPEG-4 HILN parametric audio decoder in order to evaluate their performance in a real world application.
Download The caterpillar system for data-driven concatenative sound synthesis
Concatenative data-driven synthesis methods are gaining more interest for musical sound synthesis and effects. They are based on a large database of sounds and a unit selection algorithm which finds the units that match best a given sequence of target units. We describe related work and our C ATERPILLAR synthesis system, focusing on recent new developments: the advantages of the addition of a relational SQL database, work on segmentation by alignment, the reformulation and extension of the unit selection algorithm using a constraint resolution approach, and new applications for musical and speech synthesis.
Download The DESAM Toolbox: Spectral Analysis of Musical Audio
In this paper is presented the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although those models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. It is rather aimed at providing a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different “mid-level” representations. After motivating the need for such a toolbox, this paper offers an overview of the overall organization of the toolbox, and describes all available functionalities.
Download A Combined Model for a Bucket Brigade Device and its Input and Output Filters
Bucket brigade devices (BBDs) were invented in the late 1960s as a method of introducing a time-delay into an analog electrical circuit. They work by sampling the input signal at a certain clock rate and shifting it through a chain of capacitors to obtain the delay. BBD chips have been used to build a large variety of analog effects processing devices, ranging from chorus to flanging to echo effects. They have therefore attracted interest in virtual analog modeling and a number of approaches to modeling them digitally have appeared. In this paper, we propose a new model for the bucket-brigade device. This model is based on a variable samplerate, and utilizes the surrounding filtering circuitry found in real devices to avoid the need for the interpolation usually needed in such a variable sample-rate system.
Download Transaural Stereo in a Beamforming Approach
This paper presents a study on algorithms for headphone-free binaural synthesis using a dedicated loudspeaker configuration. Both algorithms under investigation improve the properties of the binaural synthesis performance of the array. Firstly, beam-forming provides sound radiation localized at two freely adjustable, narrow target spots. Adjusting both spots to the locations of the listener’s ears achieves a good basis. Secondly, an additional interaural crosstalk canceler improves the overall result.
Download REDS: A New Asymmetric Atom for Sparse Audio Decomposition and Sound Synthesis
In this paper, we introduce a function designed specifically for sparse audio representations. A progression in the selection of dictionary elements (atoms) to sparsely represent audio has occurred: starting with symmetric atoms, then to damped sinusoid and hybrid atoms, and finally to the re-appropriation of the gammatone (GT) and formantwave-function (FOF) into atoms. These asymmetric atoms have already shown promise in sparse decomposition applications, where they prove to be highly correlated with natural sounds and musical audio, but since neither was originally designed for this application their utility remains limited. An in-depth comparison of each existing function was conducted based on application specific criteria. A directed design process was completed to create a new atom, the ramped exponentially damped sinusoid (REDS), that satisfies all desired properties: the REDS can adapt to a wide range of audio signal features and has good mathematical properties that enable efficient sparse decompositions and synthesis. Moreover, the REDS is proven to be approximately equal to the previous functions under some common conditions.
Download Novel methods in Information Management for Advanced Audio Workflows
This paper discusses architectural aspects of a software library for unified metadata management in audio processing applications. The data incorporates editorial, production, acoustical and musicological features for a variety of use cases, ranging from adaptive audio effects to alternative metadata based visualisation. Our system is designed to capture information, prescribed by modular ontology schema. This advocates the development of intelligent user interfaces and advanced media workflows in music production environments. In an effort to reach these goals, we argue for the need of modularity and interoperable semantics in representing information. We discuss the advantages of extensible Semantic Web ontologies as opposed to using specialised but disharmonious metadata formats. Concepts and techniques permitting seamless integration with existing audio production software are described in detail.
Download Music Emotion Classification: Dataset Acquisition And Comparative Analysis
In this paper we present an approach to emotion classification in audio music. The process is conducted with a dataset of 903 clips and mood labels, collected from Allmusic1 database, organized in five clusters similar to the dataset used in the MIREX2 Mood Classification Task. Three different audio frameworks – Marsyas, MIR Toolbox and Psysound, were used to extract several features. These audio features and annotations are used with supervised learning techniques to train and test various classifiers based on support vector machines. To access the importance of each feature several different combinations of features, obtained with feature selection algorithms or manually selected were tested. The performance of the solution was measured with 20 repetitions of 10-fold cross validation, achieving a F-measure of 47.2% with precision of 46.8% and recall of 47.6%.
Download Binaural Dark-Velvet-Noise Reverberator
Binaural late-reverberation modeling necessitates the synthesis of frequency-dependent inter-aural coherence, a crucial aspect of spatial auditory perception. Prior studies have explored methodologies such as filtering and cross-mixing two incoherent late reverberation impulse responses to emulate the coherence observed in measured binaural late reverberation. In this study, we introduce two variants of the binaural dark-velvet-noise reverberator. The first one uses cross-mixing of two incoherent dark-velvet-noise sequences that can be generated efficiently. The second variant is a novel time-domain jitter-based approach. The methods’ accuracies are assessed through objective and subjective evaluations, revealing that both methods yield comparable performance and clear improvements over using incoherent sequences. Moreover, the advantages of the jitter-based approach over cross-mixing are highlighted by introducing a parametric width control, based on the jitter-distribution width, into the binaural dark velvet noise reverberator. The jitter-based approach can also introduce timedependent coherence modifications without additional computational cost.
Download Multiple-F0 tracking based on a high-order HMM model
This paper is about multiple-F0 tracking and the estimation of the number of harmonic source streams in music sound signals. A source stream is understood as generated from a note played by a musical instrument. A note is described by a hidden Markov model (HMM) having two states: the attack state and the sustain state. It is proposed to first perform the tracking of F0 candidates using a high-order hidden Markov model, based on a forward-backward dynamic programming scheme. The propagated weights are calculated in the forward tracking stage, followed by an iterative tracking of the most likely trajectories in the backward tracking stage. Then, the estimation of the underlying source streams is carried out by means of iteratively pruning the candidate trajectories in a maximum likelihood manner. The proposed system is evaluated by a specially constructed polyphonic music database. Compared with the frame-based estimation systems, the tracking mechanism improves significantly the accuracy rate.