A Database of Partial Tracks for Evaluation of Sinusoidal Models
This paper presents a database of partial tracks extracted from synthetic as well as pre-recorded musical signals, designed to serve as an ancillary tool for the evaluation of sinusoidal analysis algorithms. To accomplish this goal, the database requirements have been carefully specified. A semi-automatic analysis methodology has been employed to ensure that the track parameters are precisely estimated. The overall methodology is validated by applying performance tests to the synthetic source signals.
Discrete Wavelet Transform based Shift-Invariant Analysis Scheme for Transient Sound Signals
The discrete wavelet transform (DWT) has gained widespread recognition and popularity in signal processing due to its ability to underline and represent time-varying spectral properties of many transient and other nonstationary signals. However, the DWT is a shift-variant transform. This shift-variance is a major problem when the DWT is used for transient signal analysis and pattern recognition applications. A number of modified forms of the DWT have been investigated in recent years that provide an approximately shift-invariant transform, but at the cost of increased redundancy and complexity. In this paper, a shift-invariant analysis scheme is proposed which is nonredundant. This scheme combines minimum-phase (MP) reconstruction with the DWT so that the resultant scheme provides a shift-invariant transform. The detailed properties of the MP signal and different methods to reconstruct it are explained. The proposed scheme can be used for the analysis-synthesis, classification, and compression of transient sound signals.
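The idea of placing a minimum-phase reconstruction in front of the DWT can be illustrated with a minimal sketch. It assumes a one-level Haar DWT and a homomorphic (cepstral folding) minimum-phase reconstruction; the abstract does not say which wavelet or which reconstruction method the paper uses, so the function names `minimum_phase` and `haar_dwt_level` are illustrative only. Because the magnitude spectrum is unchanged by a circular shift, the MP signal (and hence its DWT) is the same for every shifted copy of the input.

```python
import numpy as np

def minimum_phase(x, eps=1e-12):
    """Minimum-phase signal with the same magnitude spectrum as x.

    Homomorphic method (an assumption, one of several possible
    reconstruction methods): fold the real cepstrum onto its causal part.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_mag = np.log(np.abs(np.fft.fft(x)) + eps)
    cep = np.fft.ifft(log_mag).real            # real cepstrum
    w = np.zeros(n)                            # folding window
    w[0] = 1.0
    if n % 2 == 0:
        w[1:n // 2] = 2.0
        w[n // 2] = 1.0
    else:
        w[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.exp(np.fft.fft(cep * w))).real

def haar_dwt_level(x):
    """One level of the orthonormal Haar DWT (even-length input)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

# A decaying transient, zero-padded so that a circular shift acts as a delay.
n = np.arange(64)
x = np.concatenate([0.9 ** n * np.cos(0.3 * n), np.zeros(64)])
x_shifted = np.roll(x, 3)

# Plain DWT: the detail coefficients change with the shift.
_, d_plain = haar_dwt_level(x)
_, d_plain_shifted = haar_dwt_level(x_shifted)

# MP reconstruction followed by the DWT: coefficients no longer depend on the shift.
_, d_mp = haar_dwt_level(minimum_phase(x))
_, d_mp_shifted = haar_dwt_level(minimum_phase(x_shifted))
```

The scheme stays nonredundant in this sketch because the MP signal has the same length as the input; what is discarded is the original phase, which would have to be handled separately if exact resynthesis were required.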
The Restoration of Single Channel Audio Recordings Based on Non-Negative Matrix Factorization and Perceptual Suppression Rule
In this paper, we focus on the signal-to-noise ratio (SNR) improvement in single channel audio recordings. Many approaches have been reported in the literature. The most popular method, with many variants, is Short Time Spectral Attenuation (STSA). Although this method reduces the noise and improves the SNR, it tends to introduce signal distortion and a perceptually annoying residual noise usually called musical noise. In this paper we investigate the use of Non-negative Matrix Factorization (NMF) as an alternative to STSA for the digital curation of musical heritage. NMF is an emerging technique for the blind extraction of signals recorded in a variety of different fields. The application of NMF to the analysis of monaural recordings is relatively recent. We show that NMF is a suitable technique to extract the clean audio signal from undesired non-stationary noise in a monaural recording of ethnic music. More specifically, we introduce a perceptual suppression rule to determine how the perceptual domain is competitive compared to the acoustic domain. Moreover, we carry out a listening test in order to compare NMF with the state-of-the-art audio restoration framework using the EBU MUSHRA test method. The encouraging results obtained with this methodology in the presented case study support its wider applicability in audio separation.
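For readers unfamiliar with NMF, the factorization step can be sketched as follows. This is a minimal illustration assuming the classic multiplicative updates for the Euclidean cost applied to a non-negative matrix `V` (for example a magnitude spectrogram); the abstract does not state which cost function or update rule the paper uses, and the perceptual suppression rule is not reproduced here. The function name `nmf` and its parameters are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) as W @ H.

    Uses multiplicative updates for the Euclidean cost (an assumption;
    other divergences are common in audio work).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # spectral basis vectors (columns)
    H = rng.random((rank, m)) + eps   # time-varying activations (rows)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram" built from two non-negative components.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In a restoration setting, one would typically group the columns of `W` (and the matching rows of `H`) into signal and noise components and resynthesize only the signal part; how that grouping is done is specific to the paper and is not assumed here.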
Towards a Fuzzy Logic Approach to Drum Pattern Humanisation
A fuzzy logic-based approach can be used to simulate human agents in many control situations. Numerous authors have noted that this methodology has advantages for a variety of tasks within the realm of computer music. In this paper, a review of such projects is conducted and a rudimentary example application of fuzzy logic techniques is presented. This automatically achieves a basic level of 'humanisation' of a drum pattern through strike velocity modification. Such a tool could significantly reduce the time spent on editing individual drum hits in a music production environment and has potential applications for rhythmic composition and performance.
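To make the notion of fuzzy velocity modification concrete, here is a toy sketch. Everything in it is hypothetical: the single input (a metrical strength between 0 and 1), the three fuzzy sets, and the MIDI velocities attached to the rules are illustrative assumptions, not the rule base described in the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical rule base: (fuzzy set over metrical strength, output velocity).
RULES = [
    ((-0.5, 0.0, 0.5), 60.0),    # weak position   -> soft strike
    ((0.0, 0.5, 1.0), 85.0),     # medium position -> medium strike
    ((0.5, 1.0, 1.5), 115.0),    # strong position -> hard strike
]

def humanise_velocity(strength):
    """Map a metrical strength in [0, 1] to a MIDI velocity.

    Defuzzification is a weighted average of the rule outputs.
    """
    weights = [tri(strength, *params) for params, _ in RULES]
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, RULES)) / total

# One bar of eighth notes with assumed metrical strengths.
pattern = [1.0, 0.0, 0.5, 0.0, 0.75, 0.0, 0.5, 0.25]
velocities = [humanise_velocity(s) for s in pattern]
```

The appeal of the fuzzy formulation is that intermediate positions receive smoothly interpolated velocities rather than switching abruptly between fixed levels.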
GPU-Based Spectral Model Synthesis for Real-Time Sound Rendering
The timbre of an instrument is usually represented by sinusoids plus noise. Spectral modeling synthesis (SMS) is an audio synthesis technique which can create musical timbre and give control over the frequency and amplitude. Additive synthesis and LPC synthesis are usually applied for synthesizing sinusoids and residuals, respectively. However, these algorithms demand considerable computing power. The purpose of this paper is to present GPU-based techniques for implementing SMS for real-time audio processing by exploiting the parallelism and programmability of the graphics pipeline. The performance is compared to CPU-based implementations.
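The sinusoidal part of SMS is a natural fit for parallel hardware because every partial can be evaluated independently before the final sum. The following is a CPU-side NumPy sketch of that data-parallel structure, not the paper's GPU implementation; the function name `additive_synth` and the per-sample parameter layout are assumptions made for illustration.

```python
import numpy as np

def additive_synth(freqs, amps, sr=44100):
    """Sum of sinusoidal partials.

    freqs, amps: arrays of shape (n_partials, n_samples) holding the
    per-sample frequency (Hz) and amplitude of each partial.
    """
    freqs = np.asarray(freqs, dtype=float)
    amps = np.asarray(amps, dtype=float)
    # Integrate frequency to phase independently for every partial.
    phases = 2.0 * np.pi * np.cumsum(freqs, axis=1) / sr
    # Each partial is computed independently; only this sum couples them.
    return np.sum(amps * np.sin(phases), axis=0)

# Three harmonic partials of a 220 Hz tone with 1/k amplitudes.
n_samples = 1024
k = np.arange(1, 4)[:, None]
freqs = 220.0 * k * np.ones((1, n_samples))
amps = (1.0 / k) * np.ones((1, n_samples))
tone = additive_synth(freqs, amps)
```

On a GPU, the per-partial phase and sine evaluation would map onto parallel threads or fragments, with the sum over partials performed as a reduction.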
Virtual Acoustic Recording: An Interactive Approach
In this paper, we present a framework for recording real musical auditory scenes for interactive virtual acoustic reproduction over headphones. The framework considers the parameterization of real-world soundfields and subsequent real-time auralization using a hybrid image source method/measurement-based auralization approach. First Order (FOA) and Higher Order (HOA) Ambisonics are utilized together in a single system to provide an optimized and psychoacoustically justified framework.
Statistical Spectral Envelope Transformation applied to Emotional Speech
Transformation of sound by statistical techniques is a promising method for a new range of digital audio effects. In this paper a data-driven voice transformation algorithm is used to alter the timbre of a neutral (non-emotional) voice in order to reproduce a particular emotional vocal timbre. Perceptually based Mel-Cepstral analysis and a Mel Log Spectral Approximation digital filter are used to represent the speech timbre and to synthesize speech with a modified spectral envelope. The transformation function adopts a GMM (Gaussian Mixture Model) based parametrization in order to convert the spectral envelopes. Experiments with the first and second order derivatives of the mel-cepstral coefficients have been undertaken to assess the benefit of including dynamic information in the model. The proposed algorithm has been evaluated by means of objective measures in the neutral-to-happy and neutral-to-sad tasks.
Analysis / Synthesis of Rolling Sounds Using a Source Filter Approach
In this paper, a method for the analysis and synthesis of rolling ball sounds is proposed. The approach is based on the assumption that the rolling sound is generated by a concatenation of micro-impacts between a ball and a surface, each having associated resonances. Contact timing information is first extracted from the rolling sound using an onset detection process. The resulting individual contact segments are subband filtered before being analyzed using linear predictive coding (LPC) and notch filter parameter estimation. The segments are then resynthesized and overlap-added to form a complete rolling sound. This approach is similar to that of [1], though the methods used for contact event detection and filter parameter estimation are completely different.
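The LPC analysis applied to each contact segment can be sketched as below. This is a generic autocorrelation-method LPC, assumed for illustration; the abstract does not specify the LPC variant, the model order, or the subband and notch filter details, and the function name `lpc` is hypothetical.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC.

    Returns predictor coefficients a[0..order-1] such that
    x[n] is approximated by sum_k a[k] * x[n - 1 - k].
    """
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Toy "contact segment": impulse response of a single two-pole resonance.
segment = np.zeros(400)
segment[0] = 1.0
segment[1] = 1.2
for n in range(2, len(segment)):
    segment[n] = 1.2 * segment[n - 1] - 0.81 * segment[n - 2]

coeffs = lpc(segment, order=2)
```

For a segment that really is the impulse response of an all-pole resonator, the estimated coefficients recover the resonator's recursion, which is what makes LPC a reasonable tool for capturing the resonances of individual micro-impacts.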
Music Structure Discovery Based on Normalized Cross Correlation
Music Structure Discovery (MSD) for popular music is a well-known task in Music Information Retrieval (MIR). The proposed approach tries to find the basic musical structure of a piece of music by applying a template matching algorithm to a modified, bar-level Self Distance Matrix (SDM). Mel frequency cepstral coefficients (MFCC) are used to represent timbral qualities of the audio material, while chroma vectors are selected to incorporate pitch and harmonic content. The new idea of template matching, instead of trying to find explicit blocks or off-diagonal lines, is independent of any specific characteristics of the underlying SDM and can therefore be used on a wide range of different songs.
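The two building blocks named here, a bar-level SDM and normalized cross correlation against a template, can be sketched as follows. The Euclidean distance, the choice of sliding the template along the main diagonal, and the function names are assumptions made for illustration; the paper's SDM modification and its actual matching procedure are not described in the abstract.

```python
import numpy as np

def self_distance_matrix(features):
    """Pairwise Euclidean distances between per-bar feature vectors.

    features: array of shape (n_bars, n_dims), e.g. MFCC or chroma per bar.
    """
    F = np.asarray(features, dtype=float)
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def ncc(patch, template):
    """Normalized cross correlation of two equally sized arrays, in [-1, 1]."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def match_along_diagonal(sdm, template):
    """NCC score of a square template at every position on the main diagonal."""
    k = template.shape[0]
    n = sdm.shape[0]
    return np.array([ncc(sdm[i:i + k, i:i + k], template)
                     for i in range(n - k + 1)])

# Toy song with bar-level sections A A B B A A (one feature per bar).
bars = np.array([[0.0], [0.0], [1.0], [1.0], [0.0], [0.0]])
sdm = self_distance_matrix(bars)
template = sdm[0:3, 0:3]          # pattern of the first three bars
scores = match_along_diagonal(sdm, template)
```

Because NCC removes the mean and normalizes by energy, the score depends on the shape of the distance pattern rather than on absolute distance values, which is consistent with the claim that template matching is less tied to the characteristics of a particular SDM.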
Augmenting Sound Mosaicing with Descriptor-Driven Transformation
We propose a strategy for integrating descriptor-driven transformation into mosaicing sound synthesis, in which samples are selected by taking into account potential distances in the transformed space. Target descriptors consisting of chroma, mel-spaced filter banks, and energy are modeled with respect to windowed bandlimited resampling and mel-spaced filters, and later corrected with gain. These transformations, however simple, allow some adaptation of textural sound material to musical contexts.