Extraction of the excitation point location on a string using weighted least-square estimation of a comb filter delay
This paper focuses on the extraction of the excitation point location on a guitar string by an iterative estimation of the structural parameters of the spectral envelope. We propose a general method to estimate the plucking point location that works in two stages: starting from a measure related to the autocorrelation of the signal as a first approximation, a weighted least-square estimation is used to refine a FIR comb filter delay value to better fit the measured spectral envelope. This method is based on the fact that, in a simple digital physical model of a plucked-string instrument, the resonant modes translate into an all-pole structure while the initial conditions (a triangular shape for the string and zero velocity at all points) result in a FIR comb filter structure.
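
As a rough illustration of the two-stage idea (a coarse first estimate, then a weighted least-squares refinement), the sketch below works on harmonic magnitudes rather than on a measured spectral envelope. It assumes the textbook ideal-string result that a triangular pluck at relative position p gives harmonic magnitudes proportional to |sin(kπp)|/k². The grid search merely stands in for the autocorrelation-based first approximation, and all function names are hypothetical. It is not the paper's estimator.

```python
import math

def pluck_amplitudes(p, n_harmonics):
    """Ideal magnitudes |sin(k*pi*p)| / k**2 for harmonics k = 1..n_harmonics."""
    return [abs(math.sin(k * math.pi * p)) / k ** 2
            for k in range(1, n_harmonics + 1)]

def weighted_error(p, measured, weights):
    """Weighted squared error after fitting the best overall gain for this p."""
    model = pluck_amplitudes(p, len(measured))
    num = sum(w * m * a for w, m, a in zip(weights, model, measured))
    den = sum(w * m * m for w, m in zip(weights, model))
    gain = num / den if den > 0 else 0.0
    return sum(w * (a - gain * m) ** 2
               for w, m, a in zip(weights, model, measured))

def estimate_pluck_position(measured, weights=None, coarse_steps=50, refine_iters=40):
    """Estimate p in (0, 0.5]; magnitudes cannot tell p from 1 - p."""
    if weights is None:
        weights = [1.0] * len(measured)
    err = lambda p: weighted_error(p, measured, weights)

    # Stage 1 (assumption: a plain grid search replaces the first approximation).
    grid = [0.5 * (i + 1) / coarse_steps for i in range(coarse_steps)]
    best = min(grid, key=err)

    # Stage 2: golden-section refinement inside the bracket around the coarse
    # estimate, assuming the error is unimodal there.
    step = 0.5 / coarse_steps
    lo, hi = max(best - step, 1e-6), min(best + step, 0.5)
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(refine_iters):
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if err(a) < err(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)
```

For example, `estimate_pluck_position([3.0 * a for a in pluck_amplitudes(0.2, 20)])` should recover a value close to 0.2 regardless of the overall gain. Passing larger `weights` for the strongest harmonics is one way to de-emphasise noisy high partials.
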
Onset Time Estimation for the Analysis of Percussive Sounds using Exponentially Damped Sinusoids
Exponentially damped sinusoids (EDS) model-based analysis of sound signals often requires a precise estimation of the initial amplitudes and phases of the components found in the sound, on top of a good estimation of their frequencies and damping. This can be of the utmost importance in many applications such as high-quality re-synthesis or identification of structural properties of sound generators (e.g. a physical coupling of vibrating devices). Therefore, in those specific applications, an accurate estimation of the onset time is required. In this paper we present a two-step onset time estimation procedure designed for that purpose. It consists of a “rough” estimation using an STFT-based method followed by a time-domain method to “refine” the previous results. Tests carried out on synthetic signals show that it is possible to estimate onset times with errors as small as 0.2 ms. These tests also confirm that operating first in the frequency domain and then in the time domain yields a better compromise between resolution and speed than using only one frequency-based or one time-based onset detection method. Finally, experiments on real sounds (plucked strings and percussion instruments) illustrate how well this method performs in more realistic situations.
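
To see why the onset time matters for initial amplitudes and phases, consider a single component a·exp(-d(t - t0))·cos(2πf(t - t0) + φ). If the onset is assumed to be ε seconds too late, the same signal is read as having amplitude a·exp(-dε) and phase φ + 2πfε. The short sketch below evaluates that bias; the function and parameter names are hypothetical, and this is a worked consequence of the EDS model rather than anything taken from the paper's procedure.

```python
import math

def onset_error_bias(freq_hz, damping_per_s, onset_error_s):
    """Bias on one EDS component when the onset is assumed onset_error_s too late.

    Returns (amplitude_ratio, phase_shift_rad): the apparent initial amplitude
    divided by the true one, and the offset added to the apparent initial phase.
    """
    amplitude_ratio = math.exp(-damping_per_s * onset_error_s)
    phase_shift_rad = 2 * math.pi * freq_hz * onset_error_s
    return amplitude_ratio, phase_shift_rad

# Example: a 1 kHz partial with damping 50 1/s and a 0.2 ms onset error.
ratio, shift = onset_error_bias(1000.0, 50.0, 0.0002)
print(f"amplitude ratio {ratio:.4f}, phase shift {math.degrees(shift):.1f} degrees")
```

The amplitude bias is small for lightly damped components, but the phase offset grows linearly with frequency, which is why sub-millisecond onset accuracy is relevant for phase-sensitive uses.
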
A Stochastic State-Space Phase Vocoder for Synthesis of Roughness
This paper presents an implementation of the phase vocoder within a Gaussian state-space framework. Rather than formulate the problem as a deterministic evolution of frequencies centered around a given bin, this evolution is treated stochastically by introducing noise into the dynamics matrix of the recursive state equation. This produces effects on the roughness of the input sound, which vary depending on the position within the matrix where the noise is added, how it is propagated throughout the matrix and further by the variance of the noise input.
Analysis / Synthesis of Rolling Sounds Using a Source Filter Approach
In this paper, the analysis and synthesis of a rolling ball sound is proposed. The approach is based on the assumption that the rolling sound is generated by a concatenation of micro-impacts between a ball and a surface, each having associated resonances. Contact timing information is first extracted from the rolling sound using an onset detection process. The resulting individual contact segments are subband filtered before being analyzed using linear predictive coding (LPC) and notch filter parameter estimation. The segments are then resynthesized and overlap-added to form a complete rolling sound. This approach is similar to that of [1], though the methods used for contact event detection and filter parameter estimation are completely different.
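
The LPC step applied to each contact segment can be sketched as follows. The abstract does not say which LPC variant is used, so this assumes the common autocorrelation method solved with the Levinson-Durbin recursion; the function names are hypothetical, and the subband filtering and notch estimation stages are left out.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one segment (no windowing applied here)."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err) where x[n] is predicted as sum(a[k-1] * x[n-k], k = 1..order)
    and err is the remaining prediction error power.
    """
    a = []
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for the step from order i-1 to order i.
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        # Update the lower-order coefficients and append the new one.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= (1 - k * k)
    return a, err
```

As a sanity check, the autocorrelation sequence `[1.0, 0.5, 0.25]` of a first-order process gives `a` close to `[0.5, 0.0]`: the second coefficient vanishes because one pole already explains the data. In use, each segment would go through `autocorrelation(segment, order)` and then `levinson_durbin`.
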
Sound Morphing by Audio Descriptors and Parameter Interpolation
We present a strategy for static morphing that relies on the sophisticated interpolation of the parameters of the signal model and the independent control of high-level audio features. The source and target signals are decomposed into deterministic, quasi-deterministic and stochastic parts, and are processed separately according to sinusoidal modeling and spectral envelope estimation. We gain further intuitive control over the morphing process by altering the interpolated spectrum according to target values of audio descriptors through an optimization process. The proposed approach leads to convincing morphing results in the case of sustained or percussive, harmonic and inharmonic sounds of possibly different durations.
On the control of the phase of resonant filters with applications to percussive sound modeling
Source-filter models are widely used in numerous audio processing fields, from speech processing to percussive/contact sound synthesis. The design of filters for these models (be it from scratch or from spectral analysis) usually involves tuning frequency and damping parameters and/or providing an all-pole model of the resonant part of the filter. In this context, and for the modelling of percussive (non-sustained) sounds, a source signal can be estimated from a filtered sound through a time-domain deconvolution process. The result can be plagued with artifacts when resonances exhibit very low bandwidth and lie very close in frequency. We propose in this paper a method that noticeably reduces the artifacts of the deconvolution process through an inter-resonance phase synchronization. Results show that the proposed method is able to design filters inducing fewer artifacts at the expense of a higher dynamic range.
Gesturally-Controlled Digital Audio Effects
This paper presents a detailed analysis of the acoustic effects of the movements of single-reed instrument performers for specific recording conditions. These effects are shown to result mostly from the difference between the time of arrival of the direct sound and that of the first reflection, creating a sort of phasing or flanging effect. Contrary to the case of commercial flangers, where delay values are set by an LFO (low frequency oscillator) waveform, the amount of delay in a recording of an acoustic instrument is a function of the position of the instrument with respect to the microphone. We show that for standard recordings of a clarinet, continuous delay variations from 2 to 5 ms are possible, producing a naturally controlled effect.
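
The flanging effect described above can be pictured with a two-path model: the microphone receives the direct sound plus one reflection delayed by τ, i.e. y(t) = x(t) + g·x(t - τ), whose magnitude response |1 + g·exp(-j2πfτ)| has notches at odd multiples of 1/(2τ). The sketch below is a minimal illustration of that model only; the function names, the single-reflection simplification and the 343 m/s speed of sound are assumptions, not values from the paper.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C

def path_delay(extra_path_m):
    """Delay (s) of the reflection given its extra path length in metres."""
    return extra_path_m / SPEED_OF_SOUND_M_S

def comb_magnitude(freq_hz, delay_s, gain=1.0):
    """Magnitude response of y(t) = x(t) + gain * x(t - delay_s) at freq_hz."""
    return abs(1 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s))

def notch_frequencies(delay_s, max_freq_hz):
    """Notch centres (2k + 1) / (2 * delay_s) up to max_freq_hz."""
    freqs = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * delay_s)
        if f > max_freq_hz:
            break
        freqs.append(f)
        k += 1
    return freqs

# Compare the two ends of the 2 to 5 ms delay range quoted in the abstract.
for delay_ms in (2.0, 5.0):
    print(delay_ms, notch_frequencies(delay_ms / 1000.0, 1000.0))
```

With `gain` below 1 (an attenuated reflection) the notches become shallower rather than complete nulls, and moving the instrument shifts every notch at once, which is what gives the sweep its flanger-like character.
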
Improved hidden Markov model partial tracking through time-frequency analysis
In this article we propose a modification to the combinatorial hidden Markov model developed in [1] for tracking partial frequency trajectories. We employ the Wigner-Ville distribution and Hough transform in order to (re)estimate the frequency and chirp rate of partials in each analysis frame. We estimate the initial phase and amplitude of each partial by minimizing the squared error in the time domain. We then formulate a new scoring criterion for the hidden Markov model which makes the tracker more robust for non-stationary and noisy signals. We achieve good performance when tracking crossing linear chirps and crossing FM signals in white noise, as well as on real instrument recordings.
Fast Partial Tracking of Audio with Real-Time Capability through Linear Programming
This paper proposes a new partial tracking method, based on linear programming, that can run in real time, is simple to implement, and performs well in difficult tracking situations by considering spurious peaks, crossing partials, and a non-stationary short-term sinusoidal model. Complex constant parameters of a generalized short-term signal model are explicitly estimated to inform peak matching decisions. Peak matching is formulated as a variation of the linear assignment problem. Combinatorially optimal peak-to-peak assignments are found in polynomial time using the Hungarian algorithm. Results show that the proposed method creates high-quality representations of monophonic and polyphonic sounds.
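
Peak matching as an assignment problem can be illustrated on a toy case. The sketch below is deliberately not the Hungarian algorithm: it enumerates every assignment, which finds the same optimum on tiny inputs but at factorial cost. The cost used here (absolute frequency difference between peaks of consecutive frames) and the name `match_peaks` are assumptions made for illustration; the paper's costs are informed by the estimated parameters of its short-term signal model.

```python
from itertools import permutations

def match_peaks(prev_freqs, curr_freqs):
    """Assign each previous-frame peak to a distinct current-frame peak.

    Returns (pairs, total_cost), where pairs is a list of
    (prev_index, curr_index) tuples minimising the summed frequency distance.
    Brute force: only suitable for a handful of peaks.
    """
    if len(prev_freqs) > len(curr_freqs):
        raise ValueError("this sketch needs at least as many current peaks")
    best_cost = float("inf")
    best_perm = ()
    for perm in permutations(range(len(curr_freqs)), len(prev_freqs)):
        cost = sum(abs(prev_freqs[i] - curr_freqs[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost

# Three tracked partials, four detected peaks (one of them spurious).
pairs, cost = match_peaks([440.0, 880.0, 1320.0], [1325.0, 442.0, 3000.0, 878.0])
print(pairs, cost)
```

Unmatched current peaks (index 2 in the example) are simply left over, which is one crude way to treat spurious peaks. For realistic frame sizes the enumeration would be replaced by a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`.
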
REDS: A New Asymmetric Atom for Sparse Audio Decomposition and Sound Synthesis
In this paper, we introduce a function designed specifically for sparse audio representations. A progression in the selection of dictionary elements (atoms) to sparsely represent audio has occurred: starting with symmetric atoms, then to damped sinusoid and hybrid atoms, and finally to the re-appropriation of the gammatone (GT) and formant-wave-function (FOF) into atoms. These asymmetric atoms have already shown promise in sparse decomposition applications, where they prove to be highly correlated with natural sounds and musical audio, but since neither was originally designed for this application their utility remains limited. An in-depth comparison of each existing function was conducted based on application-specific criteria. A directed design process was completed to create a new atom, the ramped exponentially damped sinusoid (REDS), that satisfies all desired properties: the REDS can adapt to a wide range of audio signal features and has good mathematical properties that enable efficient sparse decompositions and synthesis. Moreover, the REDS is proven to be approximately equal to the previous functions under some common conditions.