Object Coding of Harmonic Sounds Using Sparse and Structured Representations

Object coding allows audio compression at extremely low bit rates, provided that the objects are correctly modelled and identified. In this study, a codec has been implemented on the basis of a sparse decomposition of the signal with a dictionary of instrument-specific harmonic atoms. The decomposition algorithm extracts “molecules”, i.e., linear combinations of such atoms, which are treated as note-like objects. These objects can therefore be coded efficiently using note-specific strategies. For signals containing only harmonic sounds, the resulting bit rates are very low, typically around 2 kbps. Informal listening tests against a standard sinusoidal coder show promising performance.
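The following Python sketch gives a rough illustration of sparse decomposition with harmonic atoms. It runs plain matching pursuit over a small dictionary of Hann-windowed harmonic atoms. It is not the paper's codec. The atom shape (equal-amplitude partials, zero phase), the candidate f0 grid, and the helper names `harmonic_atom` and `matching_pursuit` are all assumptions. It also extracts single atoms rather than the molecules described above.

```python
import math


def harmonic_atom(f0, n_harm, length, fs):
    """Unit-norm, Hann-windowed harmonic atom: equal-amplitude cosines at k*f0, k = 1..n_harm."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]
    atom = [
        win[n] * sum(math.cos(2 * math.pi * k * f0 * n / fs) for k in range(1, n_harm + 1))
        for n in range(length)
    ]
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]


def matching_pursuit(signal, dictionary, n_iter):
    """Greedy matching pursuit: at each step pick the atom with the largest
    |<residual, atom>| and subtract its projection from the residual."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        best_key, best_coef = None, 0.0
        for key, atom in dictionary.items():
            coef = sum(r * a for r, a in zip(residual, atom))
            if abs(coef) > abs(best_coef):
                best_key, best_coef = key, coef
        if best_key is None:  # residual is orthogonal to every atom
            break
        residual = [r - best_coef * a for r, a in zip(residual, dictionary[best_key])]
        picks.append((best_key, best_coef))
    return picks, residual


# Hypothetical dictionary: 5-harmonic atoms on a small grid of candidate f0s (Hz).
FS, LENGTH = 8000, 512
DICT = {f0: harmonic_atom(f0, 5, LENGTH, FS) for f0 in (110, 165, 220, 275, 330)}

# A two-note "chord" built from atoms in the dictionary.
mix = [3.0 * a + 1.5 * b for a, b in zip(DICT[220], DICT[330])]
picks, residual = matching_pursuit(mix, DICT, n_iter=2)
```

Here the two picked keys recover the two note fundamentals. In a coder, each pick would become a compact (f0, amplitude) parameter pair instead of raw samples.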
Adaptive Threshold Determination for Spectral Peak Classification

A new approach to adaptive threshold selection for classifying the peaks of audio spectra is presented. We extend previous work on classifying sinusoidal and noise peaks from a set of spectral peak descriptors in two ways. First, we propose a compact sinusoidal model in which all modulation parameters are defined with respect to the analysis window. This matters because STFT spectra are closely tied to the properties of the analysis window. Second, we design a threshold selection algorithm that allows the decision thresholds to be controlled intuitively. The decision thresholds are calculated from the relationships established between the noise power in the signal and the distributions of sinusoidal peaks, and they ensure that all peaks described as sinusoidal are correctly classified. We also show that the threshold selection algorithm can be used with different types of analysis windows with only a slight parameter readjustment.
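A textbook constant-false-alarm rule illustrates how a decision threshold can follow from the noise power. This is a simplified stand-in, not the descriptor-based algorithm of the paper. It assumes circular complex Gaussian noise of known power in every bin. It also ignores the fact that spectral peaks are local maxima, which changes their distribution. The function names are hypothetical.

```python
import math
import random


def cfar_threshold(noise_power, p_false_alarm):
    """Power threshold T such that a bin holding only circular complex Gaussian
    noise with E|X|^2 = noise_power exceeds T with probability p_false_alarm.
    |X|^2 is then exponential with mean noise_power: P(|X|^2 > T) = exp(-T / noise_power)."""
    return -noise_power * math.log(p_false_alarm)


def classify_bins(powers, noise_power, p_false_alarm=0.01):
    """Label each spectral power value as 'sinusoid' or 'noise' against the threshold."""
    t = cfar_threshold(noise_power, p_false_alarm)
    return ["sinusoid" if p > t else "noise" for p in powers]


# Simulated noise-only bins: real and imaginary parts each N(0, P/2), so E|X|^2 = P.
random.seed(0)
P = 2.0
sd = math.sqrt(P / 2)
noise = [random.gauss(0, sd) ** 2 + random.gauss(0, sd) ** 2 for _ in range(20000)]
labels = classify_bins(noise, P, p_false_alarm=0.01)
false_alarm_rate = labels.count("sinusoid") / len(labels)
```

The single parameter `p_false_alarm` acts as an intuitive control. Lowering it raises the threshold logarithmically relative to the noise power.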
Real-time Audio Processing via Segmented Wavelet Transform

In audio applications it is often necessary to process the signal in “real time”. The segmented wavelet transform (SegWT) method makes it possible to compute the discrete-time wavelet transform of a signal segment by segment, without resorting to classical “windowing”. The method could therefore be used for wavelet-type processing of an audio signal in real time. It could also be used when a long signal must be processed but there is insufficient memory to hold it (e.g., on DSPs). This paper explains the principle of the segmented forward wavelet transform and describes the algorithm in detail.
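The Haar wavelet shows the idea of segment-by-segment computation in its simplest case. It uses length-2 filters, so if the segment length is a multiple of 2^J (J decomposition levels), per-segment transforms concatenated band by band reproduce the whole-signal transform exactly. Longer wavelet filters overlap segment boundaries. Handling that overlap is what SegWT has to do, and this sketch does not attempt it. `haar_dwt` and `segmented_haar_dwt` are hypothetical names.

```python
import math


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT. Returns [cA_J, cD_J, ..., cD_1]."""
    s = 1.0 / math.sqrt(2.0)
    approx, details = list(x), []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        half = len(approx) // 2
        # Detail coefficients are computed from the current approximation before it is replaced.
        details.insert(0, [(approx[2 * i] - approx[2 * i + 1]) * s for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) * s for i in range(half)]
    return [approx] + details


def segmented_haar_dwt(x, seg_len, levels):
    """Compute the DWT one segment at a time and append each segment's
    coefficients to the matching sub-band. Exact for Haar whenever seg_len is a
    multiple of 2**levels, since the length-2 filters never straddle a boundary."""
    if seg_len % (2 ** levels):
        raise ValueError("seg_len must be a multiple of 2**levels")
    bands = [[] for _ in range(levels + 1)]
    for start in range(0, len(x), seg_len):
        for band, coeffs in zip(bands, haar_dwt(x[start:start + seg_len], levels)):
            band.extend(coeffs)  # a real-time system would emit these per segment
    return bands


x = [math.sin(0.1 * n) + 0.01 * n for n in range(64)]
full = haar_dwt(x, 3)
seg = segmented_haar_dwt(x, 16, 3)
```

Only one segment of input needs to be held at a time, which is the memory property the description above refers to.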
Statistical Measures of Early Reflections of Room Impulse Responses

An impulse response of an enclosed reverberant space is composed of three basic components: the direct sound, early reflections, and late reverberation. The direct sound is a single event that can be easily identified. The division between early reflections and late reverberation is less obvious, because there is a gradual transition between the two. This paper explores two statistical measures that can help determine the point in time at which early reflections have transitioned into late reverberation. These metrics exploit similarities between late reverberation and Gaussian noise that are not commonly found in early reflections. Unlike other measures, they require no prior knowledge of the room, such as its geometry or volume.
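This sketch implements two generic Gaussianity measures that fit the description above. The first is the sample kurtosis, which is 3 for Gaussian noise. The second is the fraction of samples lying more than one standard deviation from the mean, normalized by its Gaussian value erfc(1/√2) ≈ 0.3173. Whether these are the two measures used in the paper is an assumption. In practice they would be evaluated over sliding frames of the impulse response, and the transition time read off where both settle near their Gaussian values.

```python
import math
import random


def kurtosis(frame):
    """Sample kurtosis m4 / m2^2 (equals 3 for Gaussian noise)."""
    n = len(frame)
    mean = sum(frame) / n
    var = sum((v - mean) ** 2 for v in frame) / n
    if var == 0.0:
        raise ValueError("kurtosis is undefined for a constant frame")
    m4 = sum((v - mean) ** 4 for v in frame) / n
    return m4 / (var * var)


def normalized_outlier_fraction(frame):
    """Fraction of samples farther than one standard deviation from the mean,
    divided by its Gaussian value erfc(1/sqrt(2)) ~= 0.3173 (so ~1 for Gaussian noise)."""
    n = len(frame)
    mean = sum(frame) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in frame) / n)
    frac = sum(1 for v in frame if abs(v - mean) > std) / n
    return frac / math.erfc(1.0 / math.sqrt(2.0))


# Gaussian-like frame (stand-in for late reverberation) vs. sparse frame (isolated reflections).
random.seed(1)
dense = [random.gauss(0.0, 1.0) for _ in range(4096)]
sparse = [0.0] * 1024
sparse[100], sparse[500], sparse[900] = 1.0, -0.7, 0.5
```

A frame dominated by a few discrete reflections gives a very large kurtosis and a normalized fraction near zero. A diffuse tail gives values near 3 and 1 respectively.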
Automatic Mixing: Live Downmixing Stereo Panner

An automatic stereo panning algorithm intended for live multitrack downmixing has been developed. The algorithm uses spectral analysis to determine the panning position of each source. It combines filter-bank-based quantitative channel dependence, a priority channel architecture, and constrained rules to assign panning criteria. It attempts to minimize spectral masking by allocating sources with similar spectra to different panning spaces. The algorithm has been implemented. Results are presented on its convergence, its automatic panning-space allocation, and the left-right inter-channel phase relationship.
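Below is a toy version of masking-aware pan allocation. Each channel is tagged with its dominant filter-bank band, and channels that share a band are spread evenly across the stereo field. Positions are then converted to gains with a constant-power pan law. The allocation rule and function names are hypothetical. The paper's priority-channel architecture and constraint rules are not modelled, and a practical system would probably keep positions short of hard left and right.

```python
import math
from collections import defaultdict


def constant_power_gains(position):
    """Map a pan position in [-1 (left), 1 (right)] to (gain_left, gain_right)
    with gain_left**2 + gain_right**2 == 1."""
    theta = (position + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def allocate_pan_positions(dominant_band):
    """Hypothetical rule: channels whose spectra peak in the same filter-bank band
    are spread evenly over [-1, 1]; a channel alone in its band stays centred."""
    groups = defaultdict(list)
    for channel, band in dominant_band.items():
        groups[band].append(channel)
    positions = {}
    for channels in groups.values():
        k = len(channels)
        for i, channel in enumerate(channels):
            positions[channel] = 0.0 if k == 1 else -1.0 + 2.0 * i / (k - 1)
    return positions


# Hypothetical input: index of the dominant filter-bank band for each channel.
bands = {"kick": 0, "bass": 0, "vocal": 2, "gtr1": 3, "gtr2": 3, "keys": 3}
positions = allocate_pan_positions(bands)
gains = {ch: constant_power_gains(p) for ch, p in positions.items()}
```

The constant-power law keeps perceived loudness roughly stable as a source moves, so reallocating positions does not change the overall balance of the mix.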
Adaptive Harmonization and Pitch Correction of Polyphonic Audio Using Spectral Clustering

Several well-known harmonization and pitch correction techniques can be applied to monophonic sound sources. They are based on automatic pitch detection and frequency shifting without time stretching. Many applications require such effects to be applied to the dominant melodic instrument of a polyphonic audio mixture. However, applying them directly to the mixture produces artifacts, and automatic pitch detection becomes unreliable. In this paper we describe how a dominant melody separation method based on spectral clustering of sinusoidal peaks can be used for adaptive harmonization and pitch correction in mono polyphonic audio mixtures. Motivating examples are presented from a violin tutoring perspective. A further example modifies the saxophone melody of an old mono jazz recording.
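Once the f0 of the dominant melody has been estimated, pitch correction and harmonization both reduce to choosing a frequency-scaling factor. This sketch assumes 12-tone equal temperament with A4 = 440 Hz. In the setting described above, the ratio would be applied to the sinusoidal peaks clustered into the melody, a step not shown here. Function names are hypothetical.

```python
import math


def nearest_note_hz(f0_hz, a4_hz=440.0):
    """Snap a detected f0 to the nearest 12-tone equal-tempered pitch."""
    midi = round(69 + 12 * math.log2(f0_hz / a4_hz))
    return a4_hz * 2 ** ((midi - 69) / 12)


def correction_ratio(f0_hz, a4_hz=440.0):
    """Frequency-scaling factor that moves f0 onto the nearest tempered note."""
    return nearest_note_hz(f0_hz, a4_hz) / f0_hz


def harmony_ratio(semitones):
    """Scaling factor for a harmony voice a fixed number of tempered semitones away."""
    return 2 ** (semitones / 12)
```

For example, a violin note detected at 445 Hz would be scaled by 440/445 ≈ 0.989. A harmony voice a fifth above would use a ratio of 2^(7/12) ≈ 1.498.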
Modal Distribution Synthesis from Sub-Sampled Autocorrelation Function

The problem of signal synthesis from bilinear time-frequency representations such as the Wigner distribution has been investigated [1,2,4] using methods that exploit an outer-product interpretation of these distributions. The Modal distribution is a time-frequency distribution specifically designed to model the quasi-harmonic, multi-sinusoidal nature of music signals. It belongs to Cohen's general class of time-frequency distributions. Existing methods of synthesis from the Modal distribution [3] are based on a sinusoidal analysis-synthesis procedure using estimates of instantaneous frequency and amplitude. In this paper we develop a new synthesis procedure for the Modal distribution based on the outer-product interpretation of bilinear time-frequency distributions. We also propose a streaming, object-oriented implementation of the resynthesis in the SndObj library [6]. It builds on previous work that implemented a streaming version of the Modal distribution [7]. The paper first outlines the theoretical background of the Modal distribution and of signal synthesis from Wigner distributions. It then explains the design and implementation of the Modal distribution synthesis. Suggestions for future extensions to the synthesis procedure are given.
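The outer-product interpretation mentioned above rests on a simple fact. For a real signal x, the matrix R = x xᵀ has rank one, with eigenvector x/‖x‖ and eigenvalue ‖x‖², so x can be recovered up to a global sign from its leading eigenvector. The sketch below shows that step using power iteration. It starts directly from R, whereas an actual synthesis would first rearrange the (smoothed) local autocorrelation into such a matrix. For a smoothed distribution such as the Modal distribution, R is no longer exactly rank one, and the leading eigenvector then yields a rank-one least-squares approximation rather than an exact inverse. For complex signals, the ambiguity is a global phase rather than a sign. The function names are hypothetical.

```python
import math


def outer(x):
    """Outer product x x^T of a real sequence, as a list of rows."""
    return [[a * b for b in x] for a in x]


def signal_from_outer_product(r, iters=50):
    """Recover x (up to a global sign) from R = x x^T by power iteration.
    R is rank one with eigenvalue ||x||^2 and unit eigenvector x / ||x||."""
    n = len(r)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the signal; choose another")
        v = [c / norm for c in w]
    # Rayleigh quotient gives the eigenvalue ||x||^2.
    lam = sum(v[i] * sum(r[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(lam, 0.0)) * c for c in v]


x = [0.5, -1.2, 2.0, 0.3, -0.7]
x_hat = signal_from_outer_product(outer(x))
```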
Frequency Slope Estimation and its Application for Non-Stationary Sinusoidal Parameter Estimation

In this paper we investigate the estimation of sinusoidal parameters for sinusoids with linear AM/FM modulation. It is shown that, for linear amplitude and frequency modulation, only the frequency modulation creates additional estimation bias for the standard sinusoidal parameter estimator. An enhanced algorithm for frequency-domain demodulation of spectral peaks is then proposed. It can be used to obtain an approximate maximum-likelihood estimate of the frequency slope, as well as estimates of the amplitude, phase, and frequency parameters with significantly reduced bias. An experimental evaluation compares the new estimation scheme with existing methods. It shows that significant bias reduction is achieved over a large range of slopes and zero-padding factors. A real-world example demonstrates that the enhanced bias reduction algorithm can reduce the residual energy by up to 9 dB.
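The sketch below makes the notion of a frequency slope concrete. It generates a complex linear-FM sinusoid with phase φ0 + ω0 n + ½ α n², then recovers ω0 and the slope α by a least-squares line fit to its phase increments, which equal ω0 + α(n + ½). This is a simple time-domain estimator for a single noiseless chirp whose phase increments stay within (-π, π]. It is not the paper's frequency-domain demodulation of spectral peaks, and it breaks down with noise or several overlapping sinusoids. Function names are hypothetical.

```python
import cmath


def chirp(n_samples, a, phi0, omega0, alpha):
    """Complex linear-FM sinusoid; phase = phi0 + omega0*n + 0.5*alpha*n^2
    (omega0 in rad/sample, alpha in rad/sample^2)."""
    return [a * cmath.exp(1j * (phi0 + omega0 * n + 0.5 * alpha * n * n)) for n in range(n_samples)]


def estimate_freq_and_slope(x):
    """Least-squares line fit to the phase increments arg(x[n+1] * conj(x[n])),
    which equal omega0 + alpha*(n + 0.5) for a noiseless chirp."""
    d = [cmath.phase(x[n + 1] * x[n].conjugate()) for n in range(len(x) - 1)]
    t = [n + 0.5 for n in range(len(d))]
    t_mean = sum(t) / len(t)
    d_mean = sum(d) / len(d)
    alpha = sum((ti - t_mean) * (di - d_mean) for ti, di in zip(t, d)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    omega0 = d_mean - alpha * t_mean  # intercept at t = 0
    return omega0, alpha


w0_hat, alpha_hat = estimate_freq_and_slope(chirp(256, 1.0, 0.3, 0.2, 1e-3))
```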
Realtime Multiple-Pitch and Multiple-Instrument Recognition for Music Signals Using Sparse Non-Negative Constraints

In this paper we introduce a simple and fast method for real-time recognition of multiple pitches produced by multiple musical instruments. Our proposed method is based on two important facts: (1) the timbral information of any instrument is pitch-dependent, and (2) the modulation spectrum of the same pitch appears to yield a persistent representation of the characteristics of the instrument family. Using these facts, we construct a learning algorithm to obtain pitch templates of all possible notes on various instruments. We then devise an online algorithm to decompose a real-time audio buffer using the learned templates. The learning and decomposition proposed here are inspired by non-negative matrix factorization methods but differ in that they introduce an explicit sparsity control. Our test results show promising recognition rates for a real-time system on real music recordings. We discuss further improvements that can be made to the proposed system.
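The online decomposition step can be sketched as non-negative least squares with an L1 sparsity penalty, solved by multiplicative updates with the templates held fixed. The Euclidean cost, the penalty weight `lam`, the toy templates, and the helper name are assumptions. The paper's own cost function and sparsity control may differ.

```python
def sparse_activations(v, w, lam=0.01, iters=200, eps=1e-12):
    """Non-negative activations h minimising 0.5*||v - W h||^2 + lam*sum(h),
    with W fixed (one column per learned note template, one row per frequency bin).
    Multiplicative update: h <- h * (W^T v) / (W^T W h + lam)."""
    n_bins, n_templates = len(w), len(w[0])
    wtv = [sum(w[f][k] * v[f] for f in range(n_bins)) for k in range(n_templates)]
    h = [1.0] * n_templates
    for _ in range(iters):
        wh = [sum(w[f][k] * h[k] for k in range(n_templates)) for f in range(n_bins)]
        wtwh = [sum(w[f][k] * wh[f] for f in range(n_bins)) for k in range(n_templates)]
        h = [h[k] * wtv[k] / (wtwh[k] + lam + eps) for k in range(n_templates)]
    return h


# Hypothetical unit-norm templates over six frequency bins (columns of W).
templates = [
    [0.6, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.0, 0.8],
]
W = [[t[f] for t in templates] for f in range(6)]
frame = [2.0 * a + 1.0 * b for a, b in zip(templates[0], templates[2])]  # notes 0 and 2 sounding
h = sparse_activations(frame, W, lam=0.01)
```

Raising `lam` shrinks every activation and pushes weak ones toward zero. This trade-off is the kind of explicit sparsity control the description above refers to.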
Multipitch Estimation of Quasi-Harmonic Sounds in Colored Noise

This paper proposes a new multipitch estimator based on a likelihood maximization principle. For each tone, a sinusoidal model is assumed, with colored moving-average (MA) background noise and an autoregressive spectral envelope for the overtones. A monopitch estimator is derived following a weighted maximum-likelihood principle. It selects the fundamental frequency (F0) that jointly maximally flattens the noise spectrum and the sinusoidal spectrum. The multipitch estimator is obtained by extending the method to estimate multiple F0s jointly. An application to piano tones is presented that takes into account the inharmonicity of the overtone series of this instrument.
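Piano inharmonicity is commonly modelled with the stiff-string law f_k = k F0 √(1 + B k²), where B is the inharmonicity coefficient. This sketch assumes that model and fits (F0, B) to a list of observed partial frequencies by grid search. It illustrates the partial model for a single tone only and is not the weighted maximum-likelihood multipitch estimator. It also assumes the partials have already been identified and ordered. Function names and grids are hypothetical.

```python
import math


def partial_freqs(f0, b, n_partials):
    """Stiff-string partial frequencies f_k = k*f0*sqrt(1 + b*k^2), k = 1..n_partials."""
    return [k * f0 * math.sqrt(1 + b * k * k) for k in range(1, n_partials + 1)]


def fit_f0_b(observed, f0_grid, b_grid):
    """Grid search for (f0, B) minimising the squared error between observed
    partial frequencies (ordered, k = 1..K) and the stiff-string model."""
    best = None
    for f0 in f0_grid:
        for b in b_grid:
            model = partial_freqs(f0, b, len(observed))
            err = sum((o - p) ** 2 for o, p in zip(observed, model))
            if best is None or err < best[0]:
                best = (err, f0, b)
    return best[1], best[2]


observed = partial_freqs(110.0, 4e-4, 10)
f0_hat, b_hat = fit_f0_b(
    observed,
    [100.0 + 0.5 * i for i in range(41)],  # 100 .. 120 Hz
    [1e-4 * i for i in range(11)],         # 0 .. 1e-3
)
```

With B = 4e-4, the tenth partial of a 110 Hz tone lies near 1121.8 Hz rather than 1100 Hz. A strictly harmonic model would therefore mislocate the upper overtones.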