Download Low Complexity Parametric Stereo Coding in MPEG-4
Parametric stereo coding in combination with a state-of-the-art coder for the underlying monaural audio signal results in the most efficient coding scheme for stereo signals at very low bit rates available today. This paper reviews those aspects of the parametric stereo paradigm that are important for audio coding applications. A complete parametric stereo coding system is presented, which was recently standardized in MPEG-4 Audio. Using complex modulated filter banks, it allows implementation with low computational complexity. The system is backward compatible and enables high quality stereo coding at a total bit rate of 24 kbit/s when used in combination with aacPlus.
Download Cue Point Processing: An Introduction
Modern digital sound formats such as aiff, mpeg1/2/4/7, wav and ra support the use of cue points. A cue point may also be referred to as a seek point or a key frame. These mechanisms store metadata about the sound files. In this paper, we identify and describe how these formats are encoded, and process their metadata with a focus on cue points. Finally, we conclude with the direction of our future research for improving multimedia browsing mechanisms and additional applications by leveraging the use of cue points within those applications. Keywords: sound file formats, cue points, sound file, audio files, seek point, key frame, audio indexing
Download Decorrelation for Immersive Audio Applications and Sound Effects
Audio decorrelation is a fundamental building block for immersive audio applications. It has applications in parametric spatial audio coding, audio upmix, audio sound effects and audio rendering for virtual or augmented reality applications. In this paper, we provide insights into the practical design considerations of an audio decorrelator, using the example of the decorrelator contained within the upcoming MPEG-I Immersive Audio ISO standard [1]. We describe the desirable properties of such a decorrelator, common approaches for implementation and our particular technology choices for the decorrelator used in MPEG-I for rendering sound sources with homogeneous extent.
Download Spatial Audio Object Coding with Enhanced Audio Object Separation
Spatial sound reproduction on multi-channel loudspeaker setups is a consistent trend in today’s audio playback systems. Digital surround sound significantly improves the realism of the spatial sound experience, but also results in a drastic increase in the required audio data rate. Spatial Audio Coding (SAC) technology provides means for efficient storage and transmission of multi-channel signals by a downmix signal and associated parametric side information describing the spatial sound image. More recently, SAC has been extended with an object-based concept termed Spatial Audio Object Coding (SAOC), enabling efficient coding and interactive spatial rendering of multiple individual audio objects at the playback side. Due to the underlying parametric coding approach, object level manipulations may affect the perceptual quality of the produced sound scene, and using extreme object attenuation or boosting may result in unacceptably degraded audio quality. The paper describes how regular SAOC processing is advanced to ensure high quality sound reproduction even in demanding remix applications.
Download Fast Sinusoid Synthesis for MPEG-4 HILN Parametric Audio Decoding
Additive sinusoidal synthesis is a popular technique for applications like sound synthesis or very low bit rate parametric audio decoding. In this paper, different algorithms for the efficient synthesis of sinusoids on general purpose CPUs as found in today’s PCs are investigated. Fast algorithms for time domain synthesis of constant and linearly changing frequencies are presented and compared to frequency domain synthesis approaches. Execution time and accuracy (SNR) of the algorithms are reported for different CPU types. Finally, the algorithms are implemented in a fast MPEG-4 HILN parametric audio decoder in order to evaluate their performance in a real world application.
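As an illustration of what time domain synthesis of a constant-frequency sinusoid without a per-sample sine call can look like, the following minimal sketch uses the classic second-order recurrence y[k] = 2 cos(w) y[k-1] - y[k-2] and measures its SNR against a direct reference. This is a textbook oscillator chosen for illustration, not necessarily one of the algorithms evaluated in the paper; the function names and the test frequency are assumptions.

```python
import math


def sinusoid_direct(amp, freq_hz, phase, sample_rate, n):
    """Reference synthesis: one math.sin call per output sample."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    return [amp * math.sin(w * k + phase) for k in range(n)]


def sinusoid_recursive(amp, freq_hz, phase, sample_rate, n):
    """Second-order recurrence: one multiply and one subtract per sample."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    coeff = 2.0 * math.cos(w)
    # Seed the recurrence with the two samples preceding k = 0.
    y_prev2 = amp * math.sin(phase - 2.0 * w)
    y_prev1 = amp * math.sin(phase - w)
    out = []
    for _ in range(n):
        y = coeff * y_prev1 - y_prev2
        out.append(y)
        y_prev2, y_prev1 = y_prev1, y
    return out


def snr_db(reference, test):
    """Signal-to-noise ratio of `test` relative to `reference`, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return float("inf") if noise == 0.0 else 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    ref = sinusoid_direct(1.0, 440.0, 0.3, 44100.0, 1024)
    fast = sinusoid_recursive(1.0, 440.0, 0.3, 44100.0, 1024)
    print("SNR in dB:", snr_db(ref, fast))
```

Rounding errors accumulate in the recurrence, so practical implementations typically re-seed the state at frame boundaries; the trade-off between speed and accuracy is the kind of measurement the abstract refers to.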
Download Parametric Coding of Spatial Audio
Recently, there has been a renewed interest in techniques for coding of stereo and multi-channel audio signals. Stereo and multi-channel audio signals evoke an auditory spatial image in a listener. Thus, in addition to pure redundancy reduction, a receiver model which considers properties of spatial hearing may be used for reducing the bitrate. This has been done in previous techniques by considering the importance of interaural level difference cues at high frequencies and by considering the binaural masking level difference when computing the masked threshold for multiple audio channels. Recently, a number of more systematic, parameterized techniques of this kind were introduced. In this paper, an overview of one such technique, denoted binaural cue coding (BCC), is given. BCC represents stereo or multi-channel audio signals as one or more downmixed audio channels plus side information. The side information contains the inter-channel cues inherent in the original audio signal that are relevant for the perception of the properties of the auditory spatial image. The relation between the inter-channel cues and attributes of the auditory spatial image is discussed. Other applications of BCC are discussed, such as joint coding of independent audio signals, providing flexibility at the decoder to mix arbitrary stereo, multi-channel, and binaural signals.
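To make the idea of "downmix plus side information" concrete, the sketch below computes, for one frame of a stereo signal, a mono downmix together with two inter-channel cues: the inter-channel level difference (ICLD, in dB) and a normalized inter-channel correlation (ICC). It is a simplified, broadband illustration under our own assumptions; a real BCC encoder estimates such cues per frequency band and per time frame, and the function name `frame_cues` is hypothetical.

```python
import math


def frame_cues(left, right, eps=1e-12):
    """Return (downmix, icld_db, icc) for one frame of a stereo signal.

    Broadband simplification: real systems compute these cues per subband.
    `eps` only guards against division by zero and log of zero.
    """
    downmix = [0.5 * (l + r) for l, r in zip(left, right)]
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    cross = sum(l * r for l, r in zip(left, right))
    # Level difference between the two channels, in dB.
    icld_db = 10.0 * math.log10((energy_left + eps) / (energy_right + eps))
    # Normalized correlation at lag zero, in the range [-1, 1].
    icc = cross / math.sqrt((energy_left + eps) * (energy_right + eps))
    return downmix, icld_db, icc


if __name__ == "__main__":
    left = [math.sin(0.05 * k) for k in range(512)]
    right = [0.5 * v for v in left]  # same signal, half the amplitude
    _, icld_db, icc = frame_cues(left, right)
    print("ICLD (dB):", icld_db, "ICC:", icc)
```

A decoder working from such cues would redistribute the downmix energy between the output channels according to the level difference and add decorrelated signal where the correlation is low.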
Download Assessing the Quality of the Extraction and Tracking of Sinusoidal Components: Towards an Evaluation Methodology
In this paper, we introduce two original evaluation methods in the context of sinusoidal modeling. The first one assesses the quality of the extraction of sinusoidal components from short-time signals, whereas the second one focuses on the quality of the tracking of these sinusoidal components over time. Each proposed method intends to use a unique cost function that globally reflects the performance of the tested algorithm in a realistic framework. Clearly defined evaluation protocols are then proposed with several test cases to evaluate most of the desired properties of extractors or trackers of sinusoidal components. This paper is a first proposal to be used as a starting point in a sinusoidal analysis / synthesis contest to be held at DAFx’07.
Download A Complex Envelope Sinusoidal Model for Audio Coding
A modification to the hybrid sinusoidal model is proposed for the purpose of high-quality audio coding. In our proposal, the amplitude envelope of each harmonic partial is modeled by a narrowband complex signal. Such a representation incorporates most of the signal energy associated with sinusoidal components, including that related to frequency estimation and quantization errors. It also takes into account the natural width of each spectral line. The advantages of this model extension are a more straightforward and robust representation of the deterministic component and a clean stochastic residual without ghost sinusoids. The reconstructed signal is virtually free from harmonic artifacts and more natural sounding. We propose to encode the complex envelopes by means of MCLT coefficients, interleaved across partials, within an MPEG-like coding scheme. We show some experimental results with high compression efficiency achieved.
Download Signal Decorrelation using Perceptually Informed Allpass Filters
When a monophonic source signal is projected from two or more loudspeakers, listeners typically perceive a single, phantom source, positioned according to the relative signal amplitudes and speaker locations. While this property is the basis of modern panning algorithms, it is often desirable to control the perceived spatial extent of the phantom source, or to project multiple, separately perceived copies of the signal. So that the human auditory system does not process the loudspeaker outputs as a single coherent source, these effects are commonly achieved by generating a set of mutually decorrelated (e.g., statistically independent) versions of the source signal, which are then panned to make an extended source or multiple, independent source copies. In this paper, we introduce an approach to decorrelation using randomly generated allpass filters, and introduce numerical methods for evaluating the perceptual effectiveness of decorrelation algorithms. By using allpass filters, the signal magnitude is preserved, and the decorrelated copies and original signal will be perceptually very similar. By randomly selecting the magnitude and frequency of the poles of each allpass biquad section in the decorrelating filter, multiple decorrelating filters may be generated that maintain a degree of statistical independence. We present results comparing our approach (including methods for choosing the number of biquad sections and designing the statistics of the pole locations) to several established decorrelation methods discussed in the literature.
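The construction described above can be sketched as follows: each biquad section has a complex-conjugate pole pair at radius r and angle theta, the numerator coefficients are the reversed denominator coefficients (which makes the section allpass), and several sections with randomly drawn poles are cascaded. The uniform distributions, the radius range and all function names are assumptions made for this illustration; the paper designs the pole statistics perceptually, which this sketch does not attempt.

```python
import cmath
import math
import random


def random_allpass_sections(num_sections, rng, min_radius=0.5, max_radius=0.9):
    """Draw (a1, a2) denominator coefficients for each allpass biquad.

    Poles sit at r * exp(+/- j*theta), so the denominator is
    1 - 2 r cos(theta) z^-1 + r^2 z^-2. Distributions are illustrative.
    """
    sections = []
    for _ in range(num_sections):
        r = rng.uniform(min_radius, max_radius)
        theta = rng.uniform(0.0, math.pi)
        sections.append((-2.0 * r * math.cos(theta), r * r))
    return sections


def apply_allpass(signal, sections):
    """Filter `signal` through the cascade (direct form I per section)."""
    out = list(signal)
    for a1, a2 in sections:
        x1 = x2 = y1 = y2 = 0.0
        filtered = []
        for x in out:
            # Numerator (a2, a1, 1) is the reversed denominator (1, a1, a2).
            y = a2 * x + a1 * x1 + x2 - a1 * y1 - a2 * y2
            filtered.append(y)
            x2, x1 = x1, x
            y2, y1 = y1, y
        out = filtered
    return out


def frequency_response(sections, omega):
    """Complex response of the cascade at `omega` radians per sample."""
    z1 = cmath.exp(-1j * omega)
    z2 = z1 * z1
    h = 1.0 + 0.0j
    for a1, a2 in sections:
        h *= (a2 + a1 * z1 + z2) / (1.0 + a1 * z1 + a2 * z2)
    return h


if __name__ == "__main__":
    sections = random_allpass_sections(4, random.Random(1))
    print("|H| at 1 rad/sample:", abs(frequency_response(sections, 1.0)))
```

Drawing a separate set of sections per output channel (for example with different seeds) gives filters with identical, flat magnitude responses but different phase responses, which is what makes the filtered copies sound alike while being less correlated with each other.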
Download Fast Additive Sound Synthesis Using Polynomials
This paper presents a new fast sound synthesis method using polynomials. This is an additive method, where polynomials are used to approximate sine functions. Traditional additive synthesis requires each sample to be generated for each partial oscillator. All these partial samples are then summed to obtain the resulting sound sample, which makes the synthesis time proportional to the product of the number of oscillators and the sampling rate. By using polynomial approximations, we instead sum only the oscillator coefficients and then generate the sound sample directly from these new coefficients. Most of the computation time is consumed by a data structure that manages the update of the generator coefficients as a priority queue. Practical implementations show that Polynomial Additive Sound Synthesis (PASS) is particularly efficient for low-frequency signals.
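The core idea of summing coefficients instead of samples can be sketched in a few lines. In the simplified version below, every partial is approximated over a short segment by a Taylor polynomial around the segment center, the polynomial coefficients of all partials are added, and each output sample is then produced by a single Horner evaluation, regardless of the number of partials. Using Taylor polynomials and a common segment length for all partials are assumptions made for brevity; the paper instead schedules per-oscillator coefficient updates with a priority queue, which is not shown here, and all function names are hypothetical.

```python
import math


def taylor_coeffs(amp, omega, phase, degree):
    """Taylor coefficients of amp * sin(omega * t + phase) around t = 0."""
    return [
        amp * omega ** k * math.sin(phase + k * math.pi / 2.0) / math.factorial(k)
        for k in range(degree + 1)
    ]


def synthesize_segment(partials, center, half_len, degree):
    """Synthesize samples center-half_len .. center+half_len-1.

    `partials` is a list of (amp, omega, phase) with omega in rad/sample.
    """
    summed = [0.0] * (degree + 1)
    for amp, omega, phase in partials:
        local_phase = omega * center + phase  # phase at the segment center
        for k, c in enumerate(taylor_coeffs(amp, omega, local_phase, degree)):
            summed[k] += c  # sum coefficients, not samples
    out = []
    for t in range(-half_len, half_len):
        y = 0.0
        for c in reversed(summed):  # Horner evaluation of one polynomial
            y = y * t + c
        out.append(y)
    return out


def synthesize(partials, num_samples, half_len=16, degree=5):
    """Piecewise-polynomial additive synthesis over consecutive segments."""
    out = []
    center = half_len
    while len(out) < num_samples:
        out.extend(synthesize_segment(partials, center, half_len, degree))
        center += 2 * half_len
    return out[:num_samples]


if __name__ == "__main__":
    rate = 44100.0
    partials = [(1.0, 2.0 * math.pi * 100.0 / rate, 0.0),
                (0.5, 2.0 * math.pi * 200.0 / rate, 0.3)]
    print(synthesize(partials, 4))
```

The approximation error of a fixed-degree polynomial grows with the product of frequency and segment length, so low-frequency partials tolerate long segments and few coefficient updates. This is consistent with the abstract's remark that the method is particularly efficient for low-frequency signals.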