From Joint Stereo to Spatial Audio Coding - Recent Progress and Standardization
Within the evolution of perceptual audio coding, there is a long history of techniques for the joint coding of the several audio channels of an audio program that are presented simultaneously. The paper describes how such techniques have progressed over time into the recent concept of spatial audio coding, currently under standardization within the ISO/MPEG group. As a significant improvement over conventional techniques, this approach allows the representation of high-quality multi-channel audio at bitrates of only 64 kbit/s and below.
Matching live sources with physical models
This paper investigates the use of a physical-model template database as the parameter basis for an MPEG-4 Structured Audio (MP4-SA) codec. During analysis, the codec attempts to match the input sound to the closest corresponding instrument in the database. In this paper, we emphasize the mechanism enabling this match. We give an overview of the final front end, including the pitch detection stage, and discuss the remaining problems. A draft implementation, written in Python, is described.
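To illustrate the kind of front-end stage described, the following is a minimal sketch (not the paper's implementation) of an autocorrelation-based pitch detector; the function name, frequency range, and frame length are illustrative assumptions.

```python
# Minimal sketch of an autocorrelation-based pitch detector; the function
# name, frequency range, and frame length are illustrative assumptions,
# not the paper's actual front end.
import numpy as np

def detect_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a quasi-periodic frame."""
    frame = frame - np.mean(frame)                # remove DC offset
    corr = np.correlate(frame, frame, mode="full")
    corr = corr[len(corr) // 2:]                  # keep non-negative lags only
    lag_min = int(sample_rate / fmax)             # shortest admissible period
    lag_max = int(sample_rate / fmin)             # longest admissible period
    best = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    return sample_rate / best

# A 440 Hz sine should be detected close to 440 Hz.
sr = 16000
t = np.arange(2048) / sr
f0 = detect_pitch(np.sin(2 * np.pi * 440.0 * t), sr)
```

Restricting the lag search to an admissible period range is what keeps the detector from locking onto octave errors at very short or very long lags.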
Perceptually motivated parametric representation for harmonic sounds for data compression purposes
Object Coding of Harmonic Sounds Using Sparse and Structured Representations
Object coding allows audio compression at extremely low bit-rates, provided that the objects are correctly modelled and identified. In this study, a codec has been implemented on the basis of a sparse decomposition of the signal over a dictionary of Instrument-Specific Harmonic atoms. The decomposition algorithm extracts “molecules”, i.e. linear combinations of such atoms, considered as note-like objects. Thus, they can be coded efficiently using note-specific strategies. For signals containing only harmonic sounds, the obtained bitrates are very low, typically around 2 kbit/s, and informal listening tests against a standard sinusoidal coder show promising performance.
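The decomposition idea can be sketched as a greedy matching-pursuit pass over a small dictionary of harmonic atoms. The atom construction, 1/k partial weighting, and candidate set below are illustrative assumptions, not the paper's Instrument-Specific dictionary.

```python
# Hedged sketch of matching pursuit with harmonic atoms; atom shape and
# partial weighting are illustrative, not the paper's dictionary.
import numpy as np

def harmonic_atom(f0, n, sr, partials=5):
    """Unit-norm windowed sum of the first few partials of f0."""
    t = np.arange(n) / sr
    atom = sum(np.sin(2 * np.pi * f0 * (k + 1) * t) / (k + 1)
               for k in range(partials))
    atom *= np.hanning(n)
    return atom / np.linalg.norm(atom)

def matching_pursuit(signal, f0_candidates, sr, n_iter=3):
    """Greedily subtract the harmonic atom best correlated with the residual."""
    residual = signal.astype(float).copy()
    n = len(signal)
    atoms = {f0: harmonic_atom(f0, n, sr) for f0 in f0_candidates}
    picked = []
    for _ in range(n_iter):
        f0_best = max(atoms, key=lambda f0: abs(np.dot(residual, atoms[f0])))
        gain = float(np.dot(residual, atoms[f0_best]))
        residual -= gain * atoms[f0_best]
        picked.append((f0_best, gain))
    return picked, residual

# A signal made of one atom at 220 Hz is recovered in a single iteration.
sr, n = 8000, 4096
sig = 3.0 * harmonic_atom(220.0, n, sr)
picked, _ = matching_pursuit(sig, [110.0, 220.0, 330.0, 440.0], sr, n_iter=1)
```

Because whole harmonic atoms (rather than individual sinusoids) are matched, each selected element already corresponds to a note-like object, which is what makes note-specific coding strategies applicable downstream.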
Efficient Parametric Modeling for Audio Transients
In this work, we present an evolution of the Damped & Delayed Sinusoidal (DDS) model introduced within the framework of general signal modeling. This model, named the Partial Damped & Delayed Sinusoidal (PDDS) model, takes into account a single time-delay parameter for a set of (un)damped sinusoids. This modification is more consistent with the transient audio modeling problem. We then develop high-resolution algorithms for estimating the model parameters. Simulations on typical transient audio signals show the validity of this approach.
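The PDDS idea, several (un)damped sinusoids sharing a single onset delay, can be sketched as a synthesis routine; all parameter values below are hypothetical examples.

```python
# Sketch of PDDS synthesis with hypothetical parameter values: each component
# has its own amplitude, damping, frequency, and phase, but all components
# share one onset delay (the key difference from a general DDS model).
import numpy as np

def pdds_signal(n, sr, delay, components):
    """Partial Damped & Delayed Sinusoidal model with a shared time delay."""
    t = np.arange(n) / sr
    tau = t - delay                              # time relative to the onset
    onset = (tau >= 0).astype(float)             # zero before the shared delay
    s = np.zeros(n)
    for amp, damping, freq, phase in components:
        s += amp * np.exp(-damping * tau) * np.cos(2 * np.pi * freq * tau + phase)
    return onset * s

# A two-partial transient starting 20 ms into the frame.
sr = 8000
x = pdds_signal(1024, sr, delay=0.02,
                components=[(1.0, 50.0, 440.0, 0.0), (0.5, 80.0, 880.0, 0.0)])
```

Tying all components to one delay reduces the parameter count and matches the physical intuition that the partials of a transient (e.g. a percussive attack) start together.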
EDS Parametric Modeling and Tracking of Audio Signals
Despite the success of parametric modeling in various fields of digital signal processing, Fourier analysis remains a prominent tool for many audio applications. This paper aims at demonstrating the usefulness of the Exponentially Damped Sinusoidal (EDS) model for both analysis/synthesis and tracking purposes.
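A minimal sketch of the EDS model: the signal is a sum of damped complex exponentials a_k · z_k^n, and once the poles z_k are known, the amplitudes follow by linear least squares. The pole values here are arbitrary examples; a real analysis stage would also have to estimate the poles themselves (e.g. with a high-resolution subspace method).

```python
# Sketch of EDS synthesis and amplitude estimation; pole values are
# arbitrary examples, and pole estimation itself is out of scope here.
import numpy as np

def eds_synth(poles, amps, n):
    """Exponentially Damped Sinusoidal signal: sum of a_k * z_k**n terms."""
    ns = np.arange(n)
    return sum(a * z ** ns for a, z in zip(amps, poles))

def eds_amplitudes(signal, poles):
    """With the poles known, complex amplitudes follow by least squares."""
    ns = np.arange(len(signal))
    V = np.column_stack([z ** ns for z in poles])   # Vandermonde-style basis
    amps, *_ = np.linalg.lstsq(V, signal, rcond=None)
    return amps

# Two damped complex exponentials; analysis recovers their amplitudes.
poles = [np.exp(-0.01 + 2j * np.pi * 0.10), np.exp(-0.02 + 2j * np.pi * 0.25)]
x = eds_synth(poles, [1.0, 0.5], 256)
est = eds_amplitudes(x, poles)
```

The damping factor in each pole's magnitude is what distinguishes EDS from a plain Fourier basis: it lets a single term model an exponentially decaying partial exactly.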
A system for data-driven concatenative sound synthesis
In speech synthesis, concatenative data-driven synthesis methods prevail. They use a database of recorded speech and a unit selection algorithm that selects the segments that best match the utterance to be synthesized. Transferring these ideas to musical sound synthesis allows a new method of high-quality sound synthesis. Usual synthesis methods are based on a model of the sound signal, and it is very difficult to build a model that preserves all the fine details of sound. Concatenative synthesis sidesteps this problem by using actual recordings. This data-driven approach (as opposed to a rule-based approach) takes advantage of the information contained in the many sound recordings. For example, very natural-sounding transitions can be synthesized, since unit selection is aware of the context of the database units. The CATERPILLAR software system has been developed to allow data-driven concatenative unit selection sound synthesis. It allows high-quality instrument synthesis with high-level control, explorative free synthesis from arbitrary sound databases, or resynthesis of a recording with sounds from the database. It is based on the software-engineering concept of component-oriented software, increasing flexibility and facilitating reuse.
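The core of unit selection can be sketched as dynamic programming over a target cost plus a concatenation cost. This toy version (not the CATERPILLAR system) reduces each database unit to a single scalar feature, e.g. mean pitch.

```python
# Toy dynamic-programming unit selection, not the CATERPILLAR system;
# each database unit is reduced to one scalar feature such as mean pitch.
import numpy as np

def select_units(targets, database, w_concat=1.0):
    """Pick one unit per target, minimizing target cost (distance between a
    unit's feature and the requested one) plus concatenation cost (feature
    jump between consecutive chosen units)."""
    n_t, n_u = len(targets), len(database)
    db = np.asarray(database, dtype=float)
    cost = np.full((n_t, n_u), np.inf)
    back = np.zeros((n_t, n_u), dtype=int)
    cost[0] = np.abs(db - targets[0])
    for i in range(1, n_t):
        for j, u in enumerate(db):
            joins = cost[i - 1] + w_concat * np.abs(db - u)
            back[i, j] = int(np.argmin(joins))
            cost[i, j] = abs(u - targets[i]) + joins[back[i, j]]
    path = [int(np.argmin(cost[-1]))]            # cheapest end point
    for i in range(n_t - 1, 0, -1):              # trace the path backwards
        path.append(int(back[i, path[-1]]))
    return list(reversed(path))

db = [100.0, 210.0, 300.0, 420.0]
seq = select_units([200.0, 220.0, 400.0], db, w_concat=0.1)
```

The concatenation term is what makes the selection context-aware: a unit is chosen not only for how well it fits its own target, but for how smoothly it joins its neighbors.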
Assessing Applause Density Perception Using Synthesized Layered Applause Signals
Applause signals are the sound of many people gathered in one place clapping their hands and are a prominent part of live music recordings. Usually, applause signals are recorded together with the live performance and serve to evoke in the listener the feeling of participation in a real event. Applause signals can be very different in character, depending on the audience size, location, event type, and many other factors. To characterize different types of applause signals, the attribute of ‘density’ appears to be suitable. This paper reports first investigations into whether density is an adequate perceptual attribute to describe different types of applause. We describe the design of a listening test assessing density and the synthesis of suitable, strictly controlled stimuli for the test. Finally, we provide results, both on strictly controlled and on naturally recorded stimuli, that confirm the suitability of the density attribute for describing important aspects of how different applause signals are perceived.
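A toy version of such stimulus synthesis, assuming claps modeled as short decaying noise bursts placed by a Poisson process whose rate acts as the density control; the paper's actual stimuli are far more strictly controlled.

```python
# Toy layered-applause synthesizer; clap shape and timing model are simple
# assumptions, not the paper's strictly controlled stimulus design.
import numpy as np

def synth_applause(duration, sr, density, seed=0):
    """Each clap is a short decaying noise burst; onsets follow a Poisson
    process whose rate is the 'density' (claps per second)."""
    rng = np.random.default_rng(seed)
    n = int(duration * sr)
    clap_len = int(0.01 * sr)                     # 10 ms per clap
    envelope = np.exp(-np.linspace(0.0, 6.0, clap_len))
    out = np.zeros(n)
    n_claps = rng.poisson(density * duration)
    for onset in rng.integers(0, n - clap_len, size=n_claps):
        out[onset:onset + clap_len] += envelope * rng.standard_normal(clap_len)
    return out / (np.max(np.abs(out)) + 1e-12)

sparse = synth_applause(2.0, 16000, density=10)   # individually audible claps
dense = synth_applause(2.0, 16000, density=500)   # claps merge into a texture
```

Sweeping the rate parameter moves the stimulus from individually audible claps toward a continuous noise-like texture, which is exactly the perceptual dimension a density attribute would capture.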
A supervised learning approach to ambience extraction from mono recordings for blind upmixing
A supervised learning approach to ambience extraction from one-channel audio signals is presented. The extracted ambient signals are applied to the blind upmixing of musical audio recordings to surround sound formats. The input signal is processed by means of short-term spectral attenuation. The spectral weights are computed using a low-level feature extraction process and a neural network regression method. The multi-channel audio signal is generated by feeding the computed ambient signal into the rear channels of a surround sound system.
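The processing chain can be sketched as follows. Since the paper's spectral weights come from a trained neural network, the sketch below substitutes a simple hand-crafted weight (temporally smoothed magnitude over instantaneous magnitude) as a stand-in for the regression stage; the rest (STFT, per-bin attenuation, overlap-add resynthesis) follows the described structure.

```python
# Sketch of ambience extraction by short-term spectral attenuation. The
# paper computes per-bin weights with a neural network; here a hand-crafted
# placeholder weight stands in for that regression stage.
import numpy as np

def extract_ambience(signal, n_fft=1024, hop=512):
    win = np.hanning(n_fft)
    frames = [np.fft.rfft(signal[s:s + n_fft] * win)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.array(frames)                       # (n_frames, n_bins)
    mag = np.abs(spec)
    # Steady (ambient-like) bins keep their magnitude across frames; transient
    # bins do not, so a cross-frame minimum attenuates them.
    smooth = np.minimum.reduce([np.roll(mag, s, axis=0) for s in (-1, 0, 1)])
    weights = np.clip(smooth / (mag + 1e-12), 0.0, 1.0)
    out = np.zeros(len(signal))                   # overlap-add resynthesis
    for i, frame in enumerate(np.fft.irfft(spec * weights, n=n_fft)):
        out[i * hop:i * hop + n_fft] += frame * win
    return out

sr = 16000
t = np.arange(16384) / sr
steady = np.sin(2 * np.pi * 440.0 * t)            # steady, ambient-like tone
click = np.zeros(16384)
click[8000] = 1.0                                 # purely transient input
amb_steady = extract_ambience(steady)
amb_click = extract_ambience(click)
```

In the full system, the clipped weight in [0, 1] would instead be the output of the neural network regression, driven by low-level features of each time-frequency bin.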
Complexity Scaling of Audio Algorithms: Parametrizing the MPEG Advanced Audio Coding Rate-Distortion Loop
Implementations of audio algorithms on embedded devices are required to consume minimal memory and processing power. Such applications can usually tolerate numerical imprecisions (distortion) as long as the resulting perceived quality is not degraded. By taking advantage of this error-tolerant nature, the algorithmic complexity can be reduced greatly. In the context of real-time audio coding, these algorithms can benefit from parametrization to adapt rate-distortion-complexity (R-D-C) trade-offs. We propose a modification to the rate-distortion loop in the quantization and coding stage of a fixed-point implementation of the Advanced Audio Coding (AAC) encoder to include complexity scaling. This parametrization could allow the control of algorithmic complexity through instantaneous workload measurements using the target processor’s task scheduler to better assign processing resources. Results show that this framework can be tuned to reduce a significant amount of the additional workload caused by the rate-distortion loop while remaining perceptually equivalent to the full-complexity version. Additionally, the modification allows a graceful degradation when transparency cannot be met due to limited computational capabilities.
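A toy illustration of the complexity-scaling idea, assuming a much-simplified rate loop; the bit-count estimate and step update below are crude stand-ins for the AAC quantization/coding stage, and the iteration cap is the complexity knob: fewer iterations means less work, at the risk of stopping before the rate target is met.

```python
# Toy complexity-scaled rate loop; the bit-count estimate and step update
# are much-simplified stand-ins for the AAC quantization/coding stage.
import numpy as np

def rate_loop(coeffs, bit_budget, max_iters):
    """Raise the quantizer step until the estimated bit count fits the budget.
    `max_iters` is the complexity knob: fewer iterations cost less work, at
    the risk of stopping before the rate target is met."""
    step = 1.0
    for _ in range(max_iters):
        q = np.round(coeffs / step)
        bits = int(np.sum(np.ceil(np.log2(np.abs(q) + 1)) + 1))  # crude estimate
        recon = q * step                          # dequantized coefficients
        if bits <= bit_budget:
            break                                 # rate target met
        step *= 1.25                              # coarser quantization
    return recon, bits

rng = np.random.default_rng(1)
c = rng.normal(scale=100.0, size=256)
full, bits_full = rate_loop(c, bit_budget=900, max_iters=64)    # converges
cheap, bits_cheap = rate_loop(c, bit_budget=900, max_iters=4)   # capped early
```

Capping the iteration count trades accuracy of the rate-distortion operating point against workload, which is the kind of graceful R-D-C trade-off the paper parametrizes inside the actual AAC encoder.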