3D Interactive Environment for Music Collection Navigation
Previous interfaces for large collections of music have used spatial audio to enhance the presentation of a visual interface or to add a mode of interaction. An interface using only audio information is presented here as a means to explore a large music collection in a two- or three-dimensional space. By taking advantage of Ambisonics and binaural technology, the application presented here can scale to large collections, has flexible playback requirements, and can be optimized for slower computers. User evaluation reveals issues in creating an intuitive mapping between user movements in physical space and virtual movement through the collection, but the novel presentation of the music collection received positive feedback and warrants further development.
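As an illustration of why an Ambisonics bus scales to large collections, here is a minimal sketch (not from the paper): every track is encoded into a shared first-order B-format mix, so the cost of the final decode is independent of the number of tracks. The encoding convention, function names, and 1/r distance law are assumptions, and the binaural decode stage (e.g., virtual loudspeakers convolved with HRTFs) is omitted.

```python
import numpy as np

def encode_foa(mono, azimuth, elevation=0.0):
    """Encode a mono track into first-order Ambisonics (B-format: W, X, Y, Z),
    using one common convention (W attenuated by 1/sqrt(2))."""
    w = mono / np.sqrt(2.0)
    x = mono * np.cos(azimuth) * np.cos(elevation)
    y = mono * np.sin(azimuth) * np.cos(elevation)
    z = mono * np.sin(elevation)
    return np.stack([w, x, y, z])

def mix_collection(tracks, positions, listener):
    """Sum every track into one B-format bus; only this single bus needs
    decoding to binaural, however many tracks surround the listener."""
    bus = None
    for mono, (px, py) in zip(tracks, positions):
        az = np.arctan2(py - listener[1], px - listener[0])
        dist = max(np.hypot(px - listener[0], py - listener[1]), 0.1)
        bf = encode_foa(mono / dist, az)   # simple 1/r distance attenuation
        bus = bf if bus is None else bus + bf
    return bus

# toy usage: three 1-second tracks placed around the listener
sr = 44100
t = np.arange(sr) / sr
tracks = [np.sin(2 * np.pi * f * t) for f in (220.0, 330.0, 440.0)]
positions = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
bus = mix_collection(tracks, positions, listener=(0.0, 0.0))
print(bus.shape)  # (4, 44100): one B-format mix regardless of track count
```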
Surround Sound without Rear Loudspeakers: Multichannel Compensated Amplitude Panning and Ambisonics
Conventional panning approaches for surround sound require loudspeakers to be distributed over the regions where images are needed. However, in many listening situations it is not practical or desirable to place loudspeakers in some positions, such as behind or above the listener. Compensated Amplitude Panning (CAP) is a method that adapts dynamically to the listener's head orientation to provide images in any direction, in the frequency range up to ≈ 1000 Hz, using only two loudspeakers. CAP is extended here to more loudspeakers, which removes some limitations and provides additional benefits. The new CAP method is also compared with an Ambisonics approach that is adapted for surround sound without rear loudspeakers.
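The following is not the paper's derivation, only a minimal sketch of the velocity-vector idea behind low-frequency amplitude panning with head-orientation compensation: loudspeaker directions are re-expressed in head coordinates, and gains are re-solved whenever the yaw changes. The function names and the gains-sum-to-one normalization are assumptions.

```python
import numpy as np

def cap_gains(speaker_dirs, target_dir, head_yaw):
    """Solve for gains so the gain-weighted sum of loudspeaker direction
    vectors (the low-frequency velocity vector), expressed in head
    coordinates, points at the target image direction."""
    c, s = np.cos(-head_yaw), np.sin(-head_yaw)
    rot = np.array([[c, -s], [s, c]])               # world -> head coordinates
    u = rot @ np.asarray(speaker_dirs, dtype=float).T
    v = rot @ np.asarray(target_dir, dtype=float)
    a = np.vstack([u, np.ones(u.shape[1])])         # add sum(g) = 1 constraint
    b = np.append(v, 1.0)
    g, *_ = np.linalg.lstsq(a, b, rcond=None)
    return g

def azdir(deg):
    rad = np.radians(deg)
    return np.array([np.sin(rad), np.cos(rad)])     # front = (0, 1)

# three frontal loudspeakers, image requested directly behind the listener:
# the solution uses strongly out-of-phase gains, the low-frequency effect
# that lets frontal loudspeakers place images to the rear
speakers = [azdir(a) for a in (-45.0, 0.0, 45.0)]
print(cap_gains(speakers, target_dir=azdir(180.0), head_yaw=0.0))
```

With only two loudspeakers this system is overdetermined and the solve is a compromise; adding loudspeakers adds degrees of freedom, which loosely mirrors the multichannel extension the abstract describes.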
Binaural HRTF-based Spatialization: New Approaches and Implementation
New approaches to Head Related Transfer Function (HRTF) based artificial spatialization of audio are presented and discussed in this paper. A brief summary of the topic of audio spatialization and HRTF interpolation is offered, followed by an appraisal of the existing minimum-phase HRTF interpolation method. Novel alternatives are then suggested which approach the problem of phase interpolation more directly. The first technique, based on magnitude interpolation and phase truncation, aims to use the empirical HRTFs without the need for complex data preparation or manipulation, while minimizing any approximations that may be introduced by data transformations. A second approach augments a functionally based phase model with low-frequency non-linear frequency scaling based on the empirical HRTFs, allowing a more accurate phase representation of the more relevant lower-frequency end of the spectrum. This more complex approach is deconstructed from an implementation point of view. Testing of both algorithms is then presented, which highlights their success and favorable performance over minimum-phase-plus-delay methods.
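One plausible reading of the first technique (magnitude interpolation with phase truncation) is sketched below; it is an illustration, not the paper's implementation. It assumes two measured HRIRs at neighboring directions, interpolates their magnitude spectra linearly, and takes the phase unmodified from the nearer measurement.

```python
import numpy as np

def interp_hrtf(h0, h1, frac, n_fft=512):
    """Interpolate between two measured HRIRs at a fractional position
    `frac` in [0, 1]: linear interpolation of the magnitude spectra, with
    the phase 'truncated' to that of the nearer measurement rather than
    interpolated (avoiding minimum-phase conversion of the data)."""
    H0 = np.fft.rfft(h0, n_fft)
    H1 = np.fft.rfft(h1, n_fft)
    mag = (1.0 - frac) * np.abs(H0) + frac * np.abs(H1)
    phase = np.angle(H0 if frac < 0.5 else H1)   # phase truncation
    return np.fft.irfft(mag * np.exp(1j * phase), n_fft)

# toy usage with two stand-in 'measured' impulse responses
rng = np.random.default_rng(0)
h0, h1 = rng.standard_normal(128), rng.standard_normal(128)
h = interp_hrtf(h0, h1, frac=0.25)
print(h.shape)  # (512,): interpolated HRIR ready for convolution
```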
Sound Source Separation in the Higher Order Ambisonics Domain
In this article we investigate how the local Gaussian model (LGM) can be applied to separate sound sources in the higher-order ambisonics (HOA) domain. First, we show that in the HOA domain, the mathematical formalism of the local Gaussian model remains the same as in the microphone domain. Second, using an off-the-shelf source separation toolbox (FASST) based on the local Gaussian model, we validate the efficiency of the approach in the HOA domain by comparing the performance of the toolbox in the HOA domain with its performance in the microphone domain; we discuss and run a set of simulations designed to ensure a fair comparison. Third, we check the efficiency of the local Gaussian model against other available source separation techniques in the HOA domain. Simulation results show that separating sources in the HOA domain yields a 1 to 12 dB increase in signal-to-distortion ratio compared to the microphone domain.
Keywords: multichannel source separation, local Gaussian model, Wiener filtering, 3D audio, Higher Order Ambisonics (HOA)
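To make the "same formalism in either domain" point concrete, here is a minimal sketch of the multichannel Wiener estimator used under the local Gaussian model, applied at a single time-frequency bin; the toy spatial covariances and dimensions are assumptions, not FASST's internals.

```python
import numpy as np

def lgm_wiener(x, v, R):
    """Multichannel Wiener estimates at one time-frequency bin under the
    local Gaussian model: source j is zero-mean Gaussian with covariance
    v[j] * R[j]. Nothing here cares whether the channels in x are
    microphone signals or HOA components."""
    cov = [vj * Rj for vj, Rj in zip(v, R)]   # per-source covariances
    inv = np.linalg.inv(sum(cov))             # inverse of mixture covariance
    return [c @ inv @ x for c in cov]         # one Wiener estimate per source

# toy bin: 4 HOA channels, 2 sources, each a rank-1 direct path plus a
# small diffuse part (keeps every covariance full rank)
rng = np.random.default_rng(1)
I = 4
A = rng.standard_normal((2, I)) + 1j * rng.standard_normal((2, I))
R = [np.outer(a, a.conj()) + 0.1 * np.eye(I) for a in A]
x = rng.standard_normal(I) + 1j * rng.standard_normal(I)
est = lgm_wiener(x, v=[1.0, 0.5], R=R)
print(np.allclose(sum(est), x))  # True: Wiener estimates sum to the mixture
```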
Sound Matching Using Synthesizer Ensembles
Sound matching allows users to automatically approximate existing sounds using a synthesizer. Previous work has mostly focused on algorithms for automatically programming an existing synthesizer. This paper proposes a system for selecting between different synthesizer designs, each one with a corresponding automatic programmer. An implementation that allows designing ensembles based on a template is demonstrated. Several experiments are presented using a simple subtractive synthesis design. Using an ensemble of synthesizer-programmer pairs is shown to provide better matching than a single programmer trained for an equivalent integrated synthesizer. Scaling to hundreds of synthesizers is shown to improve match quality.
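A minimal skeleton of the ensemble idea follows; the interfaces, the spectral match metric, and the toy oscillator "designs" are all illustrative assumptions, not the paper's implementation. Each programmer estimates parameters for its own synthesizer design, and the ensemble selects the design whose rendered output matches the target best.

```python
import numpy as np

def spectral_distance(a, b, n_fft=1024):
    """Simple match metric: mean squared distance between log spectra."""
    A = np.log1p(np.abs(np.fft.rfft(a, n_fft)))
    B = np.log1p(np.abs(np.fft.rfft(b, n_fft)))
    return float(np.mean((A - B) ** 2))

def match_with_ensemble(target, ensemble):
    """Run every (synth, programmer) pair on the target; keep the best."""
    best = None
    for synth, programmer in ensemble:
        params = programmer(target)          # automatic programming step
        err = spectral_distance(target, synth(params))
        if best is None or err < best[0]:
            best = (err, synth, params)
    return best

# toy ensemble: two 'designs' (sine and sawtooth oscillators); each
# 'programmer' just estimates the dominant frequency of the target
sr = 8000
t = np.arange(sr) / sr
est_freq = lambda x: {"f": float(np.argmax(np.abs(np.fft.rfft(x))) * sr / len(x))}
sine = lambda p: np.sin(2 * np.pi * p["f"] * t)
saw = lambda p: 2.0 * ((p["f"] * t) % 1.0) - 1.0
target = saw({"f": 220.0})
err, synth, params = match_with_ensemble(target, [(sine, est_freq), (saw, est_freq)])
print(params, err)  # the saw design wins with near-zero error
```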
Time-Varying Filter Stability and State Matrix Products
We show a new sufficient criterion for time-varying digital filter stability: that the matrix norm of the product of state matrices over a certain finite number of time steps is bounded by 1. This extends Laroche's Criterion 1, which considered only one time step while hinting at extensions to two time steps. Further extending these results, we also show that there is no intrinsic requirement that filter coefficients be frozen over any time scale, and we extend to any dimension a helpful theorem that allows us to avoid explicitly performing eigen- or singular-value decompositions when studying the matrix norm. We give a number of case studies on filters that are known to be time-varying stable but cannot be proven so with the original criterion, and where the new criterion succeeds.
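A brute-force numerical check of the criterion illustrates why multi-step products matter; this is an illustration of the condition stated in the abstract, not the paper's proof technique. Even a frozen stable two-pole in companion form has state matrices of norm greater than 1 (so Criterion 1 with one step is inconclusive), while a longer product falls below 1.

```python
import numpy as np

def product_norm(state_matrices, window):
    """Worst spectral norm over all products of `window` consecutive state
    matrices. The sufficient condition is that this stays below 1;
    window=1 recovers the one-step (Laroche) criterion."""
    worst = 0.0
    for k in range(len(state_matrices) - window + 1):
        prod = np.eye(state_matrices[0].shape[0])
        for A in state_matrices[k:k + window]:
            prod = A @ prod                 # newest matrix applied last
        worst = max(worst, np.linalg.norm(prod, 2))
    return worst

# two-pole filter in companion form, poles at radius ~0.85, angle pi/4:
# stable, yet single state matrices have norm > 1
A = np.array([[0.0, 1.0], [-0.72, 1.2]])
mats = [A] * 64
for w in (1, 2, 4):
    print(w, product_norm(mats, w))  # ~1.66, ~1.77, ~0.52: 4 steps suffice
```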
Event-Synchronous Music Synthesis
This work presents a novel framework for music synthesis based on the perceptual structure analysis of pre-existing musical signals, for example taken from a personal MP3 database. We raise the important issue of grounding music analysis in perception, and propose a bottom-up approach to music analysis, modeling, and synthesis. A segmentation model for polyphonic signals is described and qualitatively validated through several artifact-free music resynthesis experiments, e.g., reversing the ordering of sound events (notes) without reversing their waveforms. Then, a compact "timbre" structure analysis and a method for song description in the form of an "audio DNA" sequence are presented. Finally, we propose novel applications, such as music cross-synthesis and time-domain audio compression, enabled through simple sound similarity measures and clustering.
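The "reverse the events, not the waveforms" experiment is easy to illustrate once segment boundaries exist; the sketch below uses hand-given onsets as a stand-in for the paper's perceptual segmentation model, so only the reordering step is shown.

```python
import numpy as np

def reverse_events(x, onsets):
    """Reorder sound events back-to-front while each event's waveform keeps
    running forward; `onsets` are event boundaries in samples. Any onset
    detector producing boundaries would slot in here."""
    bounds = list(onsets) + [len(x)]
    segments = [x[bounds[i]:bounds[i + 1]] for i in range(len(onsets))]
    return np.concatenate(segments[::-1])

# toy usage: three 'notes' (short sine bursts) reversed as events
sr = 8000
note = lambda f: np.sin(2 * np.pi * f * np.arange(sr // 4) / sr)
x = np.concatenate([note(262.0), note(330.0), note(392.0)])
y = reverse_events(x, onsets=[0, sr // 4, sr // 2])
print(len(y) == len(x))  # same audio; events now play G-E-C instead of C-E-G
```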
Cue Point Processing: An Introduction
Modern digital sound formats such as AIFF, MPEG-1/2/4/7, WAV, and RA support the use of cue points. A cue point may also be referred to as a seek point or a key frame. These mechanisms store metadata about the sound files. In this paper, we identify and describe how these formats are encoded, and how their metadata, with a focus on cue points, is processed. Finally, we conclude with the direction of our future research for improving multimedia browsing mechanisms and additional applications that leverage cue points.
Keywords: sound file formats, cue points, sound file, audio files, seek point, key frame, audio indexing
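As a concrete example of the kind of encoding the paper surveys, here is a small reader for the WAV case, following the standard RIFF 'cue ' chunk layout (a 4-byte cue count, then one 24-byte record per cue point). It is an independent illustration, not code from the paper; the file path in the usage note is hypothetical.

```python
import struct

def read_cue_points(path):
    """Return the cue points stored in a WAV file's 'cue ' chunk."""
    cues = []
    with open(path, "rb") as f:
        riff, _size, wave = struct.unpack("<4sI4s", f.read(12))
        assert riff == b"RIFF" and wave == b"WAVE", "not a WAV file"
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            body = f.read(chunk_size + (chunk_size & 1))  # chunks word-aligned
            if chunk_id != b"cue ":
                continue
            (count,) = struct.unpack_from("<I", body, 0)
            for i in range(count):
                ident, pos, fcc, _start, _block, offset = struct.unpack_from(
                    "<II4sIII", body, 4 + 24 * i)
                cues.append({"id": ident, "position": pos,
                             "chunk": fcc, "sample_offset": offset})
    return cues

# usage (path is illustrative):
# for cue in read_cue_points("take1.wav"):
#     print(cue["id"], cue["sample_offset"])
```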
An FPGA-based Adaptive Noise Cancelling System
An FPGA-based system suitable for augmented-reality audio applications is presented. The sample application described here is adaptive noise cancellation (ANC). The system consists of a Spartan-3 FPGA XC3S400 board connected to a Philips stereo audio codec UCB1400. The algorithms for the FIR filtering and for the adaptation of the filter coefficients according to the Widrow-Hoff LMS algorithm are implemented on the FPGA board. Measurement results obtained with a dummy-head measuring system are reported, and a detailed analysis of system performance and possible system improvements is given.
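For readers unfamiliar with the Widrow-Hoff LMS structure the FPGA implements, here is a floating-point software sketch of the same loop (the hardware version would run in fixed point); filter order, step size, and the toy noise path are assumptions.

```python
import numpy as np

def lms_anc(reference, primary, order=32, mu=0.01):
    """Widrow-Hoff LMS adaptive noise canceller. `reference` picks up the
    noise only; `primary` carries signal plus filtered noise; the error
    signal e is simultaneously the adaptation drive and the cleaned output."""
    w = np.zeros(order)
    out = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order + 1:n + 1][::-1]  # most recent sample first
        y = w @ x                                 # FIR estimate of the noise
        e = primary[n] - y                        # error = cleaned signal
        w += 2.0 * mu * e * x                     # Widrow-Hoff update
        out[n] = e
    return out

# toy usage: a sine buried in noise that also reaches the reference input
rng = np.random.default_rng(2)
noise = rng.standard_normal(20000)
signal = np.sin(2 * np.pi * 0.01 * np.arange(20000))
primary = signal + np.convolve(noise, [0.6, 0.3, 0.1])[:20000]  # causal path
cleaned = lms_anc(reference=noise, primary=primary)
print(np.mean((cleaned[5000:] - signal[5000:]) ** 2))  # small after adaptation
```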
The Modulation Scale Spectrum and its Application to Rhythm-Content Description
In this paper, we propose the Modulation Scale Spectrum as an extension of the Modulation Spectrum to the scale domain. The Modulation Spectrum expresses the evolution over time of the amplitude content of various frequency bands by means of a second Fourier Transform. While its use has been proven for many applications, it is not scale-invariant. For this reason, we propose the use of the Scale Transform in place of the second Fourier Transform. The Scale Transform is a special case of the Mellin Transform, and among its properties is scale invariance: two time-stretched versions of the same music track will have (almost) the same Scale Spectrum. Our proposed Modulation Scale Spectrum therefore inherits this property while still describing the evolution of frequency content over time. We then propose a specific implementation of the Modulation Scale Spectrum to represent rhythm content; this representation is therefore tempo-independent. We evaluate the ability of this representation to capture rhythm characteristics on a classification task, and demonstrate that for this task our proposed representation substantially exceeds previously reported results while being highly tempo-independent.
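The scale-invariance property can be demonstrated with a small sketch, using the standard identity that the scale transform of x(t) equals the Fourier transform of x(e^tau) e^(tau/2). This is one possible discretization applied to a single band envelope, not the paper's implementation; a full Modulation Scale Spectrum would apply it to the envelope of every STFT band.

```python
import numpy as np

def scale_transform(env, n_warp=2048):
    """Magnitude scale transform of a positive-time envelope, computed by
    resampling onto an exponential time axis and taking an FFT; time
    stretching then only changes the (discarded) phase."""
    t = np.arange(1, len(env) + 1, dtype=float)          # avoid t = 0
    tau = np.linspace(np.log(t[0]), np.log(t[-1]), n_warp)
    warped = np.interp(np.exp(tau), t, env) * np.exp(tau / 2.0)
    return np.abs(np.fft.rfft(warped))

# scale-invariance check: a rhythm-like envelope vs. a time-stretched copy
t = np.arange(4096)
env = 1.0 + np.cos(2 * np.pi * t / 500.0)                # periodic 'beat'
stretched = np.interp(t / 1.25, t, env)                  # same rhythm, new tempo
a, b = scale_transform(env), scale_transform(stretched)
print(np.corrcoef(a, b)[0, 1])  # close to 1: (almost) tempo-independent
```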