Automatic Mixing: Live Downmixing Stereo Panner
An automatic stereo panning algorithm intended for live multitrack downmixing has been developed. It uses spectral analysis to determine the panning position of each source, combining a filter-bank measure of quantitative channel dependence, a priority-channel architecture, and constrained rules to assign panning criteria. The algorithm attempts to minimize spectral masking by allocating sources with similar spectra to different panning spaces. It has been implemented, and results on its convergence, automatic panning-space allocation, and left-right inter-channel phase relationship are presented.
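As a rough illustration of the idea (not the authors' exact algorithm), the Python sketch below measures per-channel filter-bank energies, scores pairwise spectral similarity, and greedily assigns pan positions so that spectrally similar channels end up far apart while priority channels stay near the centre. The function names, the greedy ordering, and the similarity measure are all assumptions of this sketch.

    import numpy as np

    def band_energies(mag_spectrum, n_bands=8):
        # Coarse filter bank: per-band energy of a magnitude spectrum.
        return np.array([np.sum(b ** 2) for b in np.array_split(mag_spectrum, n_bands)])

    def auto_pan(spectra, priority):
        """Greedy pan allocation: spectrally similar sources get distant pan
        positions; priority sources (e.g. vocals) are constrained centrally."""
        n = len(spectra)
        feats = [band_energies(s) for s in spectra]
        unit = lambda v: v / (np.linalg.norm(v) + 1e-12)
        # pairwise spectral similarity between band-energy profiles
        sim = np.array([[float(np.dot(unit(a), unit(b))) for b in feats] for a in feats])
        slots = list(np.linspace(-1.0, 1.0, n))   # candidate pan positions, L..R
        pan, placed = {}, []
        # place the most masking-prone (most similar to everything) channels first
        for i in sorted(range(n), key=lambda k: -sim[k].sum()):
            if priority[i]:
                pos = min(slots, key=abs)          # constrained rule: stay central
            else:
                # take the slot maximising similarity-weighted distance to placed sources
                pos = max(slots, key=lambda p: sum(sim[i][j] * abs(p - pan[j]) for j in placed))
            slots.remove(pos)
            pan[i] = pos
            placed.append(i)
        return [pan[i] for i in range(n)]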
Adaptive Harmonization and Pitch Correction of Polyphonic Audio Using Spectral Clustering
There are several well-known harmonization and pitch-correction techniques that can be applied to monophonic sound sources. They are based on automatic pitch detection and frequency shifting without time stretching. In many applications it is desirable to apply such effects to the dominant melodic instrument of a polyphonic audio mixture. However, applying them directly to the mixture introduces artifacts, and automatic pitch detection becomes unreliable. In this paper we describe how a dominant-melody separation method based on spectral clustering of sinusoidal peaks can be used for adaptive harmonization and pitch correction in monaural polyphonic audio mixtures. Motivating examples are presented from a violin-tutoring perspective as well as from modifying the saxophone melody of an old monaural jazz recording.
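The pitch-correction step itself can be sketched in a few lines: detect the melody's fundamental, snap it to the nearest equal-tempered semitone, and scale the frequencies of the separated melody's sinusoidal peaks by the resulting ratio before additive resynthesis. This minimal Python sketch assumes the F0 estimate and the peak lists are already supplied by the separation front end:

    import numpy as np

    def correction_ratio(f0_hz, ref_hz=440.0):
        """Snap a detected fundamental to the nearest equal-tempered semitone
        and return the multiplicative frequency-shift ratio."""
        semitones = 12.0 * np.log2(f0_hz / ref_hz)
        target_hz = ref_hz * 2.0 ** (np.round(semitones) / 12.0)
        return target_hz / f0_hz

    def shift_partials(peak_freqs, peak_amps, ratio):
        """Frequency-shift the melody's sinusoidal peaks (no time stretch):
        amplitudes are kept, frequencies are scaled before resynthesis."""
        return peak_freqs * ratio, peak_amps

    # e.g. a sharp 448 Hz note is pulled down to A4 (ratio ~0.982)
    freqs, amps = shift_partials(np.array([448.0, 896.0]), np.array([1.0, 0.5]),
                                 correction_ratio(448.0))

Harmonization works the same way, except the ratio is chosen as 2 ** (interval / 12) for a desired interval in semitones rather than from the nearest-semitone snap.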
Modal Distribution Synthesis from Sub-Sampled Autocorrelation Function
The problem of signal synthesis from bilinear time-frequency representations such as the Wigner distribution has been investigated [1, 2, 4] using methods that exploit an outer-product interpretation of these distributions. The Modal distribution is a time-frequency distribution specifically designed to model the quasi-harmonic, multi-sinusoidal nature of music signals, and it belongs to the Cohen general class of time-frequency distributions. Existing methods of synthesis from the Modal distribution [3] are based on a sinusoidal analysis-synthesis procedure using estimates of instantaneous frequency and amplitude. In this paper we develop a new synthesis procedure for the Modal distribution based on the outer-product interpretation of bilinear time-frequency distributions. We also propose a streaming, object-oriented implementation of the resynthesis in the SndObj library [6], building on earlier work that provided a streaming implementation of the Modal distribution [7]. The theoretical background to the Modal distribution and to signal synthesis from Wigner distributions is first outlined, followed by an explanation of the design and implementation of the Modal distribution synthesis. Suggestions for future extensions to the synthesis procedure are given.
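As a minimal illustration of the outer-product step alone (not the full Modal-distribution pipeline), the sketch below recovers a signal, up to a constant phase, as the principal eigenvector of a matrix K that approximates the outer product x x^H, which is the kind of matrix assembled from a bilinear time-frequency distribution:

    import numpy as np

    def synth_from_outer_product(K):
        """Least-squares synthesis from an (approximate) outer-product matrix
        K ~ x x^H: the best rank-1 approximation of K yields the signal
        estimate as its principal eigenvector, up to a constant phase."""
        K = 0.5 * (K + K.conj().T)                 # enforce Hermitian symmetry
        w, v = np.linalg.eigh(K)                   # eigenvalues in ascending order
        lam, u = w[-1], v[:, -1]                   # principal eigenpair
        return np.sqrt(max(lam, 0.0)) * u          # magnitude from the eigenvalue

    # quick self-check: recover a chirp from its own outer product
    n = np.arange(64)
    x = np.exp(1j * 2 * np.pi * (0.05 * n + 0.001 * n ** 2))
    x_rec = synth_from_outer_product(np.outer(x, x.conj()))
    phi = np.angle(np.vdot(x_rec, x))              # resolve the constant-phase ambiguity
    print(np.allclose(x_rec * np.exp(1j * phi), x, atol=1e-8))   # True

In practice K is built from the distribution rather than from the signal, and any modification of the distribution makes K only approximately rank-1, which is exactly why the least-squares (eigenvector) solution is used.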
Frequency Slope Estimation and its Application for Non-Stationary Sinusoidal Parameter Estimation
In this paper we investigate the estimation of sinusoidal parameters for sinusoids with linear AM/FM modulation. We show that, for linear amplitude and frequency modulation, only the frequency modulation creates additional estimation bias for the standard sinusoidal parameter estimator. We then propose an enhanced algorithm for frequency-domain demodulation of spectral peaks that can be used to obtain an approximate maximum-likelihood estimate of the frequency slope, together with amplitude, phase, and frequency estimates with significantly reduced bias. An experimental evaluation compares the new estimation scheme with existing methods and shows that significant bias reduction is achieved over a large range of slopes and zero-padding factors. A real-world example demonstrates that the enhanced bias-reduction algorithm can reduce the residual energy by up to 9 dB.
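A brute-force caricature of the slope estimation is easy to write down: demodulate the analysis frame with candidate conjugate chirps and keep the slope that maximizes the demodulated spectral-peak magnitude, approximating the maximum-likelihood search by a grid. The grid, window, and an analytic (complex) input frame are assumptions of this sketch; the paper's estimator works on spectral peaks directly and is far cheaper:

    import numpy as np

    def estimate_slope(frame, fs, slopes):
        """Grid-search slope estimate: demodulate the (analytic) frame with
        candidate conjugate chirps and keep the slope, in Hz/s, that
        maximises the magnitude of the demodulated spectral peak."""
        n = len(frame)
        t = (np.arange(n) - n / 2) / fs            # frame-centred time axis
        win = np.hanning(n)
        best_slope, best_peak = 0.0, -np.inf
        for s in slopes:
            demod = frame * np.exp(-1j * np.pi * s * t ** 2)   # undo the FM sweep
            peak = np.abs(np.fft.fft(win * demod)).max()
            if peak > best_peak:
                best_slope, best_peak = s, peak
        return best_slope

    # toy check: a chirp sweeping 1000 Hz/s around 440 Hz is matched on the grid
    fs, n = 8000, 1024
    t = (np.arange(n) - n / 2) / fs
    x = np.exp(1j * 2 * np.pi * (440 * t + 0.5 * 1000 * t ** 2))
    print(estimate_slope(x, fs, np.linspace(-2000.0, 2000.0, 81)))   # -> 1000.0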
Realtime Multiple-Pitch and Multiple-Instrument Recognition for Music Signals Using Sparse Non-Negative Constraints
In this paper we introduce a simple and fast method for realtime recognition of multiple pitches produced by multiple musical instruments. Our proposed method is based on two important facts: (1) the timbral information of any instrument is pitch-dependent, and (2) the modulation spectrum of the same pitch appears to give a persistent representation of the characteristics of the instrumental family. Using these facts, we construct a learning algorithm that obtains pitch templates for all possible notes of various instruments, and then devise an online algorithm that decomposes a realtime audio buffer using the learned templates. The learning and decomposition proposed here are inspired by non-negative matrix factorization methods but differ through the introduction of an explicit sparsity control. Our test results show promising recognition rates for a realtime system on real music recordings. We discuss further improvements that can be made over the proposed system.
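The online decomposition stage can be sketched as non-negative multiplicative updates against fixed, learned templates, with an additive term implementing an explicit sparsity control on the activations. The update rule below (Euclidean cost plus an L1 penalty) is one standard choice assumed for illustration, not necessarily the paper's exact rule:

    import numpy as np

    def decompose_frame(W, v, lam=0.1, n_iter=50):
        """Decompose one magnitude-spectrum frame v onto fixed, learned pitch
        templates W (freq x templates) with non-negative multiplicative
        updates; lam is the explicit L1 sparsity weight on the activations."""
        h = np.full(W.shape[1], 1e-3)              # non-negative initialisation
        WtV = W.T @ v
        WtW = W.T @ W
        for _ in range(n_iter):
            h *= WtV / (WtW @ h + lam + 1e-12)     # sparse-NMF multiplicative step
        return h                                    # one activation per (pitch, instrument)

    # toy usage: two templates, a frame containing only the second one
    W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    print(decompose_frame(W, W @ np.array([0.0, 1.0])))   # ~[0, 1], shrunk by lam

Because W is fixed after learning, each incoming buffer only requires these cheap vector updates, which is what makes the realtime operation feasible.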
Multipitch Estimation of Quasi-Harmonic Sounds in Colored Noise
This paper proposes a new multipitch estimator based on a likelihood-maximization principle. For each tone, a sinusoidal model is assumed with a colored moving-average background noise and an autoregressive spectral envelope for the overtones. A monopitch estimator is derived following a weighted maximum-likelihood principle; it finds the fundamental frequency (F0) that jointly and maximally flattens the noise spectrum and the sinusoidal spectrum. The multipitch estimator is obtained by extending the method to jointly estimate multiple F0s. An application to piano tones is presented that takes into account the inharmonicity of this instrument's overtone series.
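As a heavily simplified caricature of the flattening criterion (ignoring the moving-average noise model, the AR envelope, and inharmonicity), one can grid-search F0 and reward candidates whose harmonic comb splits the power spectrum into a flat residual and a flat set of overtone bins. All names and tolerances below are assumptions of this sketch:

    import numpy as np

    def flatness(p):
        # Spectral flatness: geometric mean over arithmetic mean of a power spectrum.
        p = np.maximum(p, 1e-12)
        return np.exp(np.mean(np.log(p))) / np.mean(p)

    def monopitch(power, fs, f0_grid, n_harm=10, tol=0.03):
        """Grid-search monopitch estimate in the spirit of the criterion: pick
        the F0 whose harmonic comb leaves both the overtone bins and the
        residual noise bins maximally flat."""
        freqs = np.fft.rfftfreq(2 * (len(power) - 1), 1.0 / fs)
        best_f0, best_score = f0_grid[0], -np.inf
        for f0 in f0_grid:
            harm = np.zeros(len(power), dtype=bool)
            for k in range(1, n_harm + 1):
                harm |= np.abs(freqs - k * f0) < tol * f0   # bins near each overtone
            if harm.sum() < n_harm or (~harm).sum() == 0:
                continue
            score = flatness(power[~harm]) * flatness(power[harm])  # jointly flat
            if score > best_score:
                best_f0, best_score = f0, score
        return best_f0

A sub-octave error leaves the residual flat but mixes strong partials with noise inside the comb, while an octave-too-high error leaves odd partials peaking in the residual; the product of the two flatness terms penalizes both.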
Efficient Description and Rendering of Complex Interactive Acoustic Scenes
Interactive environmental audio spatialization technology has become commonplace in personal computers and is migrating onto portable entertainment platforms (including cell phones) and multiplayer game servers (virtual online worlds). While the primary application of this technology today is 3D game soundtrack rendering, it is ultimately necessary for implementing any personal or shared immersive virtual world (“virtual reality”). The successful development and deployment of such applications on new mobile or online platforms involves maximizing the plausibility of the synthetic 3D audio scene while minimizing the computational and memory footprint of the audio rendering engine. It also requires a flexible, standardized scene-description model to facilitate the development of applications targeting multiple platforms. This paper reviews a computationally efficient 3D positional audio and spatial reverberation processing architecture for real-time virtual acoustics over headphones or loudspeakers, compatible with current interactive audio standards (including MPEG-4, OpenAL, JSR 234, and OpenSL ES).
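A hypothetical sketch of the kind of cost-saving structure such engines rely on: each source gets its own cheap panning and distance processing, while all sources share a single spatial reverberator through one send bus, so the expensive reverberation cost does not scale with the number of sources. Everything here (names, the 1/r law, the stereo pan) is an illustrative assumption, not the specific architecture reviewed in the paper:

    import numpy as np

    def render_scene(sources, reverb_ir):
        """Per-source direct path plus one shared reverb send.

        sources   : list of (mono_signal, azimuth_rad in [-pi/2, pi/2], distance_m)
        reverb_ir : impulse response of the single shared reverberator
        """
        n = max(len(s) for s, _, _ in sources)
        dry = np.zeros((2, n))
        send = np.zeros(n)                          # one reverb send for all sources
        for sig, az, dist in sources:
            g = 1.0 / max(dist, 1.0)                # simple 1/r distance attenuation
            gl = g * np.cos((az + np.pi / 2) / 2)   # constant-power stereo pan
            gr = g * np.sin((az + np.pi / 2) / 2)
            dry[0, :len(sig)] += gl * sig
            dry[1, :len(sig)] += gr * sig
            send[:len(sig)] += g * sig              # distance also drives the send
        wet = np.convolve(send, reverb_ir)[:n]      # one reverb, not one per source
        return dry + wet                             # wet is broadcast to both channels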
2nd Order Spherical Harmonic Spatial Encoding of Digital Waveguide Mesh Room Acoustic Models
The aim of this research is to provide a solution for listening to the acoustics of Digital Waveguide Mesh (DWM) modelled virtual acoustic spaces. The DWM is a numerical simulation technique that has been shown to be appropriate for modelling the propagation of sound through air. Recent work has explored methods for spatially capturing a soundfield within a virtual acoustic space using spatially distributed receivers based on sound intensity probe theory. This technique is now extended to facilitate spatial encoding using second-order spherical harmonics. This is achieved with an array of pressure-sensitive receivers arranged around a central reference point, with appropriate processing applied to obtain the second-order harmonic signals associated with Ambisonic encoding/decoding. The processed signals are tested using novel techniques to objectively assess their integrity for reproducing a faithful impression of the virtual soundfield over a multi-channel sound system.
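For orientation, the nine second-order Ambisonic signals for a single plane-wave source reduce to direction-dependent gains; the sketch below uses the classic Furse-Malham weighting. In the paper these harmonic signals are instead derived by processing a spatial array of pressure receivers inside the DWM, so the closed-form source direction used here is an assumption of the sketch:

    import numpy as np

    def encode_2nd_order(signal, azimuth, elevation):
        """Encode a mono signal into the nine 2nd-order Ambisonic channels
        (Furse-Malham weighting) for a source at (azimuth, elevation) in
        radians."""
        a, e = azimuth, elevation
        gains = {
            "W": 1.0 / np.sqrt(2.0),
            "X": np.cos(a) * np.cos(e),
            "Y": np.sin(a) * np.cos(e),
            "Z": np.sin(e),
            "R": 1.5 * np.sin(e) ** 2 - 0.5,
            "S": np.cos(a) * np.sin(2 * e),
            "T": np.sin(a) * np.sin(2 * e),
            "U": np.cos(2 * a) * np.cos(e) ** 2,
            "V": np.sin(2 * a) * np.cos(e) ** 2,
        }
        return {ch: g * signal for ch, g in gains.items()}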
Hyper-Dimensional Digital Waveguide Mesh for Reverberation Modeling
Characteristics of digital waveguide meshes with more than three physical dimensions are studied. In particular, the properties of a 4-D mesh are analyzed and compared with waveguide structures of lower dimensionality. The hypermesh produces a response with a dense and irregular modal pattern at high frequencies, which is beneficial in modeling the reverberation of rooms or musical instrument bodies. In addition, it offers a high degree of decorrelation between output points selected at different locations, which is advantageous for multi-channel reverberation. The frequency-dependent decay of the hypermesh response can be controlled using boundary filters recently introduced by one of the authors. Several hypermeshes can be combined efficiently in a multirate system in which each mesh produces reverberation on a finite frequency band. The paper presents two hypermesh application examples: the modeling of the impulse response of a lecture hall and the simulation of the response of a clavichord soundbox.
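In its finite-difference form, one rectilinear hypermesh update is a single stencil operation: each junction takes the scaled sum of its 2N axis neighbours minus its own value two steps earlier (for a 4-D mesh, 8 neighbours with an overall factor of 1/4). The sketch below uses periodic boundaries via np.roll for brevity, whereas a real model would apply the boundary filters mentioned above:

    import numpy as np

    def hypermesh_step(p_now, p_prev):
        """One update of an N-dimensional rectilinear digital waveguide mesh
        in finite-difference form: (2 / (2 * dims)) * neighbour sum minus the
        junction value two steps ago."""
        dims = p_now.ndim
        acc = np.zeros_like(p_now)
        for ax in range(dims):
            acc += np.roll(p_now, 1, axis=ax) + np.roll(p_now, -1, axis=ax)
        return acc / dims - p_prev

    # toy run: impulse in a small 4-D mesh, tapping a distant output point
    p_prev = np.zeros((8, 8, 8, 8))
    p_now = np.zeros_like(p_prev)
    p_now[4, 4, 4, 4] = 1.0
    out = []
    for _ in range(32):
        p_prev, p_now = p_now, hypermesh_step(p_now, p_prev)
        out.append(p_now[2, 6, 3, 5])   # distant taps give decorrelated outputs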
Ray Acoustics Using Computer Graphics Technology
The modeling of room acoustics and the simulation of sound-wave propagation remain difficult and computationally expensive tasks. Two main techniques have evolved: one focuses on physically accurate, wave-oriented sound propagation, while the other approximates sound waves as rays using raytracing techniques. Due to many advances in computer science, and especially in computer graphics, over the last decade, interactive 3D sound simulations of complex and dynamic environments are within reach. In this paper we analyze sound propagation in terms of acoustic energy and explore the possibilities of mapping these concepts to radiometry and graphics rendering equations. Although we concentrate on ray-based techniques, we also partially consider wave-based sound propagation effects. The implemented system exploits modern graphics hardware and rendering techniques and is able to efficiently simulate 3D room acoustics, as well as to measure simplified personal HRTFs through acoustic raytracing.
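A toy CPU version of energy-based acoustic ray tracing in a shoebox room conveys the basic mapping from rays to an energy response: rays lose a fraction of their energy at each specular bounce and deposit energy into a time histogram whenever they pass through a listener sphere. All parameters and the detection geometry are assumptions of this sketch; the paper's system runs on graphics hardware:

    import numpy as np

    def trace_energy(src, lst, room, n_rays=2000, max_order=30,
                     absorb=0.3, fs=8000, c=343.0, r_lst=0.5):
        """Toy acoustic ray tracer for a shoebox room of size room = [Lx, Ly, Lz]:
        rays carry energy, lose a fraction 'absorb' at each specular wall
        bounce, and deposit into a time histogram (an energy impulse response)
        whenever they pass within r_lst metres of the listener."""
        rng = np.random.default_rng(0)
        room = np.asarray(room, dtype=float)
        hist = np.zeros(fs)                          # 1 s energy response
        for _ in range(n_rays):
            d = rng.normal(size=3)
            d /= np.linalg.norm(d)                   # uniform random direction
            pos = np.asarray(src, dtype=float).copy()
            energy, dist = 1.0 / n_rays, 0.0
            for _ in range(max_order):
                safe = np.where(d == 0.0, 1.0, d)    # avoid division by zero
                t_axis = np.where(d > 0.0, (room - pos) / safe, -pos / safe)
                t_axis[d == 0.0] = np.inf
                ax = int(np.argmin(t_axis))          # first wall hit along the ray
                t_wall = t_axis[ax]
                rel = np.asarray(lst, dtype=float) - pos
                t_hit = float(np.dot(rel, d))        # closest approach to listener
                if 0.0 < t_hit < t_wall and np.linalg.norm(rel - t_hit * d) < r_lst:
                    b = int((dist + t_hit) / c * fs)
                    if b < len(hist):
                        hist[b] += energy            # energy is summed, phase ignored
                pos, dist = pos + t_wall * d, dist + t_wall
                d[ax] = -d[ax]                       # specular reflection at the wall
                energy *= 1.0 - absorb               # frequency-flat wall absorption
        return hist

    # usage sketch: a 6 x 4 x 3 m room, source and listener a few metres apart
    h = trace_energy(src=[1.0, 1.0, 1.5], lst=[4.5, 2.5, 1.5], room=[6.0, 4.0, 3.0])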