Modulation and demodulation of steerable ultrasound beams for audio transmission and rendering
Nonlinear effects in ultrasound propagation can be used to generate highly directive audible sound. To do so, we can amplitude-modulate an ultrasonic carrier with the audio signal and send the result to an ultrasound transducer. When played back at a sufficiently high sound pressure level, the ultrasonic signal is self-demodulated by the nonlinear behavior of the medium. The resulting signal has two important characteristics: it is audible, and it has the same directivity as the ultrasonic carrier. In this paper we describe the theoretical advantages of single-sideband (SSB) modulation over a standard amplitude modulation (AM) scheme for this application. We describe our near-field sound-field measurement experiments and propose steering solutions for the array using two different types of transducers, piezoelectric or electrostatic, together with the appropriate supporting hardware.
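The difference between the two modulation schemes can be illustrated with a minimal NumPy sketch on a single test tone. The 40 kHz carrier, 192 kHz sampling rate, and modulation index m = 0.5 are hypothetical choices, and the SSB variant shown (upper sideband with the carrier retained, built from an FFT-based analytic signal) is one common construction, not necessarily the authors' processing chain. The nonlinear self-demodulation in air is not modeled here.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H{x} via the FFT (even-length input assumed)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:n // 2] = 2.0          # keep positive frequencies, doubled
    h[n // 2] = 1.0            # Nyquist bin
    return np.fft.ifft(np.fft.fft(x) * h)

def am_modulate(audio, fc, fs, m=0.5):
    """Double-sideband AM with carrier: (1 + m*audio) * cos(2*pi*fc*t)."""
    t = np.arange(len(audio)) / fs
    return (1.0 + m * audio) * np.cos(2 * np.pi * fc * t)

def ssb_modulate(audio, fc, fs, m=0.5):
    """Upper-sideband SSB plus carrier: cos(wc*t) + m*Re{(audio + j*H{audio}) e^{j wc t}}."""
    t = np.arange(len(audio)) / fs
    carrier = np.exp(2j * np.pi * fc * t)
    return np.cos(2 * np.pi * fc * t) + m * np.real(analytic_signal(audio) * carrier)

# Hypothetical parameters: 10 Hz bin spacing, so every tone falls exactly on a bin.
fs, fc, fa, n = 192_000, 40_000, 1_000, 19_200
audio = np.cos(2 * np.pi * fa * np.arange(n) / fs)
spec_am = np.abs(np.fft.rfft(am_modulate(audio, fc, fs)))
spec_ssb = np.abs(np.fft.rfft(ssb_modulate(audio, fc, fs)))
lower, upper = (fc - fa) // 10, (fc + fa) // 10   # bins of the lower/upper sidebands
```

AM places equal energy in both sidebands, whereas the SSB signal leaves the lower sideband essentially empty and puts all of the modulation energy into the upper one.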
Improved hidden Markov model partial tracking through time-frequency analysis
In this article we propose a modification to the combinatorial hidden Markov model developed in [1] for tracking partial frequency trajectories. We employ the Wigner-Ville distribution and the Hough transform to (re)estimate the frequency and chirp rate of partials in each analysis frame. We estimate the initial phase and amplitude of each partial by minimizing the squared error in the time domain. We then formulate a new scoring criterion for the hidden Markov model, which makes the tracker more robust for non-stationary and noisy signals. We achieve good performance when tracking crossing linear chirps and crossing FM signals in white noise, as well as real instrument recordings.
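Once the frequency and chirp rate of a partial are known, minimizing the time-domain squared error over amplitude and initial phase reduces to a linear least-squares problem. The sketch below assumes a linear-chirp model with instantaneous frequency `freq + chirp_rate * t`; the function name and this parameterization are illustrative, and the Wigner-Ville/Hough estimation stage is not shown.

```python
import numpy as np

def estimate_amp_phase(x, freq, chirp_rate, fs):
    """Least-squares amplitude and initial phase of a linear-chirp partial.

    Assumed model: x[n] = a*cos(theta[n] + phi), with
    theta[n] = 2*pi*(freq*t + 0.5*chirp_rate*t**2).
    Expanding gives x = a*cos(phi)*cos(theta) - a*sin(phi)*sin(theta), which is
    linear in p = a*cos(phi) and q = -a*sin(phi), so ordinary least squares
    minimizes the time-domain squared error.
    """
    t = np.arange(len(x)) / fs
    theta = 2 * np.pi * (freq * t + 0.5 * chirp_rate * t ** 2)
    basis = np.column_stack([np.cos(theta), np.sin(theta)])
    (p, q), *_ = np.linalg.lstsq(basis, x, rcond=None)
    return float(np.hypot(p, q)), float(np.arctan2(-q, p))

# Usage with a synthetic, noise-free partial (hypothetical values).
fs = 8000
t = np.arange(1024) / fs
x = 0.7 * np.cos(2 * np.pi * (440 * t + 0.5 * 200 * t ** 2) + 1.2)
amp, phase = estimate_amp_phase(x, 440, 200, fs)
```

Because the problem is linear in (p, q), no iterative optimization is needed once the frequency and chirp rate have been fixed.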
3D interactive environment for music collection navigation
Previous interfaces for large music collections have used spatial audio to enhance the presentation of a visual interface or to add a mode of interaction. An interface that uses only audio information is presented here as a means to explore a large music collection in a two- or three-dimensional space. By taking advantage of Ambisonics and binaural technology, the application presented here can scale to large collections, has flexible playback requirements, and can be optimized for slower computers. User evaluation reveals issues in creating an intuitive mapping between user movements in physical space and virtual movement through the collection, but the novel presentation of the music collection received positive feedback and warrants further development.
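A first-order Ambisonic encoder shows why this approach scales: every song is reduced to the same four B-format channels, which are summed and then decoded once, either binaurally or to loudspeakers. The sketch below assumes the traditional (FuMa-style) W weighting of 1/sqrt(2), an azimuth measured counter-clockwise from the +x axis, and a hypothetical two-song scene. Distance attenuation and the binaural decoding stage are omitted.

```python
import numpy as np

def direction_to(listener, source):
    """Azimuth (counter-clockwise from +x) and elevation, in radians, of source seen from listener."""
    dx, dy, dz = np.subtract(source, listener)
    return float(np.arctan2(dy, dx)), float(np.arctan2(dz, np.hypot(dx, dy)))

def encode_bformat(signal, azimuth, elevation):
    """First-order B-format (W, X, Y, Z) encoding of a mono signal, FuMa-style W weighting."""
    s = np.asarray(signal, dtype=float)
    w = s / np.sqrt(2.0)
    x = s * np.cos(azimuth) * np.cos(elevation)
    y = s * np.sin(azimuth) * np.cos(elevation)
    z = s * np.sin(elevation)
    return np.stack([w, x, y, z])

# Hypothetical scene: two songs around a listener at the origin. Their B-format
# streams are summed, so the decoding cost does not grow with collection size.
listener = (0.0, 0.0, 0.0)
songs = {"song_a": (1.0, 0.0, 0.0), "song_b": (0.0, 2.0, 0.0)}
clips = {name: np.ones(4) for name in songs}          # placeholder audio excerpts
mix = sum(encode_bformat(clips[name], *direction_to(listener, pos))
          for name, pos in songs.items())
```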
Multiple-F0 tracking based on a high-order HMM model
This paper addresses multiple-F0 tracking and the estimation of the number of harmonic source streams in music sound signals. A source stream is understood as being generated by a note played by a musical instrument. A note is described by a hidden Markov model (HMM) with two states: the attack state and the sustain state. It is proposed to first track F0 candidates using a high-order hidden Markov model, based on a forward-backward dynamic programming scheme. The propagated weights are calculated in the forward tracking stage, followed by an iterative tracking of the most likely trajectories in the backward tracking stage. The underlying source streams are then estimated by iteratively pruning the candidate trajectories in a maximum-likelihood manner. The proposed system is evaluated on a specially constructed polyphonic music database. Compared with frame-based estimation systems, the tracking mechanism significantly improves the accuracy rate.
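The two-state note model can be illustrated with ordinary first-order Viterbi decoding. This is a simplified stand-in: it is not the high-order, forward-backward candidate tracker described above. All probabilities and the binary "onset strength" observation symbols are hypothetical.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most likely state path of a first-order HMM, computed in the log domain.

    log_pi: (S,) initial log-probabilities; log_A: (S, S) transition log-probabilities
    (row = from, column = to); log_B: (S, V) emission log-probabilities; obs: symbol indices.
    """
    n_states, T = len(log_pi), len(obs)
    delta = np.empty((T, n_states))
    back = np.zeros((T, n_states), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A           # scores[i, j]: best path ending i -> j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(n_states)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):                        # backtrack through stored pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical note model: state 0 = attack, state 1 = sustain.
log_pi = np.log([0.9, 0.1])                   # notes usually start in the attack state
log_A = np.log([[0.5, 0.5], [0.01, 0.99]])    # attack -> sustain; sustain rarely re-attacks
log_B = np.log([[0.2, 0.8], [0.9, 0.1]])      # symbol 1 = strong onset, 0 = steady
obs = [1, 1, 0, 0, 0, 0]
states = viterbi(log_pi, log_A, log_B, obs)
```

With these values the decoded path is two attack frames followed by sustain frames. A high-order model would instead condition each transition on several previous states.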
On the numerical solution of the 2D wave equation with compact FDTD schemes
This paper discusses compact-stencil finite-difference time-domain (FDTD) schemes for approximating the 2D wave equation in the context of digital audio. Stability, accuracy, and efficiency are investigated, and new ways of viewing and interpreting the results are discussed. It is shown that if a tight accuracy constraint is applied, implicit schemes outperform explicit schemes. The paper also discusses the relevance to digital waveguide mesh modelling and highlights the optimally efficient explicit scheme.
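The stability question can be made concrete with the simplest explicit scheme, the standard rectilinear 5-point leapfrog update. As we understand it, this corresponds to one member of the compact explicit family, but it is not the optimally efficient scheme the paper identifies. For this update the Courant number lambda = c*k/h must satisfy lambda <= 1/sqrt(2). The grid size, impulse excitation, and Dirichlet boundary below are illustrative assumptions.

```python
import numpy as np

def simulate_slf(n=32, courant=0.7, steps=200):
    """Explicit 5-point leapfrog FDTD for u_tt = c^2 (u_xx + u_yy) on an n x n grid.

    courant = c*k/h (k: time step, h: grid spacing); stable for courant <= 1/sqrt(2).
    A ring of zeros around the grid imposes u = 0 (Dirichlet boundary).
    Returns the largest |u| seen, as a crude indicator of (in)stability.
    """
    lam2 = courant ** 2
    u = np.zeros((n + 2, n + 2))
    u[n // 2 + 1, n // 2 + 1] = 1.0          # impulse excitation near the centre
    u_prev = u.copy()                        # equal first two states: ~zero initial velocity
    peak = 1.0
    for _ in range(steps):
        lap = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
               - 4.0 * u[1:-1, 1:-1])        # discrete Laplacian (times h^2)
        u_next = np.zeros_like(u)
        u_next[1:-1, 1:-1] = 2.0 * u[1:-1, 1:-1] - u_prev[1:-1, 1:-1] + lam2 * lap
        u_prev, u = u, u_next
        peak = max(peak, float(np.abs(u).max()))
    return peak
```

Running with `courant=0.7` stays bounded, whereas `courant=0.8` (above 1/sqrt(2)) grows exponentially, because the highest spatial frequencies are amplified at every step.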
A supervised learning approach to ambience extraction from mono recordings for blind upmixing
A supervised learning approach to ambience extraction from one-channel audio signals is presented. The extracted ambient signals are used for blind upmixing of musical audio recordings to surround sound formats. The input signal is processed by means of short-term spectral attenuation. The spectral weights are computed using a low-level feature extraction process and a neural network regression method. The multichannel audio signal is generated by feeding the computed ambient signal into the rear channels of a surround sound system.
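Short-term spectral attenuation scales each STFT bin by a real gain and resynthesizes the signal by overlap-add. In the sketch below, `weight_fn` is a hypothetical placeholder for the feature extraction and neural network regression stage. The frame size, hop, and periodic Hann window are assumptions chosen so that unit gains reconstruct the input exactly.

```python
import numpy as np

def spectral_attenuation(x, weight_fn, n_fft=1024, hop=512):
    """Short-term spectral attenuation: scale each STFT bin by a gain in [0, 1].

    weight_fn (hypothetical stand-in for the learned regressor) maps one frame's
    magnitude spectrum to per-bin gains. A periodic Hann window at 50 % overlap
    sums to one, so overlap-add with unit gains reconstructs the input wherever
    two frames overlap.
    """
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)   # periodic Hann
    y = np.zeros(len(x))
    for start in range(0, len(x) - n_fft + 1, hop):
        spec = np.fft.rfft(x[start:start + n_fft] * win)
        gains = np.clip(weight_fn(np.abs(spec)), 0.0, 1.0)
        y[start:start + n_fft] += np.fft.irfft(gains * spec, n=n_fft)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)
# Placeholder constant gain; a trained model would output time- and frequency-varying weights.
ambience = spectral_attenuation(x, lambda mag: 0.3 * np.ones_like(mag))
```

The resulting ambience signal would then be routed to the rear channels of the surround mix.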
Robustness and independence of voice timbre features under live performance acoustic degradations
Live performance situations can lead to degradations in the vocal signal from a typical microphone, such as ambient noise or echoes due to feedback. We investigate the robustness of continuous-valued timbre features measured on vocal signals (speech, singing, beatboxing) under simulated degradations. We also consider nonparametric dependencies between features, using information-theoretic measures and a feature-selection algorithm. We discuss how robustness and independence issues bear on the choice of acoustic features for constructing a continuous-valued vocal timbre space. While some measures (notably spectral crest factors) emerge as good candidates for such a task, others are poor, and some features, such as the zero-crossing rate (ZCR), exhibit an interaction with the type of voice signal being analysed.
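For reference, the two features named above can be computed as follows. This sketch assumes a whole-signal Hann window and defines the crest factor on the magnitude spectrum (some definitions use the power spectrum or per-band crests). The test signals are hypothetical and are not the paper's degradation setup.

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    neg = np.signbit(x)
    return float(np.mean(neg[1:] != neg[:-1]))

def spectral_crest(x):
    """Spectral crest factor: peak over mean of the Hann-windowed magnitude spectrum."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    return float(mag.max() / mag.mean())

fs = 16_000
n = np.arange(1600)
tone = np.sin(2 * np.pi * 1000 * n / fs + 0.3)            # strongly peaked spectrum
noise = np.random.default_rng(0).standard_normal(1600)    # flat spectrum
```

A pure 1 kHz tone at 16 kHz crosses zero about 2 * 1000 / 16000 = 0.125 times per sample, and its crest factor is far higher than that of white noise. To assess robustness, one would compare such values before and after a simulated degradation.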
On the window-disjoint-orthogonality of speech sources in reverberant humanoid scenarios
Many speech source separation approaches are based on the assumption that speech sources are orthogonal in the time-frequency domain. The target speech source is demixed from the mixture by applying the ideal binary mask to the mixture. The time-frequency orthogonality of speech sources has been investigated in detail only for anechoic and artificially mixed speech mixtures. This paper evaluates how the orthogonality of speech sources decreases in a realistic reverberant humanoid recording setup and indicates strategies to enhance the separation capabilities of algorithms based on ideal binary masks under these conditions. It is shown that the signal-to-interference ratio (SIR) of the target source demixed from the mixture using the ideal binary mask decreases by approximately 3 dB for a reverberation time of T60 = 0.6 s compared with the anechoic scenario. For humanoid setups, the spatial distribution of the sources and the choice of the correct ear channel introduce SIR differences of a further 3 dB, which motivates specific strategies for choosing the best channel for demixing.
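The ideal binary mask and the quantities used to judge it can be sketched as follows. This assumes a 0 dB local criterion, measures SIR directly in the STFT domain rather than on resynthesized signals, and uses the disjoint-orthogonality measure WDO = PSR - PSR/SIR that is commonly used in the literature. The two sinusoidal "sources" are hypothetical and spectrally disjoint. Reverberation, which is what erodes orthogonality in the paper, is not simulated.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT as a (frames, bins) complex array."""
    win = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.fft.rfft(x[s:s + n_fft] * win) for s in starts])

def ibm_metrics(target, interferer, n_fft=512, hop=256):
    """PSR, SIR (dB) and WDO of the ideal binary mask, evaluated in the STFT domain."""
    S = stft(target, n_fft, hop)
    N = stft(interferer, n_fft, hop)
    mask = np.abs(S) > np.abs(N)                  # ideal binary mask, 0 dB local criterion
    kept_target = np.sum(np.abs(S[mask]) ** 2)    # target energy passed by the mask
    kept_interf = np.sum(np.abs(N[mask]) ** 2)    # interference energy leaking through
    psr = kept_target / np.sum(np.abs(S) ** 2)    # preserved-signal ratio
    sir = kept_target / kept_interf
    return float(psr), float(10 * np.log10(sir)), float(psr - psr / sir)

fs = 16_000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)        # hypothetical, spectrally disjoint sources
interferer = np.sin(2 * np.pi * 3000 * t)
psr, sir_db, wdo = ibm_metrics(target, interferer)
```

For disjoint sources like these, PSR and WDO approach 1 and the SIR is very high. Overlapping spectra, for example those smeared by reverberation, lower all three.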