Human Inspired Auditory Source Localization
This paper describes an approach for localizing a sound source in the complete azimuth plane of an auditory scene using a movable human dummy head. A new localization approach, which assumes that the sources are positioned on a circle around the listener, is introduced; it performs better than standard approaches for humanoid source localization such as the Woodworth formula and the Freefield formula. Furthermore, a localization approach based on approximated HRTFs is introduced and evaluated. Iterative variants of the algorithms enhance the localization accuracy and resolve specific localization ambiguities. In this way, a localization blur of approximately three degrees is achieved, which is comparable to the human localization blur. Resolving front-back confusions allows reliable localization of the sources in the whole azimuth plane in up to 98.43 % of the cases.
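For orientation, the two baseline formulas named in the abstract can be sketched as follows. This is a minimal illustration using the common textbook forms of the Woodworth (rigid spherical head) and free-field (two point receivers) ITD models, not the paper's implementation. The head radius, the speed of sound, and all function names are assumptions chosen for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def itd_woodworth(azimuth_rad, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Textbook Woodworth ITD for azimuths in [-pi/2, pi/2]."""
    return (radius / c) * (azimuth_rad + math.sin(azimuth_rad))


def itd_freefield(azimuth_rad, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Free-field ITD for two receivers spaced 2 * radius apart."""
    return (2.0 * radius / c) * math.sin(azimuth_rad)


def azimuth_from_itd_freefield(itd, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Invert the free-field model; the argument is clamped to [-1, 1]."""
    x = max(-1.0, min(1.0, itd * c / (2.0 * radius)))
    return math.asin(x)


def azimuth_from_itd_woodworth(itd, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Invert the Woodworth model by bisection.

    theta + sin(theta) is strictly increasing on [-pi/2, pi/2], so the
    bisection converges to the unique solution (or to an interval end
    if the ITD lies outside the range of the model).
    """
    target = itd * c / radius
    lo, hi = -math.pi / 2.0, math.pi / 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid + math.sin(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip for a source at 30 degrees azimuth.
true_azimuth = math.radians(30.0)
estimate_woodworth = azimuth_from_itd_woodworth(itd_woodworth(true_azimuth))
estimate_freefield = azimuth_from_itd_freefield(itd_freefield(true_azimuth))

# A source mirrored to the back (150 degrees) yields the same free-field ITD.
itd_front = itd_freefield(math.radians(30.0))
itd_back = itd_freefield(math.radians(150.0))
```

In this sketch `itd_front` and `itd_back` coincide, which illustrates why an ITD-only estimate cannot distinguish front from back and why the front-back ambiguity has to be handled separately.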
On the window-disjoint-orthogonality of speech sources in reverberant humanoid scenarios
Many speech source separation approaches are based on the assumption that speech sources are orthogonal in the time-frequency domain. The target speech source is demixed by applying the ideal binary mask to the mixture. So far, the time-frequency orthogonality of speech sources has been investigated in detail only for anechoic and artificially mixed speech mixtures. This paper evaluates how the orthogonality of speech sources decreases in a realistic reverberant humanoid recording setup and indicates strategies to enhance the separation capabilities of algorithms based on ideal binary masks under these conditions. It is shown that the SIR of the target source demixed from the mixture using the ideal binary mask decreases by approximately 3 dB for reverberation times of T60 = 0.6 s compared with the anechoic scenario. For humanoid setups, the spatial distribution of the sources and the choice of ear channel introduce further SIR differences of 3 dB, which motivates specific strategies for choosing the best channel for demixing.
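The ideal binary mask and the resulting SIR can be illustrated with a small sketch. It assumes that the time-frequency power of the target and the interferer is known separately, which is what makes the mask "ideal". The toy values and all function names are hypothetical and are not data from the paper.

```python
import math


def ideal_binary_mask(target_power, interferer_power):
    """Return 1 for each time-frequency bin where the target dominates."""
    return [
        [1 if t > i else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_power, interferer_power)
    ]


def masked_energy(mask, power):
    """Sum the power of all bins that the mask keeps."""
    return sum(
        m * p
        for m_row, p_row in zip(mask, power)
        for m, p in zip(m_row, p_row)
    )


def sir_db(mask, target_power, interferer_power):
    """SIR in dB of the masked output: kept target vs. kept interferer."""
    num = masked_energy(mask, target_power)
    den = masked_energy(mask, interferer_power)
    if den == 0.0:
        return math.inf   # perfectly disjoint sources
    if num == 0.0:
        return -math.inf  # the mask kept no target energy
    return 10.0 * math.log10(num / den)


# Toy power spectrograms (rows: frequency bins, columns: time frames).
target = [[4.0, 0.1, 2.0],
          [0.2, 3.0, 0.1]]
interferer = [[0.4, 2.0, 0.2],
              [1.0, 0.3, 0.5]]

mask = ideal_binary_mask(target, interferer)
sir = sir_db(mask, target, interferer)
```

For these toy values the mask keeps three bins and the SIR is about 10 dB. The less the sources overlap in the kept bins, the higher the SIR, so a reverberation-induced loss of orthogonality shows up directly as a lower value.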