Sound Source Separation: Preprocessing For Hearing Aids And Structured Audio Coding
In this paper we consider the problem of separating different sound sources in multichannel audio signals. Different approaches to Blind Source Separation (BSS), e.g. the Independent Component Analysis (ICA) originally proposed by Herault and Jutten, and extensions of it that include delays, work well for artificially mixed signals. However, the quality of the separated signals is severely degraded for real sound recordings in the presence of reverberation. We consider a system with two sources and two sensors, and show how the quality of the separation can be improved by a simple model of the audio scene. More specifically, we estimate the delays between the sensor signals and put constraints on the deconvolution coefficients.
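The abstract does not specify how the inter-sensor delays are estimated, so as a rough illustration only, here is a minimal Python sketch of one standard approach: generalized cross-correlation with phase transform (GCC-PHAT) weighting. The function name `estimate_delay` and the PHAT weighting are assumptions, not the paper's method.

```python
import numpy as np

def estimate_delay(x1, x2, fs, max_delay_s=1e-3):
    # Cross-power spectrum with PHAT weighting: discard the magnitude
    # and keep only the phase, which carries the delay information.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    # Restrict the search to physically plausible lags around zero.
    max_lag = int(max_delay_s * fs)
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return (np.argmax(cc) - max_lag) / fs  # delay of x2 relative to x1, in s
```

The PHAT weighting makes the correlation peak sharper in reverberant rooms, which is the regime the paper targets.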
An Extension for Source Separation Techniques Avoiding Beats
The problem of separating individual sound sources from a mixture, known as Source Separation or Computational Auditory Scene Analysis (CASA), has attracted much attention in recent decades. A number of methods have emerged from the study of this problem, some of which perform very well for certain types of audio sources, e.g. speech. For the separation of instruments in music, however, these methods have several shortcomings. In general, instruments playing together are not independent of each other; more specifically, the time-frequency distributions of the different sources will overlap. Harmonic instruments in particular have a high probability of overlapping partials. If these overlapping partials are not separated properly, the separated signals will have a different sensation of roughness and the separation quality degrades. In this paper we present a method to separate overlapping partials in stereo signals. The method considers the shapes of partial envelopes and minimizes the difference between such shapes in order to demix overlapping partials. It can be applied to enhance existing methods for source separation, e.g. blind source separation techniques, model-based techniques, and spatial separation techniques. We also discuss simpler methods that can work with mono signals.
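One plausible reading of "minimization of the difference between such shapes" is a least-squares fit of the overlapping partial's envelope against reference envelope shapes taken from non-overlapping partials of each source. The sketch below is a speculative illustration under that assumption; the helper name `demix_overlapping_partial` and the availability of reference envelopes are hypothetical, not details from the paper.

```python
import numpy as np
from scipy.optimize import nnls

def demix_overlapping_partial(env_mix, env_ref1, env_ref2):
    # Normalise the reference envelopes so that only their shape,
    # not their absolute level, drives the fit.
    r1 = env_ref1 / np.max(env_ref1)
    r2 = env_ref2 / np.max(env_ref2)
    # Non-negative least squares: env_mix ~= a1 * r1 + a2 * r2.
    A = np.column_stack((r1, r2))
    (a1, a2), _ = nnls(A, env_mix)
    return a1 * r1, a2 * r2  # estimated per-source envelopes
```

Splitting the overlapping partial this way keeps each separated envelope smooth, which avoids the beating artifacts the paper's title refers to.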
On the use of spatial cues to improve binaural source separation
Motivated by the human hearing sense, we devise a computational model suitable for the localization of many sources in stereo signals, and apply it to the separation of sound sources. The method employs spatial cues in order to resolve high-frequency phase ambiguities. More specifically, we use relationships between the short-time Fourier transforms (STFT) of the two signals to estimate the two most important spatial cues, namely the time differences (TD) and level differences (LD) between the sensors. Using models of both free-field wave propagation and head-related transfer functions (HRTF), these cues are combined into estimates of spatial parameters such as the directions of arrival (DOA). The theory is validated by the experimental results presented in the paper.
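As a minimal sketch of the first step described above, the following Python snippet computes per-bin level and time differences from the STFTs of a stereo pair. The function name `spatial_cues` and the window length are assumptions; the paper's exact processing may differ.

```python
import numpy as np
from scipy.signal import stft

def spatial_cues(x_left, x_right, fs, nperseg=1024):
    # STFTs of the two channels (frequencies x frames).
    f, t, XL = stft(x_left, fs=fs, nperseg=nperseg)
    _, _, XR = stft(x_right, fs=fs, nperseg=nperseg)
    # Level difference in dB per time-frequency bin.
    ld = 20.0 * np.log10((np.abs(XL) + 1e-12) / (np.abs(XR) + 1e-12))
    # Interchannel phase difference -> time difference; the phase wraps
    # at high frequencies, which is the ambiguity the paper resolves.
    ipd = np.angle(XL * np.conj(XR))
    td = ipd / (2.0 * np.pi * np.maximum(f[:, None], 1.0))
    return f, t, ld, td
```

Above a certain frequency the phase difference exceeds ±π and wraps, so the TD computed this way is only one of several candidates; this is where the LD cue becomes essential.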
Binaural source localization
In binaural signals, interaural time differences (ITDs) and interaural level differences (ILDs) are two of the most important cues for the estimation of source azimuths, i.e. the localization of sources in the horizontal plane. For narrowband signals, according to the duplex theory, the ITD is dominant at low frequencies and the ILD is dominant at higher frequencies. Based on the STFT spectra of binaural signals, a method is proposed for the combined evaluation of ITD and ILD for each individual spectral coefficient. ITD and ILD are related to the azimuth through lookup models. Azimuth estimates based on the ITD are more accurate, but at higher frequencies they are ambiguous due to phase wrapping. The less accurate but unambiguous ILD-based estimates are therefore used to select the closest ITD-based candidate, effectively improving the azimuth estimation. The method agrees well with the duplex theory and handles the transition from low to high frequencies gracefully. The relations between ITD, ILD, and azimuth are computed from a measured set of head-related transfer functions (HRTFs), yielding azimuth lookup models. Based on a study of these models for different subjects, parametric azimuth lookup models are proposed. The parameters of these models can be optimized for an individual subject whose HRTFs have been measured. In addition, subject-independent lookup models are proposed, parametrized only by the distance between the ears, effectively enabling source localization for subjects whose HRTFs have not been measured.
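The core selection step lends itself to a short sketch: enumerate all ITDs consistent with the wrapped phase difference, map each to an azimuth, and pick the one closest to the ILD-based azimuth. The sinusoidal lookup models below are toy stand-ins for the HRTF-derived parametric models the paper actually proposes, and the function names are hypothetical.

```python
import numpy as np

MAX_ITD = 7e-4  # roughly the largest physically possible ITD (s)

def itd_candidates(phase_diff, f):
    # All ITDs consistent with the wrapped phase difference at frequency f.
    k = np.arange(-10, 11)
    itds = (phase_diff + 2.0 * np.pi * k) / (2.0 * np.pi * f)
    return itds[np.abs(itds) <= MAX_ITD]

def estimate_azimuth(phase_diff, ild_db, f,
                     itd_model=lambda az: MAX_ITD * np.sin(az),
                     ild_model=lambda az: 10.0 * np.sin(az)):
    az_grid = np.linspace(-np.pi / 2, np.pi / 2, 181)
    # Coarse but unambiguous azimuth from the ILD lookup model.
    az_ild = az_grid[np.argmin(np.abs(ild_model(az_grid) - ild_db))]
    # One azimuth candidate per physically valid unwrapped ITD.
    cands = [az_grid[np.argmin(np.abs(itd_model(az_grid) - itd))]
             for itd in itd_candidates(phase_diff, f)]
    if not cands:
        return az_ild  # no valid ITD candidate: fall back to ILD alone
    # Pick the ITD-based candidate closest to the ILD-based estimate.
    return min(cands, key=lambda az: abs(az - az_ild))
```

At low frequencies only the k = 0 candidate survives, so the ITD decides the azimuth on its own; at high frequencies several candidates remain and the ILD breaks the tie, which mirrors the duplex theory's division of labor.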