Improved Cocktail-Party Processing
The human auditory system is able to focus on one speech signal and ignore others in an auditory scene where several conversations take place. This ability is referred to as the “cocktail-party effect” and is partly made possible by binaural listening. Interaural time differences (ITDs) and interaural level differences (ILDs) between the ear input signals are the two most important binaural cues for the localization of sound sources, i.e. the estimation of source azimuth angles. This paper proposes an implementation of a cocktail-party processor. The processor carries out an auditory scene analysis by estimating the binaural cues corresponding to the directions of the sources and then, as a function of these cues, suppresses signal components arriving from non-desired directions using speech enhancement techniques. The performance of the proposed algorithm is assessed in terms of directionality and speech quality. The algorithm improves on existing cocktail-party processors by combining low computational complexity with efficient source separation. Moreover, its advantage over conventional beamforming is that it achieves a highly directional beam over a wide frequency range using only two microphones.
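As an illustration of the ITD cue mentioned above (not the paper's processor), the sketch below estimates the delay between two ear signals from the peak of their cross-correlation; the signals and delay are synthetic:

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (seconds) from the peak of
    the cross-correlation; a positive lag means the left channel lags."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    return lag / fs

# Toy scene: a broadband source on the right, so the left ear
# receives the signal 24 samples (0.5 ms) later.
fs, delay = 48000, 24
sig = np.random.default_rng(0).standard_normal(1024)
left = np.pad(sig, (delay, 0))    # delayed copy
right = np.pad(sig, (0, delay))
print(estimate_itd(left, right, fs))   # 0.0005
```

The estimated ITD would then be mapped to an azimuth angle, and components whose cues disagree with the desired direction suppressed.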
A modified FM synthesis approach to bandlimited signal generation
Techniques for generating bandlimited signals for digital implementations of subtractive synthesis have been researched by a number of authors. This paper aims to contribute to the variety of approaches by proposing a technique based on Frequency Modulation (FM) synthesis. It presents and explains the equations required for bandlimited pulse generation using modified FM synthesis, and then investigates the relationship between the modulation index and the quality of the reproduction, in terms of authenticity and aliasing, for a sawtooth wave. To compare the performance of this technique with alternatives, two sets of simulation results are offered: the first computes the relative power of the non-harmonic components, and the second uses the Perceptual Evaluation of Audio Quality (PEAQ) algorithm. It is shown that the technique compares well with the alternatives. The paper concludes with suggestions for future improvements to the method.
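For background, the sketch below uses the textbook simple-FM oscillator (carrier frequency equal to modulator frequency), not the paper's modified-FM equations: the resulting spectrum is harmonic, with partial amplitudes governed by Bessel functions of the modulation index:

```python
import numpy as np

def fm_oscillator(f0, index, fs, n):
    """Simple FM with carrier frequency == modulator frequency: the output
    spectrum is harmonic, with partial weights given by Bessel functions
    J_k(index), so the index controls spectral richness."""
    t = np.arange(n) / fs
    phase = 2 * np.pi * f0 * t
    return np.sin(phase + index * np.sin(phase))

fs, f0, n = 48000, 440.0, 48000
y = fm_oscillator(f0, index=1.5, fs=fs, n=n)
spec = np.abs(np.fft.rfft(y * np.hanning(n)))
peak_bin = np.argmax(spec)
print(peak_bin)   # the strongest partial lies on a harmonic of f0
```

Raising the index spreads energy into higher partials, which is where aliasing considerations (and the bandlimiting of the modified approach) come into play.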
Human Inspired Auditory Source Localization
This paper describes an approach for localizing a sound source in the complete azimuth plane of an auditory scene using a movable human dummy head. A new localization approach, which assumes that the sources are positioned on a circle around the listener, is introduced and performs better than standard approaches for humanoid source localization such as the Woodworth formula and the Freefield formula. Furthermore, a localization approach based on approximated HRTFs is introduced and evaluated. Iterative variants of the algorithms enhance the localization accuracy and resolve specific localization ambiguities. In this way a localization blur of approximately three degrees is achieved, which is comparable to the human localization blur. Resolving front-back confusions allows reliable localization of sources in the whole azimuth plane in up to 98.43 % of cases.
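To make the Woodworth baseline concrete, here is a minimal sketch of the classic spherical-head formula, ITD = (a/c)(θ + sin θ), inverted numerically to map a measured ITD back to a frontal azimuth; head radius and speed of sound are assumed textbook values:

```python
import math

A, C = 0.0875, 343.0          # assumed head radius (m), speed of sound (m/s)

def woodworth_itd(azimuth):
    """Woodworth ITD model for a spherical head, azimuth in [0, pi/2] rad."""
    return (A / C) * (azimuth + math.sin(azimuth))

def azimuth_from_itd(itd, tol=1e-9):
    """Invert the (monotonic) Woodworth formula by bisection."""
    lo, hi = 0.0, math.pi / 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if woodworth_itd(mid) < itd:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta = math.radians(30)
print(math.degrees(azimuth_from_itd(woodworth_itd(theta))))  # ≈ 30.0
```

The circle-based and HRTF-based approaches of the paper replace this formula with more accurate direction models.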
Analysis and Correction of the MAPS Dataset
Automatic music transcription (AMT) is the process of converting a music signal into symbolic music notation. The MIDI Aligned Piano Sounds (MAPS) dataset, established in 2010, is the most widely used benchmark dataset for automatic piano music transcription. In this paper, error screening is carried out with an algorithmic strategy, and three data annotation problems are found in ENSTDkCl, the MAPS subset usually used for algorithm evaluation: (1) 342 MIDI annotation deviation errors; (2) 803 unplayed-note errors; (3) 1613 slow-starting-process errors. After algorithmic correction and manual confirmation, the corrected dataset is released. Finally, the well-performing Google model and our model are evaluated on the corrected dataset; their F-measures are 85.94% and 85.82% respectively, improved over the original dataset, which shows that correcting the dataset is meaningful.
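The F-measures quoted above come from note-level transcription evaluation. A minimal sketch of the usual scheme (an estimated note counts as correct if its pitch matches a reference note and its onset lies within a tolerance, commonly ±50 ms; the data below are made up):

```python
def note_f_measure(ref, est, onset_tol=0.05):
    """Note-level precision/recall/F: an estimated note matches a reference
    note if the pitches are equal and the onsets differ by at most
    onset_tol seconds; each reference note is matched at most once."""
    unmatched = list(ref)
    tp = 0
    for pitch, onset in est:
        for cand in unmatched:
            if cand[0] == pitch and abs(cand[1] - onset) <= onset_tol:
                unmatched.remove(cand)
                tp += 1
                break
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f

ref = [(60, 0.00), (64, 0.50), (67, 1.00)]            # (MIDI pitch, onset s)
est = [(60, 0.02), (64, 0.60), (67, 1.01), (72, 1.50)]
print(note_f_measure(ref, est))
```

Annotation errors like onset deviations shift reference onsets outside the tolerance window, which is exactly why correcting them changes the measured F-values.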
Spherical Decomposition of Arbitrary Scattering Geometries for Virtual Acoustic Environments
A method is proposed to encode the acoustic scattering of objects for virtual acoustic applications through a multiple-input multiple-output framework. The scattering is encoded as a matrix in the spherical harmonic domain, which can be re-used and manipulated (rotated, scaled and translated) to synthesize various sound scenes. The proposed method is applied and validated using Boundary Element Method simulations, which show close agreement between reference and synthesized results. The method is compatible with existing frameworks such as Ambisonics and image source methods.
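To illustrate the style of spherical-harmonic-domain manipulation involved (first order only, not the paper's scattering encoding), the sketch below encodes a plane-wave direction into first-order B-format coefficients and rotates the scene with a matrix; a scattering matrix `T` would be applied to the same coefficient vector in the same way (`scattered = T @ a`):

```python
import numpy as np

def encode_first_order(azimuth, elevation):
    """First-order B-format (W, X, Y, Z) encoding of a plane-wave direction."""
    return np.array([
        1.0,
        np.cos(azimuth) * np.cos(elevation),   # X
        np.sin(azimuth) * np.cos(elevation),   # Y
        np.sin(elevation),                     # Z
    ])

def rotation_z(angle):
    """Rotation about the vertical axis, acting on (W, X, Y, Z)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([
        [1, 0,  0, 0],
        [0, c, -s, 0],
        [0, s,  c, 0],
        [0, 0,  0, 1],
    ])

a = encode_first_order(np.radians(30), 0.0)
rotated = rotation_z(np.radians(45)) @ a
# Rotating the coefficients equals encoding the rotated direction:
print(np.allclose(rotated, encode_first_order(np.radians(75), 0.0)))  # True
```

This matrix view is what makes the encoded scattering re-usable: rotations, and by extension the proposed scaling and translation operators, act on the coefficients without re-simulating the object.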
Realtime system for backing vocal harmonization
A system for synthesizing backing vocals by pitch shifting a lead vocal signal is presented. The harmonization of the backing vocals is based on chords retrieved from an accompanying instrument. The system operates completely autonomously, without needing the key of the performed song to be provided, which simplifies the handling of the harmonization effect. The system is designed for realtime operation so that it can be used as a live sound effect.
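As a toy illustration of the pitch-shifting building block (not the system's method, which would need a time-scale-preserving technique such as PSOLA or a phase vocoder), a naive resampling shift by a semitone interval looks like this:

```python
import numpy as np

def pitch_shift_resample(x, semitones):
    """Naive pitch shift by resampling with linear interpolation.
    Note: this also changes the duration; realtime harmonizers use
    time-domain (e.g. PSOLA) or phase-vocoder techniques instead."""
    ratio = 2.0 ** (semitones / 12.0)          # +12 semitones doubles frequency
    positions = np.arange(0, len(x) - 1, ratio)
    return np.interp(positions, np.arange(len(x)), x)

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)                 # 440 Hz stand-in for the lead vocal
y = pitch_shift_resample(x, 12)                 # one octave up
peak = np.argmax(np.abs(np.fft.rfft(y))) * fs / len(y)
print(peak)   # 880.0
```

A harmonizer would pick the shift intervals per backing voice from the detected chord rather than a fixed octave.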
Sparse Atomic Modeling of Audio: a Review
Research into sparse atomic models has recently intensified in the image and audio processing communities. While other reviews exist, we believe this paper provides a good starting point for the uninitiated reader, as it concisely summarizes the state of the art and presents most of the major topics in an accessible manner. We discuss several approaches to the sparse approximation problem, including various greedy algorithms, iteratively re-weighted least squares, iterative shrinkage, and Bayesian methods. We provide pseudo-code for several of the algorithms, and have released software which includes fast dictionaries and reference implementations for many of them. We discuss the relevance of the different approaches for audio applications, include numerical comparisons, and illustrate several audio applications of sparse atomic modeling.
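The simplest of the greedy algorithms mentioned above is matching pursuit; a minimal sketch with an orthonormal cosine (DCT-II-style) dictionary and a synthetic 2-sparse signal:

```python
import numpy as np

def matching_pursuit(x, dictionary, n_atoms):
    """Greedy matching pursuit: repeatedly pick the unit-norm atom most
    correlated with the residual and subtract its projection."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        corr = dictionary.T @ residual
        k = np.argmax(np.abs(corr))
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[:, k]
    return coeffs, residual

# Orthonormal dictionary of cosine atoms (columns, unit norm).
n = 64
D = np.cos(np.pi * np.outer(np.arange(n) + 0.5, np.arange(n)) / n)
D /= np.linalg.norm(D, axis=0)

x = 3.0 * D[:, 5] - 2.0 * D[:, 17]              # 2-sparse test signal
coeffs, residual = matching_pursuit(x, D, n_atoms=2)
print(np.nonzero(coeffs)[0], np.linalg.norm(residual))  # atoms 5 and 17, ~0
```

With an orthonormal dictionary the pursuit recovers the signal exactly; the overcomplete dictionaries used in audio require the more careful analysis the review covers.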
An Efficient Algorithm for Real-Time Spectrogram Inversion
We present a computationally efficient real-time algorithm for constructing audio signals from spectrograms. Spectrograms consist of a time sequence of short-time Fourier transform magnitude (STFTM) spectra. During signal construction, phases are derived for the individual frequency components so that the spectrogram of the constructed signal is as close as possible to the target spectrogram given real-time constraints. The algorithm is a variation of the classic Griffin and Lim [1] technique, modified to be computable in real time. We discuss the application of the algorithm to time-scale modification of audio signals such as speech and music, and compare its performance with other methods. The new algorithm generates comparable or better results with significantly less computation, and the phase consistency between adjacent frames produces excellent subjective sound quality with minimal frame transition artifacts.
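For reference, a sketch of the offline Griffin and Lim [1] baseline (not the paper's real-time variant), which alternates between imposing the target magnitude and projecting onto consistent spectrograms via ISTFT/STFT round trips; parameters below are illustrative:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, fs, nperseg, iterations=50, seed=0):
    """Offline Griffin & Lim spectrogram inversion: alternate between
    imposing the target STFT magnitude and an ISTFT/STFT round trip
    that projects onto the set of consistent spectrograms."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(iterations):
        _, x = istft(magnitude * phase, fs=fs, nperseg=nperseg)
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)
        phase = np.exp(1j * np.angle(Z))
    _, x = istft(magnitude * phase, fs=fs, nperseg=nperseg)
    return x

fs, nperseg = 8000, 256
t = np.arange(2 * fs) / fs
target = np.sin(2 * np.pi * 440 * t)
_, _, Ztarget = stft(target, fs=fs, nperseg=nperseg)
x = griffin_lim(np.abs(Ztarget), fs, nperseg)
_, _, Zx = stft(x, fs=fs, nperseg=nperseg)
err = np.linalg.norm(np.abs(Zx) - np.abs(Ztarget)) / np.linalg.norm(np.abs(Ztarget))
print(err)   # small after 50 iterations
```

The real-time algorithm described in the paper must produce phases causally, frame by frame, instead of iterating over the whole spectrogram as above.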
Onset Detection Revisited
Various methods have been proposed for detecting the onset times of musical notes in audio signals. We examine recent work on onset detection using spectral features such as the magnitude, phase and complex domain representations, and propose improvements to these methods: a weighted phase deviation function and a half-wave rectified complex difference. The new algorithms are compared with several state-of-the-art algorithms from the literature, all tested on a standard data set of short excerpts from a range of instruments (1060 onsets), plus a much larger data set of piano music (106054 onsets). Some of the results contradict previously published findings and suggest that a similarly high level of performance can be obtained with a magnitude-based (spectral flux), a phase-based (weighted phase deviation) or a complex-domain (complex difference) onset detection function.
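The magnitude-based spectral flux function mentioned above is straightforward to sketch: sum the half-wave rectified magnitude increases between consecutive STFT frames, so that energy onsets produce peaks (frame sizes and the test signal below are illustrative):

```python
import numpy as np

def spectral_flux(x, frame_len=1024, hop=512):
    """Half-wave rectified spectral flux onset detection function:
    the sum of positive magnitude increases between consecutive frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    mags = np.array([
        np.abs(np.fft.rfft(window * x[i * hop:i * hop + frame_len]))
        for i in range(n_frames)
    ])
    diff = np.diff(mags, axis=0)
    return np.sum(np.maximum(diff, 0.0), axis=1)   # one value per frame pair

fs = 16000
t = np.arange(fs) / fs
x = np.where(t < 0.5, 0.0, 1.0) * np.sin(2 * np.pi * 440 * t)  # note at 0.5 s
odf = spectral_flux(x)
onset_frame = np.argmax(odf)
print(onset_frame * 512 / fs)   # near 0.5 s
```

The weighted phase deviation and complex difference functions replace the magnitude difference with phase-based and complex-domain frame-to-frame deviations, respectively.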
Sound transformation by descriptor using an analytic domain
In many applications of sound transformation, such as sound design, mixing, mastering, and composition, the user interactively searches for appropriate parameters. Automatic applications of sound transformation, such as mosaicing, may instead require choosing parameters without user intervention. When the target can be specified by its synthesis context, or by example (from features of the example), “adaptive effects” can provide such control. However, few general strategies exist for building adaptive effects from arbitrary sets of transformations and descriptor targets. In this study, we decouple the usually direct link between analysis and transformation in adaptive effects, attempting to include more diverse transformations and descriptors in adaptive transformation, even at the cost of additional complexity or difficulty. We build an analytic model of a deliberately simple transformation-descriptor (TD) domain, and show some preliminary results.
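A minimal sketch of the transformation-descriptor idea (hypothetical descriptor and transformation, not the paper's analytic domain): the descriptor is the spectral centroid, the transformation an exponential spectral tilt, and a search over the tilt parameter drives the descriptor to a target value:

```python
import numpy as np

def centroid(mag, freqs):
    """Descriptor: spectral centroid (Hz) of a magnitude spectrum."""
    return np.sum(freqs * mag) / np.sum(mag)

def tilt(mag, freqs, alpha):
    """Transformation: exponential spectral tilt controlled by alpha."""
    return mag * np.exp(alpha * freqs / freqs[-1])

def match_centroid(mag, freqs, target, lo=-20.0, hi=20.0, iters=60):
    """Find the tilt parameter reaching the target centroid; the centroid
    is monotonic in alpha, so simple bisection suffices."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if centroid(tilt(mag, freqs, mid), freqs) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

freqs = np.linspace(0, 8000, 512)
mag = 1.0 / (1.0 + (freqs / 500.0) ** 2)        # low-pass-shaped toy spectrum
alpha = match_centroid(mag, freqs, target=2000.0)
shifted = tilt(mag, freqs, alpha)
print(centroid(mag, freqs), centroid(shifted, freqs))   # original vs ≈ 2000 Hz
```

An analytic TD domain, as in the paper, replaces this numerical search with a closed-form relation between the transformation parameter and the descriptor value.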