Reservoir Computing: A Powerful Framework for Nonlinear Audio Processing
This paper proposes reservoir computing as a general framework for nonlinear audio processing. Reservoir computing is a novel approach to recurrent neural network training with the advantage of a very simple, linear learning algorithm. It can in theory approximate arbitrary nonlinear dynamical systems with arbitrary precision, has an inherent temporal processing capability, and is therefore well suited to many nonlinear audio processing problems: whenever nonlinear relationships are present in the data and timing information is crucial, reservoir computing can be applied. Examples from three application areas are presented: nonlinear system identification of a tube amplifier emulator algorithm; nonlinear audio prediction, as needed in wireless audio transmission where dropouts may occur; and automatic melody transcription from a polyphonic audio stream, as one example from the broad field of music information retrieval. Reservoir computing was able to outperform state-of-the-art alternative models in all of the studied tasks.
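As a rough illustration of the linear-readout training the abstract refers to, the sketch below fits a minimal echo state network (one common flavour of reservoir computing) to a toy distortion-identification task. It is a simplified stand-in, not the authors' model; the function name `esn_fit` and all parameter values are illustrative assumptions.

```python
import numpy as np

def esn_fit(u, y, n_res=200, spectral_radius=0.9, ridge=1e-6, seed=0):
    """Fit a minimal echo state network: a fixed random tanh reservoir driven
    by the input u; only the linear readout w_out is learned (ridge regression).
    u and y are 1-D arrays of equal length (input and target signals)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, n_res)
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # echo state property
    x = np.zeros(n_res)
    states = np.zeros((len(u), n_res))
    for n in range(len(u)):                      # run the fixed reservoir
        x = np.tanh(W_in * u[n] + W @ x)
        states[n] = x
    # closed-form linear readout: y ≈ states @ w_out
    w_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                            states.T @ y)
    return W_in, W, w_out

# toy usage: identify a static tube-like nonlinearity y = tanh(3 u)
t = np.arange(8000)
u = np.sin(2 * np.pi * 220 * t / 8000.0)
esn_fit(u, np.tanh(3.0 * u))
```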
New Method for Analysis and Modeling of Nonlinear Audio Systems
In this paper, a new method for the analysis and modeling of nonlinear audio systems is presented. The method is based on a swept-sine excitation signal and the nonlinear convolution technique first presented in [1, 2]. It can be used in nonlinear processing for audio applications, e.g. to simulate analog nonlinear effects (distortion effects, limiters) in the digital domain.
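The swept-sine / nonlinear-convolution analysis cited as [1, 2] works by exciting the system with an exponential sweep and convolving the response with an amplitude-weighted, time-reversed copy of the sweep, so that the impulse responses of the individual harmonic orders separate in time. A minimal sketch, assuming NumPy/SciPy and a memoryless tanh device as a stand-in for the system under test:

```python
import numpy as np
from scipy.signal import fftconvolve

def exp_sweep(f1, f2, duration, fs):
    """Exponential (logarithmic) swept sine from f1 to f2 Hz."""
    t = np.arange(int(duration * fs)) / fs
    L = duration / np.log(f2 / f1)
    return np.sin(2.0 * np.pi * f1 * L * (np.exp(t / L) - 1.0))

def inverse_filter(sweep, f1, f2, fs):
    """Time-reversed sweep weighted by a 6 dB/octave decay, so that
    sweep convolved with this filter is approximately a band-limited impulse."""
    duration = len(sweep) / fs
    L = duration / np.log(f2 / f1)
    t = np.arange(len(sweep)) / fs
    return sweep[::-1] * np.exp(-t / L)

fs, f1, f2, dur = 48000, 20.0, 20000.0, 2.0
x = exp_sweep(f1, f2, dur, fs)
y = np.tanh(2.0 * x)                        # stand-in for the device under test
h = fftconvolve(y, inverse_filter(x, f1, f2, fs))
# the linear impulse response sits at the main peak of h; the responses of the
# individual harmonic orders appear at fixed offsets ahead of it
```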
Finding Latent Sources in Recorded Music with a Shift-invariant HDP
We present the Shift-Invariant Hierarchical Dirichlet Process (SIHDP), a nonparametric Bayesian model for describing multiple songs in terms of a shared vocabulary of latent sound sources. The SIHDP is an extension of the Hierarchical Dirichlet Process (HDP) that explicitly models the times at which each latent component appears in each song. This extension allows us to model how sound sources evolve over time, which is critical to the human ability to recognize and interpret sounds. To make inference on large datasets possible, we develop an exact distributed Gibbs sampling algorithm for posterior inference. We evaluate the SIHDP's ability to model audio using a dataset of real popular music, and measure its ability to accurately find patterns in music using a set of synthesized drum loops. Ultimately, our model produces a rich representation of a set of songs, consisting of a set of short sound sources and the times at which they appear in each song.
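The SIHDP itself is a nonparametric Bayesian model with a dedicated Gibbs sampler, which is beyond a short sketch. The toy code below shows a much simpler, non-Bayesian relative of the same idea, convolutive (shift-invariant) NMF, which likewise decomposes a spectrogram into short time-extended templates plus the times at which they appear; all names and parameters are illustrative, not part of the paper.

```python
import numpy as np

def shift(M, t):
    """Shift the columns of M by t (right if t > 0, left if t < 0), zero-padding."""
    out = np.zeros_like(M)
    if t > 0:
        out[:, t:] = M[:, :-t]
    elif t < 0:
        out[:, :t] = M[:, -t:]
    else:
        out[:] = M
    return out

def conv_nmf(V, n_templates=4, template_len=8, n_iter=200, eps=1e-10, seed=0):
    """Convolutive (shift-invariant) NMF with KL-divergence multiplicative updates:
    V (freq x frames) ≈ sum_t W[t] @ shift(H, t).  W holds time-extended spectral
    templates, H their activations (when each template appears)."""
    F, N = V.shape
    rng = np.random.default_rng(seed)
    W = rng.random((template_len, F, n_templates)) + eps
    H = rng.random((n_templates, N)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        Lam = sum(W[t] @ shift(H, t) for t in range(template_len)) + eps
        R = V / Lam
        num = sum(W[t].T @ shift(R, -t) for t in range(template_len))
        den = sum(W[t].T @ ones for t in range(template_len)) + eps
        H *= num / den                          # update activations
        Lam = sum(W[t] @ shift(H, t) for t in range(template_len)) + eps
        R = V / Lam
        for t in range(template_len):           # update each template slice
            Ht = shift(H, t)
            W[t] *= (R @ Ht.T) / (ones @ Ht.T + eps)
    return W, H
```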
Blind Separation of Monaural Signals using Complex Wavelets
In this paper, a new method for the blind source separation of monaural signals is presented. It is based on similarity criteria between the envelopes and frequency trajectories of the components of the signal, and on their onset and offset times. The main difference from previous work is that the input signal is filtered with a flexible complex band-pass filter bank that is a discrete version of the Complex Continuous Wavelet Transform (CCWT). Our main purpose is to show that the CCWT can be a powerful tool for blind separation, due to its strong coherence in both the time and frequency domains. The presented separation algorithm is a first approximation to this important task. An example set of four synthetically mixed monaural signals has been analyzed with this method, and the results obtained are promising.
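A discretised CCWT can be approximated by a bank of analytic Gaussian band-pass filters applied in the FFT domain; the magnitude of each complex subband gives the envelope and its phase derivative the frequency trajectory used by the similarity criteria. A minimal sketch, not the authors' filter bank; the constant-Q factor `q` and the band spacing are assumptions:

```python
import numpy as np

def complex_wavelet_bank(x, fs, f_min=55.0, f_max=7040.0, bins_per_octave=12, q=8.0):
    """Decompose x with a bank of analytic (complex) Gaussian band-pass filters,
    a crude FFT-domain discretisation of the CCWT.  Returns the centre
    frequencies and the complex subband signals."""
    n = len(x)
    X = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, 1.0 / fs)
    n_bands = int(np.log2(f_max / f_min) * bins_per_octave)
    centers = f_min * 2.0 ** (np.arange(n_bands) / bins_per_octave)
    bands = []
    for fc in centers:
        H = np.exp(-0.5 * ((freqs - fc) / (fc / q)) ** 2)  # constant-Q Gaussian window
        H[freqs < 0] = 0.0          # keep positive frequencies only -> analytic subband
        bands.append(np.fft.ifft(X * H))
    return centers, np.array(bands)

# per-band envelopes and instantaneous frequencies for similarity-based grouping:
# fc, B = complex_wavelet_bank(x, fs)
# env = np.abs(B)
# inst_f = np.diff(np.unwrap(np.angle(B), axis=1), axis=1) * fs / (2 * np.pi)
```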
Human Inspired Auditory Source Localization
This paper describes an approach to the localization of a sound source over the complete azimuth plane of an auditory scene using a movable human dummy head. A new localization approach, which assumes that the sources are positioned on a circle around the listener, is introduced and performs better than standard approaches for humanoid source localization such as the Woodworth formula and the free-field formula. Furthermore, a localization approach based on approximated HRTFs is introduced and evaluated. Iterative variants of the algorithms enhance the localization accuracy and resolve specific localization ambiguities. In this way, a localization blur of approximately three degrees is achieved, which is comparable to the human localization blur. Resolving front-back confusions allows reliable localization of the sources over the whole azimuth plane in up to 98.43 % of the cases.
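For reference, the Woodworth formula relates the interaural time difference of a source on a circle around a spherical head to its azimuth, ITD = (r/c)(θ + sin θ). The sketch below estimates the ITD by cross-correlation and inverts the formula by bisection; the head radius, lag range, and sign convention are assumptions, not values from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0    # m/s
HEAD_RADIUS = 0.0875      # m, a typical dummy-head radius (assumed value)

def estimate_itd(left, right, fs, max_lag_ms=1.0):
    """Estimate the interaural time difference (seconds) from the peak of the
    cross-correlation between the two ear signals."""
    xc = np.correlate(left, right, mode="full")
    lags = np.arange(-len(right) + 1, len(left))
    keep = np.abs(lags) <= int(fs * max_lag_ms / 1000.0)
    return lags[keep][np.argmax(xc[keep])] / fs

def woodworth_azimuth(itd, r=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Invert the Woodworth formula ITD = (r/c) * (theta + sin(theta)) by
    bisection; returns the azimuth in degrees, signed like the ITD."""
    lo, hi = 0.0, np.pi / 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (r / c) * (mid + np.sin(mid)) < abs(itd):
            lo = mid
        else:
            hi = mid
    return np.sign(itd) * np.degrees(0.5 * (lo + hi))
```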
An FPGA-based Adaptive Noise Cancelling System
An FPGA-based system suitable for augmented reality audio applications is presented. The sample application described here is adaptive noise cancellation (ANC). The system consists of a Spartan-3 XC3S400 FPGA board connected to a Philips UCB 1400 stereo audio codec. The algorithms for the FIR filtering and for the adaptation of the filter coefficients according to the Widrow-Hoff LMS algorithm are implemented on the FPGA board. Measurement results obtained with a dummy head measuring system are reported, and a detailed analysis of system performance and possible system improvements is given.
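The Widrow-Hoff LMS update used for the coefficient adaptation is compact enough to show directly. The following software sketch mirrors the structure of an adaptive noise canceller; it is not the FPGA implementation, and the filter length and step size are assumed values. The step size `mu` must stay small relative to the power of the reference signal for the adaptation to remain stable.

```python
import numpy as np

def lms_anc(primary, reference, n_taps=64, mu=0.01):
    """Widrow-Hoff (LMS) adaptive noise canceller.
    primary   : microphone signal = desired signal plus filtered noise (d[n])
    reference : noise reference picked up near the noise source (x[n])
    Returns e[n], the noise-reduced output."""
    w = np.zeros(n_taps)                 # adaptive FIR coefficients
    x_buf = np.zeros(n_taps)             # delay line of recent reference samples
    e = np.zeros(len(primary))
    for n in range(len(primary)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = reference[n]
        y = w @ x_buf                    # current estimate of the noise in d[n]
        e[n] = primary[n] - y            # error signal = cleaned output
        w += mu * e[n] * x_buf           # Widrow-Hoff coefficient update
    return e
```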
Source-Filter based Clustering for Monaural Blind Source Separation
In monaural blind audio source separation scenarios, a signal mixture is usually separated into more signals than there are active sources. It is therefore necessary to group the separated signals into the final source estimates. Traditionally, grouping methods are supervised and thus require a learning step on appropriate training data. In contrast, we discuss unsupervised clustering of the separated channels using Mel-frequency cepstral coefficients (MFCCs). We show that replacing the decorrelation step of the MFCC computation with non-negative matrix factorization improves the separation quality significantly. The algorithms have been evaluated on a large test set consisting of melodies played with different instruments, vocals, speech, and noise.
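One possible software rendering of the described pipeline, assuming librosa and scikit-learn for the mel spectrogram, the NMF decorrelation, and the clustering; the feature choice (mean activation per channel) and all parameter values are assumptions, not the authors' exact configuration.

```python
import numpy as np
import librosa
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

def cluster_channels(channels, sr, n_sources=2, n_mels=40, n_components=12):
    """Group separated channels into source estimates.  Instead of the DCT used
    by standard MFCCs, the log-mel spectra are decorrelated with NMF and each
    channel is then described by its mean activation vector."""
    mel_frames, owner = [], []
    for i, y in enumerate(channels):
        M = np.log1p(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
        mel_frames.append(M.T)                 # frames x mel bands
        owner.extend([i] * M.shape[1])
    X = np.vstack(mel_frames)                  # all frames of all channels
    H = NMF(n_components=n_components, init="nndsvda", max_iter=500).fit_transform(X)
    owner = np.array(owner)
    feats = np.array([H[owner == i].mean(axis=0) for i in range(len(channels))])
    labels = KMeans(n_clusters=n_sources, n_init=10, random_state=0).fit_predict(feats)
    return labels                              # labels[i] = source cluster of channel i
```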
Local Key Estimation Based on Harmonic and Metric Structures
In this paper, we present a method for estimating the local keys of an audio signal. We propose to address the problem of local key finding by investigating possible combinations and extensions of previously proposed global key estimation approaches. The specificity of our approach is that we make the key estimate depend on the harmonic and the metric structures. In this work, we focus on the relationship between the chord progression and the local key progression in a piece of music. A contribution of our work is that we address the problem of finding a good analysis window length for local key estimation by introducing information related to the metric structure into our model. Key estimation is not performed on segments of an empirically chosen length, but on segments that are adapted to the analyzed piece and independent of the tempo. We evaluate and analyze our results on a new database composed of classical music pieces.
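The paper couples chord and key progressions with segment boundaries derived from the metric structure. As a much simpler baseline for the key-finding part only, the sketch below correlates an averaged chroma vector per segment against the 24 rotated Krumhansl-Kessler key profiles; the segment boundaries are assumed to be supplied externally (e.g. from downbeat positions), and none of this is the authors' model.

```python
import numpy as np

# Krumhansl-Kessler major/minor key profiles (probe-tone ratings)
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_key(chroma):
    """Estimate the key of one segment from its averaged 12-bin chroma vector
    by correlating it with all 24 rotated major/minor profiles."""
    best, best_key = -np.inf, None
    for tonic in range(12):
        for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
            score = np.corrcoef(chroma, np.roll(profile, tonic))[0, 1]
            if score > best:
                best, best_key = score, f"{NOTE_NAMES[tonic]} {mode}"
    return best_key

def local_keys(chroma_frames, segment_bounds):
    """Estimate one key per segment, where segment_bounds (start, end frame pairs)
    come from the metric structure rather than a fixed window length."""
    return [estimate_key(chroma_frames[:, a:b].mean(axis=1))
            for a, b in segment_bounds]
```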