Improved hidden Markov model partial tracking through time-frequency analysis
In this article we propose a modification to the combinatorial hidden Markov model developed in [1] for tracking partial frequency trajectories. We employ the Wigner-Ville distribution and the Hough transform to (re)estimate the frequency and chirp rate of partials in each analysis frame. We estimate the initial phase and amplitude of each partial by minimizing the squared error in the time domain. We then formulate a new scoring criterion for the hidden Markov model that makes the tracker more robust to non-stationary and noisy signals. We achieve good performance tracking crossing linear chirps and crossing FM signals in white noise, as well as real instrument recordings.
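As a toy illustration of the analysis step described above (not the authors' implementation), the sketch below computes a discrete pseudo Wigner-Ville distribution of an analytic signal and then scans a grid of (frequency, chirp-rate) lines Hough-style to estimate the parameters of a linear chirp; the grid ranges and normalization are assumptions:

```python
import numpy as np

def wigner_ville(x):
    """Discrete pseudo Wigner-Ville distribution of an analytic signal x.

    Returns an (N, N) array; frequency bin k corresponds to k / (2N)
    cycles per sample.
    """
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        tau_max = min(n, N - 1 - n)
        tau = np.arange(-tau_max, tau_max + 1)
        r = np.zeros(N, dtype=complex)
        r[tau % N] = x[n + tau] * np.conj(x[n - tau])   # instantaneous autocorrelation
        W[:, n] = np.fft.fft(r).real                    # FFT over the lag variable
    return W

def hough_chirp(W, f0_grid, c_grid):
    """Pick the line f(n) = f0 + c*n that accumulates the most WVD energy."""
    N = W.shape[1]
    n = np.arange(N)
    best, best_params = -np.inf, None
    for f0 in f0_grid:
        for c in c_grid:
            k = np.round(2 * N * (f0 + c * n)).astype(int)
            ok = (k >= 0) & (k < N)
            score = W[k[ok], n[ok]].sum()
            if score > best:
                best, best_params = score, (f0, c)
    return best_params
```

Applied to a clean chirp sweeping from 0.1 to 0.2 cycles per sample, the accumulator peaks near the true starting frequency and chirp rate.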
MOSPALOSEP: A Platform for the Binaural Localization and Separation of Spatial Sounds using Models of Interaural Cues and Mixture Models
In this paper, we present the MOSPALOSEP platform for the localization and separation of binaural signals. Our methods use short-time spectra of the recorded binaural signals. Based on a parametric model of the binaural mix, we exploit the joint evaluation of interaural cues to derive the location of each time-frequency bin. We then describe different approaches to localization: some based on an energy-weighted histogram in azimuth space, and others based on unsupervised estimation of the number of sources with a Gaussian mixture model combined with the Minimum Description Length criterion. The revealed Gaussian mixture model structure is used to identify the region dominated by each source in a multi-source mix. A bank of spatial masks then extracts each source according to the posterior probabilities or to maximum-likelihood binary masks. An important condition is the Windowed-Disjoint Orthogonality of the sources in the time-frequency domain. We specifically assess the source separation algorithms on instrument mixes, where this fundamental condition is not satisfied.
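The order-selection idea can be sketched as follows, assuming a plain EM fit of a one-dimensional mixture over per-bin azimuth estimates and the usual MDL penalty of half the parameter count times log N; the quantile initialization and the azimuth values are illustrative choices, not the platform's actual code:

```python
import numpy as np

def em_gmm_1d(x, k, iters=150):
    """Fit a k-component 1-D Gaussian mixture with plain EM (sketch)."""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)     # spread initial means
    var = np.full(k, np.var(x))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        d = x[:, None] - mu[None, :]
        p = w * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = p / (p.sum(axis=1, keepdims=True) + 1e-300)   # responsibilities
        nk = r.sum(axis=0) + 1e-12
        w, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        var = (r * d**2).sum(axis=0) / nk + 1e-6
    d = x[:, None] - mu[None, :]
    p = w * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
    return mu, np.log(p.sum(axis=1) + 1e-300).sum()

def mdl_num_sources(azimuths, k_max=5):
    """Choose the mixture order minimizing the MDL criterion."""
    best_k, best_mdl = 1, np.inf
    for k in range(1, k_max + 1):
        _, loglik = em_gmm_1d(azimuths, k)
        n_params = 3 * k - 1                  # weights + means + variances
        mdl = -loglik + 0.5 * n_params * np.log(len(azimuths))
        if mdl < best_mdl:
            best_k, best_mdl = k, mdl
    return best_k
```

On per-bin azimuth estimates clustered around two source directions, the MDL criterion selects a two-component mixture, whose components then define the per-source spatial masks.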
KRONOS ‐ A Vectorizing Compiler for Music DSP
This paper introduces Kronos, a vectorizing just-in-time compiler designed for musical programming systems. Its purpose is to translate abstract mathematical expressions into high-performance computer code. Musical programming system design criteria are considered and a three-tier model of abstraction is presented. The low-level expression metalanguage used in Kronos is described, along with the design choices that facilitate powerful, yet transparent, vectorization of the machine code.
Human Inspired Auditory Source Localization
This paper describes an approach for the localization of a sound source over the complete azimuth plane of an auditory scene using a movable human dummy head. A new localization approach, which assumes that the sources are positioned on a circle around the listener, is introduced and performs better than standard approaches for humanoid source localization such as the Woodworth formula and the Freefield formula. Furthermore, a localization approach based on approximated HRTFs is introduced and evaluated. Iterative variants of the algorithms enhance the localization accuracy and resolve specific localization ambiguities. In this way a localization blur of approximately three degrees is achieved, which is comparable to the human localization blur. Resolving front-back confusions allows reliable localization of the sources over the whole azimuth plane in up to 98.43% of the cases.
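For reference, the Woodworth formula mentioned above maps azimuth to an interaural time difference (ITD) on a spherical head and can be inverted numerically; the head radius and grid resolution below are assumed values, not the paper's:

```python
import numpy as np

A = 0.0875   # assumed head radius in metres
C = 343.0    # speed of sound in m/s

def itd_woodworth(theta):
    """Woodworth ITD in seconds for azimuth theta in radians (0 = front)."""
    return (A / C) * (theta + np.sin(theta))

def azimuth_from_itd(itd):
    """Invert the formula by nearest-neighbour search on a dense grid (sketch)."""
    grid = np.linspace(-np.pi / 2, np.pi / 2, 20001)
    return grid[np.argmin(np.abs(itd_woodworth(grid) - itd))]
```

The ITD measured from the binaural recording is thus mapped back to a frontal-hemisphere azimuth; resolving the remaining front-back ambiguity requires additional cues or head movement, as the abstract notes.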
3D Particle Systems for Audio Applications
Although particle systems are well known for their use in computer graphics, their application to sound is rare, almost non-existent. This paper presents a conceptual model for the use of particle systems in audio applications, using a full rendering system with virtual microphones: several virtual particles are spread over a virtual 3D space, where each particle reproduces one of the available audio streams (or a modified version of it), and the overall sound is captured by virtual microphones. Such a system can be used in several audio-related areas such as sound design, 3D mixing, reverb/impulse-response design, granular synthesis, audio up-mixing, and impulse-response up-mixing.
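A minimal sketch of the rendering idea, assuming simple 1/r distance attenuation and an integer-sample propagation delay per particle (the constants and function names are illustrative, not from the paper):

```python
import numpy as np

C = 343.0    # speed of sound, m/s
FS = 48000   # sample rate, Hz

def render(particle_pos, streams, mic_pos, n_out):
    """Mix each particle's stream into one virtual-microphone signal.

    particle_pos: (P, 3) positions; streams: list of P mono arrays.
    """
    out = np.zeros(n_out)
    for pos, s in zip(particle_pos, streams):
        d = np.linalg.norm(pos - mic_pos)
        delay = int(round(d / C * FS))     # propagation delay in samples
        if delay >= n_out:
            continue                       # particle too far to be heard
        gain = 1.0 / max(d, 0.1)           # 1/r attenuation, clamped near zero
        seg = s[: n_out - delay]
        out[delay: delay + len(seg)] += gain * seg
    return out
```

Moving the particles or the microphone between blocks then yields time-varying spatialization; more elaborate microphone directivity patterns fit naturally into the same loop.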
Practical Empirical Mode Decomposition For Audio Synthesis
A new method of synthesis by analysis for multi-component signals with fast-changing instantaneous attributes is introduced. It makes use of two recent developments in signal decomposition to obtain near mono-component signals whose instantaneous attributes can be used for synthesis. Furthermore, by extending and combining both decomposition methods, the overall quality of the decomposition is shown to improve considerably.
Analysis of Sound Field Distribution for Room Acoustics: From the Point of View of Hardware Implementation
Analysis of sound field distribution is a data- and memory-intensive application. To speed up calculation, an alternative solution is to implement the analysis algorithms on an FPGA. This paper presents the related issues for an FPGA-based sound field analysis system from the point of view of hardware implementation. Compared with other algorithms, the OCTA-FDTD algorithm consumes 49 slices in the FPGA, and the system updates 536.2 million elements per second. Regarding system architecture, the parallel architecture benefits from fast computation, since the sound pressures of all elements are obtained and updated in one clock cycle; however, it consumes more hardware resources, so only a small sound space can be simulated on a single FPGA chip. In contrast, the time-sharing architecture extends the simulated sound area at the expense of computation speed, since the sound pressures are calculated element by element.
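For context, one leapfrog update of a standard four-neighbour 2-D acoustic FDTD scheme, the kind of per-element computation that the parallel FPGA architecture performs for all elements at once, can be sketched as follows (this is the plain scheme with periodic boundaries for brevity, not the OCTA-FDTD variant discussed above):

```python
import numpy as np

LAMBDA2 = 0.25   # Courant number squared, (c*dt/dx)^2 <= 0.5 for 2-D stability

def fdtd_step(p, p_prev):
    """One leapfrog update of the 2-D pressure grid; returns (p_next, p)."""
    lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
           np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4.0 * p)   # discrete Laplacian
    p_next = 2.0 * p - p_prev + LAMBDA2 * lap
    return p_next, p
```

Each grid element depends only on its own history and its immediate neighbours, which is why the update maps so directly onto parallel hardware.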
Antialiasing in BBD Chips Using BLEP
Several methods exist in the literature to accurately simulate Bucket Brigade Device (BBD) chips, which are widely used in analog delay-based audio effects for their characteristic lo-fi sound, affected by noise, nonlinearities, and aliasing. The latter is a desired quality, being typical of those chips. However, when simulating BBDs in a discrete-time environment, additional aliasing components occur that need to be suppressed. In this work, we propose a novel method that applies the Bandlimited Step (BLEP) technique, effectively minimizing the aliasing artifacts introduced by the simulation. The paper provides some insights into the design of a BBD simulation using interpolation at the input for clock-rate conversion and, most importantly, shows how BLEP can be effective in reducing unwanted aliasing artifacts. Interpolation is shown to be of minor importance in the reduction of spurious components.
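The BLEP idea can be illustrated on a simpler oscillator: the two-sample polynomial BLEP residual below corrects the step discontinuity of a naive sawtooth, suppressing the aliasing the discontinuity would otherwise fold back into the audio band. This is a generic polyBLEP sketch, not the paper's BBD-specific formulation:

```python
import numpy as np

def poly_blep(t, dt):
    """Two-sample polynomial BLEP residual for phase t in [0, 1)."""
    if t < dt:                 # just after the phase wrap
        x = t / dt
        return 2.0 * x - x * x - 1.0
    if t > 1.0 - dt:           # just before the phase wrap
        x = (t - 1.0) / dt
        return x * x + 2.0 * x + 1.0
    return 0.0

def sawtooth(freq, fs, n, use_blep=True):
    """Sawtooth oscillator; subtracting the BLEP residual bandlimits the step."""
    dt = freq / fs
    phase, out = 0.0, np.zeros(n)
    for i in range(n):
        out[i] = 2.0 * phase - 1.0
        if use_blep:
            out[i] -= poly_blep(phase, dt)
        phase += dt
        if phase >= 1.0:
            phase -= 1.0
    return out
```

Comparing spectra of the naive and corrected oscillators shows a clear drop in energy near the Nyquist frequency, where aliased components concentrate.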
Guitar Tone Stack Modeling with a Neural State-Space Filter
In this work, we present a data-driven approach to modeling the tone stack circuits found in guitar amplifiers and distortion pedals. To this aim, the proposed approach uses a feedforward fully connected neural network to predict the parameters of a coupled-form state-space filter, ensuring the numerical stability of the resulting time-varying system. The neural network is conditioned on the tone controls of the target tone stack and is optimized jointly with the coupled-form state-space filter to match the target frequency response. To assess the proposed approach, we model three popular tone stack schematics with both matched-order and over-parameterized filters and conduct an objective comparison with well-established approaches that use cascaded biquad filters. Results from the conducted experiments demonstrate the improved accuracy of the proposed modeling approach, especially in the case of over-parameterized state-space filters, while guaranteeing numerical stability. After training, our method can be deployed in real-time audio processors.
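The stability argument behind coupled-form parameterizations can be sketched as follows: the poles of the two-pole coupled-form state matrix have modulus exactly r, so constraining a predicted r to (0, 1) (e.g. via a sigmoid) keeps the time-varying filter stable regardless of the network output. The output-vector choice below is an assumption for illustration:

```python
import numpy as np

def coupled_form_A(r, theta):
    """State matrix with complex-conjugate poles at r * exp(+/- j*theta)."""
    return r * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

def coupled_form_filter(x, r, theta, c=(1.0, 0.0), d=0.0):
    """Run a two-pole coupled-form state-space filter over signal x."""
    A, B = coupled_form_A(r, theta), np.array([1.0, 0.0])
    s, y = np.zeros(2), np.zeros(len(x))
    for n, xn in enumerate(x):
        y[n] = np.dot(c, s) + d * xn    # output equation
        s = A @ s + B * xn              # state update
    return y
```

Because pole radius and angle appear directly as parameters, a network prediction can never push a pole outside the unit circle, unlike direct-form coefficients where small errors can destabilize the filter.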
Visualization and calculation of the roughness of acoustical musical signals using the Synchronization Index Model (SIM)
The synchronization index model of sensory dissonance and roughness accounts for the degree of phase-locking to a particular frequency present in the neural patterns. Sensory dissonance (roughness) is defined as the energy of the relevant beating frequencies in the auditory channels with respect to the total energy. The model takes rate-code patterns at the level of the auditory nerve as input and outputs a sensory dissonance (roughness) value. The synchronization index model entails a straightforward visualization of the principles underlying sensory dissonance and roughness, in particular in terms of (i) roughness contributions with respect to cochlear mechanical filtering (on a critical-band scale), and (ii) roughness contributions with respect to phase-locking synchrony (i.e., the synchronization index for the relevant beating frequencies on a frequency scale). This paper presents the concept and implementation of the synchronization index model and its application to musical scales.
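A simplified sketch of the energy-ratio idea, using half-wave rectification as a crude stand-in for an auditory-nerve rate-code pattern within one channel (the sample rate and two-tone stimulus are illustrative, and real rate-code models are considerably richer):

```python
import numpy as np

FS = 8000   # assumed sample rate, Hz

def sync_index(signal, beat_hz):
    """Energy at a candidate beating frequency of the rectified signal,
    relative to the total energy (simplified sketch of the SIM ratio)."""
    rect = np.maximum(signal, 0.0)    # crude half-wave rectification
    rect = rect - rect.mean()         # drop the DC component
    spec = np.abs(np.fft.rfft(rect)) ** 2
    k = int(round(beat_hz * len(signal) / FS))
    return spec[k] / spec.sum()
```

For two tones falling in the same critical band, the rectified pattern carries a strong component at the difference frequency, so the index is large at the beat rate and near zero elsewhere; summing such contributions across channels yields the roughness value.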