Decoding Sound Source Location From EEG: Preliminary Comparisons of Spatial Rendering and Location
Spatial auditory acuity is contingent on the quality of the spatial cues presented during listening. Electroencephalography (EEG) shows promise for finding neural markers of such acuity in recorded neural activity, potentially mitigating common challenges with behavioural assessment (e.g., sound source localisation tasks). This study presents findings from three preliminary experiments that investigated variations in neural responses to auditory stimuli under different spatial listening conditions: free-field (loudspeaker-based), individual Head-Related Transfer Functions (HRTFs), and non-individual HRTFs. Three participants, each taking part in one experiment, were exposed to auditory stimuli from various spatial locations while neural activity was recorded via EEG. The resulting neural responses underwent a decoding protocol to assess how decoding accuracy varied between stimulus locations over time. Decoding accuracy was highest for free-field auditory stimuli, with significant but lower decoding accuracy between left- and right-hemisphere locations for individual and non-individual HRTF stimuli. A latency in the onset of significant decoding accuracy was observed between listening conditions for locations dominated by spectral cues. Furthermore, the findings suggest that differences in decoding accuracy between free-field and non-individual HRTF stimuli may reflect behavioural front-back confusion rates.
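The abstract does not detail the decoding protocol; a common realisation of this kind of time-resolved analysis is a classifier trained independently at each post-stimulus time point. Below is a minimal sketch in Python with scikit-learn, assuming a pre-epoched EEG array; epochs_data and labels are hypothetical placeholders, not the study's actual pipeline.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # epochs_data: (n_trials, n_channels, n_times) EEG array; labels:
    # (n_trials,) stimulus-location codes. Both are placeholders.
    def time_resolved_decoding(epochs_data, labels, cv=5):
        n_trials, n_channels, n_times = epochs_data.shape
        clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        scores = np.zeros(n_times)
        for t in range(n_times):
            # Train/test on the multichannel pattern at a single time point.
            scores[t] = cross_val_score(clf, epochs_data[:, :, t], labels, cv=cv).mean()
        return scores  # decoding accuracy as a function of time after stimulus onset

Plotting the returned curve against epoch time is what allows onset latencies of significant decoding to be compared across listening conditions.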
Binaural In-Ear Monitoring of Acoustic Instruments in Live Music Performance
A method for Binaural In-Ear Monitoring (BIEM) of acoustic instruments in live music is presented. Spatial rendering is based on four considerations: the directional radiation patterns of musical instruments, room acoustics, binaural synthesis with Head-Related Transfer Functions (HRTFs), and the movements of both the musician’s head and instrument. The concepts of static and dynamic sound mixes are presented and discussed in relation to the performers’ emotional involvement and musical instruments, as well as the use of motion-capture technology. Pilot experiments on BIEM with dynamic mixing were conducted with amateur musicians performing with wireless headphones and a motion-capture system in a small room. Listening tests in which professional musicians evaluated recordings made under dynamic sound mixing were carried out to gauge initial reactions to BIEM. Ideas for further research on static sound mixing, individualized HRTFs, tracking techniques, and wedge-monitoring schemes are suggested.
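The abstract leaves the rendering details open; one plausible core of a dynamic mix is selecting an HRTF filter pair from the tracked head orientation and convolving the instrument signal block by block. A minimal sketch, assuming a hypothetical hrir_database keyed by azimuth in degrees and a head-yaw value supplied by the motion-capture system:

    import numpy as np
    from scipy.signal import fftconvolve

    # Hypothetical HRIR lookup: {azimuth_deg: (left_ir, right_ir)}, e.g.
    # measured on a 5-degree grid. All names here are placeholders.
    def render_binaural_frame(mono_frame, source_azimuth, head_yaw, hrir_database):
        # Dynamic mixing: compensate the source direction by the musician's
        # tracked head orientation so the image stays stable in the room.
        relative_az = (source_azimuth - head_yaw) % 360
        nearest = min(hrir_database,
                      key=lambda az: abs(((az - relative_az) + 180) % 360 - 180))
        left_ir, right_ir = hrir_database[nearest]
        left = fftconvolve(mono_frame, left_ir, mode="same")
        right = fftconvolve(mono_frame, right_ir, mode="same")
        return np.stack([left, right], axis=0)

A static mix would skip the head_yaw compensation and render a fixed scene; switching between nearest HRIRs as sketched here would click in practice, so a real system would interpolate or crossfade between filter pairs.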
Notes on the use of Variational Autoencoders for Speech and Audio Spectrogram Modeling
Variational autoencoders (VAEs) are powerful (deep) generative artificial neural networks. They have recently been used in several papers for speech and audio processing, in particular for the modeling of speech/audio spectrograms. In these papers, little theoretical support is given to justify the chosen data representation and decoder likelihood function, or the corresponding cost function used for training the VAE. Yet a solid theoretical statistical framework exists and has been extensively presented and discussed in papers dealing with nonnegative matrix factorization (NMF) of audio spectrograms and its application to audio source separation. In the present paper, we show how this statistical framework applies to VAE-based speech/audio spectrogram modeling. This provides insights into the choice and interpretability of the data representation and model parameterization.
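To make the connection concrete: in the NMF literature, short-time Fourier transform (STFT) coefficients are commonly modeled as zero-mean circularly symmetric complex Gaussian variables, and the same likelihood can be placed on a VAE decoder output. A worked equation under that standard model (the notation is generic, not necessarily the paper's):

    s_{fn} \mid \mathbf{z}_n \sim \mathcal{N}_c\!\big(0,\ \sigma_{fn}^2(\mathbf{z}_n)\big),
    \qquad
    \sigma_{fn}^2(\mathbf{z}_n) = \big[\mathrm{decoder}_\theta(\mathbf{z}_n)\big]_f

    -\log p(s_{fn} \mid \mathbf{z}_n)
    = d_{\mathrm{IS}}\!\big(|s_{fn}|^2,\ \sigma_{fn}^2(\mathbf{z}_n)\big) + \mathrm{const.},
    \qquad
    d_{\mathrm{IS}}(x, y) = \frac{x}{y} - \log\frac{x}{y} - 1

Training the decoder under this likelihood thus minimizes the Itakura-Saito divergence between the power spectrogram and the decoded variances, exactly as in IS-NMF; the constant collects terms independent of the model parameters.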
Balancing Error and Latency of Black-Box Models for Audio Effects Using Hardware-Aware Neural Architecture Search
In this paper, we address automating and systematizing the search for black-box models of virtual analogue audio effects with an optimal balance between error and latency. We introduce a multi-objective optimization approach based on hardware-aware neural architecture search, which allows the balance between model error and latency to be specified according to the requirements of the application. Using a regularized evolutionary algorithm, the approach navigates a huge search space systematically. Additionally, we propose a search space for modelling non-linear dynamic audio effects consisting of over 41 trillion different WaveNet-style architectures. We evaluate the approach’s performance and usefulness by showing that it yields highly effective architectures, either up to 18× faster or with up to 56% lower test loss than the best-performing models in related work, while still showing a favourable trade-off. We conclude that hardware-aware neural architecture search is a valuable tool that can help researchers and engineers developing virtual analogue models by automating architecture design and saving the time otherwise spent on manual trial-and-error search and evaluation.
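The abstract names regularized evolution as the search strategy; its core is tournament selection plus an ageing queue. A minimal sketch of that loop, with sample_architecture, mutate, and evaluate as hypothetical stand-ins for the paper's WaveNet-style search space, and a simple weighted scalarization standing in for whatever multi-objective trade-off mechanism the authors actually use:

    import random
    from collections import deque

    # Regularized (aging) evolution: tournament selection with a FIFO
    # population, so old individuals die out regardless of fitness.
    def regularized_evolution(sample_architecture, mutate, evaluate,
                              population_size=100, cycles=1000,
                              sample_size=25, latency_weight=0.5):
        population = deque()
        for _ in range(population_size):
            arch = sample_architecture()
            error, latency = evaluate(arch)  # error plus measured latency
            population.append((arch, error + latency_weight * latency))
        for _ in range(cycles):
            # Tournament: pick the best of a random sample as the parent.
            candidates = random.sample(list(population), sample_size)
            parent = min(candidates, key=lambda p: p[1])
            child = mutate(parent[0])
            error, latency = evaluate(child)
            population.append((child, error + latency_weight * latency))
            population.popleft()  # ageing: always discard the oldest
        return min(population, key=lambda p: p[1])

Raising latency_weight biases the search toward faster models, which is one way the error/latency balance could be specified per application.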
Identification of Nonlinear Circuits as Port-Hamiltonian Systems
This paper addresses the identification of nonlinear circuits for power-balanced virtual analog modeling and simulation. The proposed method combines a port-Hamiltonian system formulation with kernel-based methods to retrieve model laws from measurements. This combination allows the estimated model to retain physical properties that are crucial for the accuracy of simulations, while representing a variety of nonlinear behaviors. As an illustration, the method is used to identify a nonlinear passive peaking EQ.
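For context, the standard input-state-output port-Hamiltonian form (the paper's exact formulation may differ) makes the power balance explicit:

    \dot{x} = \big(J(x) - R(x)\big)\,\nabla H(x) + G(x)\,u,
    \qquad
    y = G(x)^{\top}\,\nabla H(x)

with skew-symmetric interconnection J = -J^T, positive semi-definite dissipation R, and stored energy H. It follows that dH/dt = -\nabla H^T R \nabla H + u^T y \le u^T y: the stored energy never grows faster than the supplied power, which is the passivity property that structure-preserving identification is designed to retain.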
KRONOS - A Vectorizing Compiler for Music DSP
This paper introduces Kronos, a vectorizing Just-in-Time (JIT) compiler designed for musical programming systems. Its purpose is to translate abstract mathematical expressions into high-performance computer code. Musical programming system design criteria are considered and a three-tier model of abstraction is presented. The low-level expression metalanguage used in Kronos is described, along with the design choices that facilitate powerful yet transparent vectorization of the machine code.
The Sounding Gesture: An Overview
Sound control by gesture is a distinctive topic in Human-Computer Interaction: many different approaches to it are available, each focusing on a different perspective. Our point of view is an interdisciplinary one: taking into account technical considerations about control theory and sound processing, we try to explore the world of expressiveness, which is closer to psychological theories. Starting from a state of the art that outlines two main approaches to the problem of "making sound with gestures", we delve into psychological theories about expressiveness, describing in particular possible applications dealing with intermodality and mixed-reality environments related to Gestalt theory. HCI design can indeed benefit from this kind of approach because quantitative methods can be applied to measure expressiveness. Interfaces can be used to convey expressiveness, additional information that can help in interacting with the machine; this kind of information can be coded as spatio-temporal schemes, as stated in Gestalt theory.
Multimodal Interfaces for Expressive Sound Control
This paper introduces research issues on multimodal interaction and interfaces for expressive sound control. We introduce Multisensory Integrated Expressive Environments (MIEEs) as a framework for Mixed Reality applications in the performing arts. Paradigmatic contexts for applications of MIEEs are multimedia concerts, interactive dance/music/video installations, interactive museum exhibitions, and distributed cooperative environments for theatre and artistic expression. MIEEs are user-centred systems able to interpret the high-level information conveyed by performers through their expressive gestures and to establish an effective multisensory experience that takes expressive, emotional, and affective content into account. The lecture discusses some main issues for MIEEs and presents the EyesWeb (www.eyesweb.org) open software platform, which has recently been redesigned (version 4) to better address MIEE requirements. Short live demonstrations are also presented.
The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering
This article explains how to apply time–frequency scattering, a convolutional operator extracting modulations in the time–frequency domain at different rates and scales, to the re-synthesis and manipulation of audio textures. After implementing phase retrieval in the scattering network by gradient backpropagation, we introduce scale-rate DAFx, a class of audio transformations expressed in the domain of time–frequency scattering coefficients. One example of scale-rate DAFx is chirp-rate inversion, which causes each sonic event to be locally reversed in time while leaving the arrow of time globally unchanged. Over the past two years, our work has led to the creation of four electroacoustic pieces: FAVN; Modulator (Scattering Transform); Experimental Palimpsest; Inspection (Maida Vale Project) and Inspection II; as well as XAllegroX (Hecker Scattering.m Sequence), a remix of Lorenzo Senni’s XAllegroX, released by Warp Records on a vinyl record entitled The Shape of RemiXXXes to Come.
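The phase-retrieval step amounts to synthesising a signal whose scattering coefficients match those of the target, by gradient descent through the scattering network. A minimal sketch of that idea in Python with Kymatio and PyTorch, using plain time scattering (Scattering1D) as a runnable stand-in for the paper's time–frequency scattering:

    import torch
    from kymatio.torch import Scattering1D

    # Reconstruct a signal from scattering coefficients by gradient descent.
    # target: 1-D torch tensor holding the texture to re-synthesise.
    def scattering_phase_retrieval(target, J=8, Q=12, steps=500, lr=0.1):
        T = target.shape[-1]
        scattering = Scattering1D(J=J, shape=T, Q=Q)
        S_target = scattering(target).detach()
        x = torch.randn_like(target, requires_grad=True)  # random init
        optimizer = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(scattering(x), S_target)
            loss.backward()  # phase retrieval via gradient backpropagation
            optimizer.step()
        return x.detach()

Scale-rate DAFx such as chirp-rate inversion would then operate by editing S_target in the coefficient domain before running the same reconstruction.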
Local Key Estimation Based on Harmonic and Metric Structures
In this paper, we present a method for estimating the local keys of an audio signal. We propose to address the problem of local key finding by investigating possible combinations and extensions of previously proposed global key estimation approaches. The specificity of our approach is that we make key estimation dependent on the harmonic and metric structures. In this work, we focus on the relationship between the chord progression and the local key progression in a piece of music. A contribution of our work is that we address the problem of finding a good analysis window length for local key estimation by introducing information related to the metric structure into our model. Key estimation is thus not performed on segments of empirically chosen length but on segments that are adapted to the analyzed piece and independent of the tempo. We evaluate and analyze our results on a new database composed of classical music pieces.
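The abstract does not spell out the estimation step itself; a classic baseline that such methods extend is template matching between per-segment chroma and key profiles, with segment boundaries derived from the metric structure rather than a fixed window. A minimal sketch using the Krumhansl-Kessler profiles (the paper's actual templates and probabilistic model may differ):

    import numpy as np

    # Krumhansl-Kessler key profiles, index 0 = tonic pitch class.
    MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                      2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
    MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                      2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
    NOTES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")
    KEYS = [n + m for m in ("maj", "min") for n in NOTES]

    def estimate_local_keys(segment_chromas):
        # segment_chromas: iterable of 12-dim chroma vectors, one per
        # metrically derived segment (e.g. a group of bars).
        keys = []
        for chroma in segment_chromas:
            scores = [np.corrcoef(chroma, np.roll(profile, shift))[0, 1]
                      for profile in (MAJOR, MINOR) for shift in range(12)]
            keys.append(KEYS[int(np.argmax(scores))])
        return keys

Because segments follow the piece's bars rather than a fixed-length window, the analysis granularity automatically adapts to the tempo, which is the property the abstract highlights.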