Reservoir Computing: a powerful Framework for Nonlinear Audio Processing
This paper proposes reservoir computing as a general framework for nonlinear audio processing. Reservoir computing is a novel approach to recurrent neural network training whose main advantage is a very simple, linear learning algorithm. In theory it can approximate arbitrary nonlinear dynamical systems with arbitrary precision, and it has an inherent temporal processing capability, which makes it well suited to many nonlinear audio processing problems. It can be applied whenever the data contain nonlinear relationships and time information is crucial. Examples from three application areas are presented: nonlinear system identification of a tube amplifier emulator algorithm; nonlinear audio prediction, as needed in wireless audio transmission where dropouts may occur; and automatic melody transcription from a polyphonic audio stream, as one example from the broad field of music information retrieval. Reservoir computing outperformed state-of-the-art alternative models in all studied tasks.
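The "simple and linear learning algorithm" mentioned above can be illustrated with a minimal echo state network sketch: a fixed random recurrent reservoir is driven by the input, and only a linear readout is fitted, here by ridge regression. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the reservoir size, the row-sum-norm scaling (a conservative stand-in for the more usual spectral-radius scaling), and the toy target, a soft clipper with one sample of memory.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def run_reservoir(u, n_units=20, norm=0.9, seed=0):
    """Drive a fixed random tanh reservoir with input u; return one state per sample."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1, 1) for _ in range(n_units)]
    w = [[rng.uniform(-1, 1) for _ in range(n_units)] for _ in range(n_units)]
    # Scale the recurrent weights so the largest absolute row sum equals `norm` (< 1).
    biggest = max(sum(abs(v) for v in row) for row in w)
    w = [[norm * v / biggest for v in row] for row in w]
    x = [0.0] * n_units
    states = []
    for sample in u:
        x = [math.tanh(w_in[i] * sample + sum(w[i][j] * x[j] for j in range(n_units)))
             for i in range(n_units)]
        states.append(x)
    return states


def train_readout(states, targets, ridge=1e-4):
    """Fit the linear readout (bias + state weights) by ridge regression."""
    feats = [[1.0] + s for s in states]
    n = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(n)]
    return solve_linear(gram, rhs)


def predict(states, w_out):
    """Apply the trained readout to a list of reservoir states."""
    return [w_out[0] + sum(w * x for w, x in zip(w_out[1:], s)) for s in states]


# Toy task (hypothetical): soft clipping plus a one-sample echo of the input.
rng = random.Random(1)
u = [rng.uniform(-1, 1) for _ in range(300)]
y = [math.tanh(3 * u[t]) + 0.3 * (u[t - 1] if t > 0 else 0.0) for t in range(300)]
states = run_reservoir(u)
w_out = train_readout(states, y)
y_hat = predict(states, w_out)
mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
```

The reservoir weights are never trained; only `w_out` is learned, and that step is a single linear solve. A practical setup would typically also discard an initial washout period and evaluate on held-out data, which this sketch omits.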
The development of an online course in DSP eartraining
The authors present a collaborative effort to establish an online course in DSP eartraining. The paper reports on a preliminary workshop that covered a wide range of topics, including eartraining in music education, terminology for sound characterization, e-learning, automated tutoring, DSP techniques, music examples and audio programming. An initial design of the web application is presented as a rich content database with flexible views that allow customized online presentations. Technical risks have already been mitigated through prototyping.
A Model of Partial Tracks for Tension-Modulated Steel-String Guitar Tones
This paper introduces a spectral model for plucked steel-string tones, based on functional models for the time-varying fundamental frequency and inharmonicity coefficient. Techniques for evaluating these quantities at different time indices are reviewed and discussed. A method is presented for estimating the unknowns of the fundamental frequency and inharmonicity coefficient functions so that they match the data of a given tone. Frequency tracks can then be generated and traced for all values of time. Their accuracy is discussed, and applications for the model are suggested.
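As a rough illustration of how partial tracks follow from a time-varying fundamental frequency and inharmonicity coefficient, the sketch below uses the standard stiff-string relation f_k = k * f0 * sqrt(1 + B * k^2). The functional forms are assumptions and not the paper's models: the fundamental is taken to relax exponentially from slightly sharp to its final value, and B is taken to scale as 1/f0^2 (both quantities depend on string tension). All parameter names and default values are hypothetical.

```python
import math


def partial_frequency(k, f0, b):
    """Frequency of partial k on a stiff string: k * f0 * sqrt(1 + b * k^2)."""
    return k * f0 * math.sqrt(1.0 + b * k * k)


def partial_track(k, times, f0_end=110.0, f0_excess=1.5, tau=0.4, b_end=1e-4):
    """Frequency of partial k at each time in `times` (seconds)."""
    track = []
    for t in times:
        # Assumed glide: f0 starts f0_excess Hz sharp and relaxes with time constant tau.
        f0 = f0_end + f0_excess * math.exp(-t / tau)
        # Assumed coupling: B is inversely proportional to tension, f0^2 proportional to it.
        b = b_end * (f0_end / f0) ** 2
        track.append(partial_frequency(k, f0, b))
    return track


# Track of the 10th partial over the first two seconds of a hypothetical tone.
times = [i * 0.1 for i in range(21)]
track_10 = partial_track(10, times)
```

Under these assumptions each track glides downward after the pluck and settles at the value given by `f0_end` and `b_end`.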
An Algorithm for a Valved Brass Instrument Synthesis Environment using Finite-Difference Time-Domain Methods with Performance Optimisation
This paper presents a physical modelling sound synthesis environment for the production of valved brass instrument sounds. The governing equations of the system are solved using finite-difference time-domain (FDTD) methods and the environment is implemented in the C programming language. Users of the environment can create their own custom instruments and are able to control player parameters such as lip frequency, mouth pressure and valve openings through the use of instrument and score files. The algorithm for sound synthesis is presented in detail along with a discussion of optimisation methods used to reduce run time. Binaries for the environment are available for download online for multiple platforms.
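To give a flavour of the FDTD approach, the sketch below applies the standard leapfrog scheme to the lossless 1D wave equation in a uniform tube. It is a deliberately reduced illustration and not the paper's algorithm: there is no lip model, no varying bore, no valves and no losses, and it is written in Python rather than C. The function name, parameters, boundary conditions and initial excitation are all assumptions.

```python
import math


def simulate_tube(length=1.0, c=340.0, sample_rate=44100, n_steps=200):
    """Leapfrog FDTD for the lossless 1D wave equation in a uniform tube."""
    k = 1.0 / sample_rate                # time step (s)
    n = int(length / (c * k))            # grid intervals, rounded down so h >= c * k
    h = length / n                       # grid spacing (m)
    lam = c * k / h                      # Courant number, must be <= 1 for stability
    # Initial state: smooth raised-cosine pressure bump, zero initial velocity.
    centre, half_width = n // 2, 10
    curr = [0.0] * (n + 1)
    for i in range(centre - half_width, centre + half_width + 1):
        curr[i] = 0.5 * (1.0 + math.cos(math.pi * (i - centre) / half_width))
    prev = curr[:]
    output = []
    for _ in range(n_steps):
        nxt = [0.0] * (n + 1)            # end points are held at zero (Dirichlet)
        for i in range(1, n):
            nxt[i] = (2.0 * curr[i] - prev[i]
                      + lam * lam * (curr[i + 1] - 2.0 * curr[i] + curr[i - 1]))
        prev, curr = curr, nxt
        output.append(curr[n // 4])      # read the pressure at a fixed position
    return output, lam
```

Choosing the grid spacing so that the Courant number stays at or just below 1 keeps the scheme stable while minimising numerical dispersion. The inner loop over grid points is where most of the run time goes in a scheme of this kind.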
NBU: Neural Binaural Upmixing of Stereo Content
While immersive music productions have become popular in recent years, music content produced during the last decades has been predominantly mixed for stereo. This paper presents a data-driven approach to automatic binaural upmixing of stereo music. The network architecture HDemucs, previously utilized for both source separation and binauralization, is leveraged for an end-to-end approach to binaural upmixing. We employ two distinct datasets, demonstrating that while custom-designed training data enhances the accuracy of spatial positioning, the use of professionally mixed music yields superior spatialization. The trained networks show a capacity to process multiple simultaneous sources individually and add valid binaural cues, effectively positioning sources with an average azimuthal error of less than 11.3°. A listening test with binaural experts shows it outperforms digital signal processing-based approaches to binauralization of stereo content in terms of spaciousness while preserving audio quality.
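The abstract does not say how the average azimuthal error is computed. One common convention, sketched here purely as an assumption, takes the absolute angular difference with wrap-around, so that 350° and 10° count as 20° apart, and averages it over sources. The function names are hypothetical.

```python
def azimuth_error_deg(estimated, reference):
    """Absolute angular difference in degrees, wrapped into [0, 180]."""
    diff = (estimated - reference + 180.0) % 360.0 - 180.0
    return abs(diff)


def mean_azimuth_error_deg(pairs):
    """Average wrapped error over (estimated, reference) azimuth pairs."""
    return sum(azimuth_error_deg(e, r) for e, r in pairs) / len(pairs)
```

Without the wrap-around, a source estimated at 350° against a reference of 10° would be scored as 340° off, which would distort the average.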