Learning Nonlinear Dynamics in Physical Modelling Synthesis Using Neural Ordinary Differential Equations Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are possible in order to handle geometric nonlinearities. One such case is the high-amplitude vibration of a string, where geometric nonlinear effects lead to perceptually important phenomena including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent work in applied machine learning (in particular, neural ordinary differential equations) has shown that lumped dynamic systems such as electronic circuits can be modelled automatically from data. In this work, we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution for the linear vibration of the system's modes and employs a neural network to account for nonlinear dynamic behaviour. The physical parameters of a system remain easily accessible after training, without the need for a parameter encoder in the network architecture. As an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
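The architecture described above (an analytical linear modal core plus a learned nonlinear term) can be sketched in a few lines. The following is a minimal numpy illustration, not the authors' implementation: a single damped string mode is integrated with RK4, and a tiny untrained MLP with random weights stands in for the trained network supplying the nonlinear residual force.

```python
import numpy as np

rng = np.random.default_rng(0)

# One transverse-string mode: u'' + 2*sigma*u' + omega^2 * u = f_nl(u, u')
omega, sigma = 2 * np.pi * 220.0, 1.5   # modal frequency (rad/s) and damping

# Tiny MLP standing in for the learned nonlinear force (untrained placeholder).
W1, b1 = rng.normal(scale=0.1, size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(1, 8)), np.zeros(1)

def f_nl(state):
    h = np.tanh(W1 @ state + b1)
    return (W2 @ h + b2)[0]

def rhs(state):
    u, v = state
    return np.array([v, -2 * sigma * v - omega**2 * u + f_nl(state)])

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, n = 1.0 / 44100, 1000
state = np.array([1e-3, 0.0])           # initial displacement and velocity
traj = np.empty(n)
for i in range(n):
    traj[i] = state[0]
    state = rk4_step(state, dt)
```

In the paper's setting many coupled modes would be integrated together and the network weights fitted to data; here the network is left untrained simply to show where it enters the ODE right-hand side.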
Audio-Based Gesture Extraction on the ESITAR Controller Using sensors to extract gestural information for control parameters of digital audio effects is common practice. There has also been research using machine learning techniques to classify specific gestures based on audio feature analysis. In this paper, we describe our experiments in training a computer to map audio-based features onto corresponding sensor data, in order to potentially eliminate the need for sensors. Specifically, we show our experiments using the ESitar, a digitally enhanced sensor-based controller modeled after the traditional North Indian sitar. We utilize multivariate linear regression to map continuous audio features to continuous gestural data.
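The regression step is standard: stack audio features frame by frame and solve for a linear map onto the sensor channels. A hedged toy sketch with synthetic data (the feature and sensor dimensions are made up; the real system used features extracted from ESitar audio):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: rows are analysis frames.
n_frames = 500
audio_feats = rng.normal(size=(n_frames, 6))     # e.g. RMS, centroid, MFCCs
true_map = rng.normal(size=(6, 2))               # hidden feature-to-sensor map
sensor_data = audio_feats @ true_map + 0.01 * rng.normal(size=(n_frames, 2))

# Fit W (with a bias column) by least squares: sensor ~ [feats, 1] @ W
X = np.hstack([audio_feats, np.ones((n_frames, 1))])
W, *_ = np.linalg.lstsq(X, sensor_data, rcond=None)

pred = X @ W
rmse = np.sqrt(np.mean((pred - sensor_data) ** 2))
```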
Automatic Music Detection in Television Productions This paper presents methods for the automatic detection of music within audio streams, in the fore- or background. The problem occurs in the context of a real-world application, namely, the analysis of TV productions w.r.t. the use of music. In contrast to plain speech/music discrimination, the problem of detecting music in TV productions is extremely difficult, since music is often used to accentuate scenes while concurrently speech and any kind of noise signals might be present. We present results of extensive experiments with a set of standard machine learning algorithms and standard features, investigate the difference between frame-level and clip-level features, and demonstrate the importance of the application of smoothing functions as a post-processing step. Finally, we propose a new feature, called Continuous Frequency Activation (CFA), especially designed for music detection, and show experimentally that this feature is more precise than the other approaches in identifying segments with music in audio streams.
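The CFA feature itself is specified in the paper; as a loose illustration of the underlying intuition (music shows frequency bins that stay active over long horizontal stretches, unlike speech or noise), here is a simplified numpy sketch. The thresholding scheme and all parameters are assumptions for this toy example, not the published CFA definition:

```python
import numpy as np

def sustained_activation(x, frame=1024, hop=512, ratio=10.0, min_frames=20):
    """Fraction of frequency bins whose magnitude stays well above the
    per-frame median for at least min_frames consecutive frames: a rough
    proxy for the steady horizontal partials typical of music."""
    n = 1 + (len(x) - frame) // hop
    win = np.hanning(frame)
    spec = np.abs(np.stack([np.fft.rfft(win * x[i*hop:i*hop+frame])
                            for i in range(n)]))           # frames x bins
    med = np.median(spec, axis=1, keepdims=True)           # per-frame median
    active = (spec > ratio * med).T                        # bins x frames
    sustained = 0
    for row in active:
        run = best = 0
        for a in row:
            run = run + 1 if a else 0
            best = max(best, run)
        sustained += best >= min_frames
    return sustained / active.shape[0]

sr = 16000
t = np.arange(2 * sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)                 # steady partial
noise = np.random.default_rng(2).normal(0.0, 0.5, 2 * sr)  # no stable partials

score_tone = sustained_activation(tone)
score_noise = sustained_activation(noise)
```

On this toy input the steady tone scores clearly higher than the noise, which is the property a music-activation feature needs.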
Identification of individual guitar sounds by support vector machines This paper introduces an automatic classification system for the identification of individual classical guitars from single notes played on these guitars. The classification is performed by Support Vector Machines (SVM) that have been trained with the features of the single notes. The features used for classification were the time series of the partial tones, the time series of the MFCCs (Mel Frequency Cepstral Coefficients), and the “nontonal” contributions to the spectrum. The influences of these features on the classification success are reported. With this system, 80% of the sounds recorded with three different guitars were classified correctly. A supplementary classification experiment was carried out with human listeners, resulting in a rate of 65% correct classifications.
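The classifier stage can be illustrated with a self-contained linear SVM trained by the Pegasos subgradient method on toy two-class data standing in for per-guitar feature vectors. This is only a sketch of the decision stage under assumed data; the paper used full SVMs on partial-tone and MFCC time series:

```python
import numpy as np

def pegasos_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM (hinge loss) with the Pegasos subgradient method.
    Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (w @ X[i]) < 1:       # margin violated: hinge gradient
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                            # only the regularizer acts
                w = (1 - eta * lam) * w
    return w

rng = np.random.default_rng(3)
# Toy "guitar A vs guitar B" feature vectors (e.g. averaged MFCCs), separable.
X = np.vstack([rng.normal(loc=+1.0, size=(40, 4)),
               rng.normal(loc=-1.0, size=(40, 4))])
X = np.hstack([X, np.ones((len(X), 1))])    # constant column acts as a bias
y = np.array([+1] * 40 + [-1] * 40)

w = pegasos_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
```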
Re-targeting Expressive Musical Style Using a Machine-Learning Method Expressive musical performing style involves more than what is simply represented on the score. Performers imprint their personal style on each performance based on their musical understanding. Expressive performing style makes the music come alive by shaping it through continuous variation. It is observed that musical style can be represented by appropriate numerical parameters, most of which are related to the dynamics. It is also observed that performers tend to perform music sections and motives of similar shape in similar ways, where music sections and motives can be identified by an automatic phrasing algorithm. An experiment is proposed for producing expressive music from raw quantized music files using machine-learning methods such as Support Vector Machines. Experimental results show that it is possible to induce some of a performer's style by using the music parameters extracted from audio recordings of their real performances.
Differentiable IIR Filters for Machine Learning Applications In this paper we present an approach to using traditional digital IIR filter structures inside deep-learning networks trained using backpropagation. We establish the link between such structures and recurrent neural networks. Three different differentiable IIR filter topologies are presented and compared against each other and an established baseline. Additionally, a simple Wiener-Hammerstein model using differentiable IIRs as its filtering component is presented and trained on a guitar signal played through a Boss DS-1 guitar pedal.
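The link to recurrent networks is easiest to see on a one-pole filter y[n] = a*y[n-1] + b*x[n]: unrolled over time it is a linear RNN, and gradients of a squared-error loss with respect to a and b can be propagated backwards through the recurrence. A minimal numpy sketch (not one of the paper's three topologies) that fits the coefficients of a reference filter by gradient descent:

```python
import numpy as np

def one_pole(x, a, b):
    """y[n] = a*y[n-1] + b*x[n]: the recurrence we differentiate through."""
    y = np.zeros_like(x)
    prev = 0.0
    for n in range(len(x)):
        prev = a * prev + b * x[n]
        y[n] = prev
    return y

def grads(x, y, t, a):
    """Backprop through time for loss = 0.5 * sum((y - t)^2)."""
    dL_dy = y - t
    da = db = 0.0
    dy = 0.0                        # gradient arriving at y[n] from the future
    for n in range(len(x) - 1, -1, -1):
        dy += dL_dy[n]
        da += dy * (y[n - 1] if n > 0 else 0.0)
        db += dy * x[n]
        dy *= a                     # pass through y[n] = a*y[n-1] + b*x[n]
    return da, db

rng = np.random.default_rng(4)
x = rng.normal(size=256)
target = one_pole(x, 0.7, 0.3)      # "reference device" to be matched

a, b, lr = 0.0, 1.0, 1e-3
for _ in range(500):
    y = one_pole(x, a, b)
    da, db = grads(x, y, target, a)
    a -= lr * da
    b -= lr * db
```

After training, (a, b) lands close to the reference coefficients (0.7, 0.3); the same backpropagation-through-time pattern generalizes to higher-order topologies.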
A Hierarchical Deep Learning Approach for Minority Instrument Detection Identifying instrument activities within audio excerpts is vital in music information retrieval, with significant implications for music cataloging and discovery. Prior deep learning endeavors in musical instrument recognition have predominantly emphasized instrument classes with ample data availability. Recent studies have demonstrated the applicability of hierarchical classification in detecting instrument activities in orchestral music, even with limited fine-grained annotations at the instrument level. Based on the Hornbostel-Sachs classification, such a hierarchical classification system is evaluated using the MedleyDB dataset, renowned for its diversity and richness concerning various instruments and music genres. This work presents various strategies to integrate hierarchical structures into models and tests a new class of models for hierarchical music prediction. This study showcases more reliable coarse-level instrument detection by bridging the gap between detailed instrument identification and group-level recognition, paving the way for further advancements in this domain.
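One simple way to integrate a hierarchy into a model's output, aggregating fine-level instrument probabilities up to coarse Hornbostel-Sachs-style groups, can be sketched as follows. The instrument list, grouping, and noisy-OR aggregation are illustrative assumptions, not the specific strategies evaluated in the paper:

```python
import numpy as np

# Hypothetical fine-level instrument order and Hornbostel-Sachs-style groups.
instruments = ["violin", "cello", "flute", "oboe", "trumpet", "horn"]
groups = {"chordophones": [0, 1], "aerophones": [2, 3, 4, 5]}

def coarse_from_fine(fine_probs):
    """Aggregate per-instrument probabilities to group level.
    P(group active) = 1 - prod(1 - p_i) over the group's members,
    i.e. the group fires if any member instrument does (assuming
    independence between instruments)."""
    return {g: 1.0 - np.prod(1.0 - fine_probs[idx])
            for g, idx in groups.items()}

fine = np.array([0.8, 0.1, 0.05, 0.02, 0.6, 0.1])
coarse = coarse_from_fine(fine)
```

Such an aggregation lets a model trained with scarce fine-grained labels still emit confident coarse-level predictions, which is the behaviour the paper evaluates.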
Hubness-Aware Outlier Detection for Music Genre Recognition Outlier detection is the task of automatic identification of unknown data not covered by training data (e.g. a new genre in genre recognition). We explore outlier detection in the presence of hubs and anti-hubs, i.e. data objects which appear to be either very close or very far from most other data due to a problem of measuring distances in high dimensions. We compare a classic distance based method to two new approaches, which have been designed to counter the negative effects of hubness, on two standard music genre data sets. We demonstrate that anti-hubs are responsible for many detection errors and that this can be improved by using a hubness-aware approach.
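Hubness is usually quantified by the k-occurrence N_k(i): how often point i appears among the k nearest neighbours of the other points. A small numpy sketch on synthetic high-dimensional data (the data and parameters are illustrative; the paper works with music genre feature sets):

```python
import numpy as np

def k_occurrence(X, k=5):
    """N_k(i): how often point i appears among the k nearest neighbours
    of the other points. Hubs have large N_k, anti-hubs N_k near zero."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-neighbours
    nn = np.argsort(d, axis=1)[:, :k]
    counts = np.zeros(len(X), dtype=int)
    for row in nn:
        counts[row] += 1
    return counts

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 50))     # high dimensionality lets hubness emerge
n5 = k_occurrence(X, k=5)
```

Anti-hubs (points with N_k near zero) look distant from everything and are easily mistaken for outliers by plain distance-based scores, which is the failure mode the hubness-aware methods address.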
Neural Net Tube Models for Wave Digital Filters Herein, we demonstrate the use of neural nets towards simulating multiport nonlinearities inside a wave digital filter. We introduce a resolved wave definition which allows us to extract features from a Kirchhoff domain dataset and train our neural networks directly in the wave domain. A hyperparameter search is performed to minimize error and runtime complexity. To illustrate the method, we model a tube amplifier circuit inspired by the preamplifier stage of the Fender Pro-Junior guitar amplifier. We analyze the performance of our neural net models by comparing their distortion characteristics and transconductances. Our results suggest that activation function selection has a significant effect on the distortion characteristic created by the neural net.
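The bridge between the Kirchhoff and wave domains rests on the standard voltage-wave change of variables a = v + R0*i, b = v - R0*i, which is invertible, so a Kirchhoff-domain dataset can be mapped into wave variables for training. The sketch below uses this textbook definition with a toy nonlinear element; the paper's resolved wave definition and tube dataset are not reproduced here:

```python
import numpy as np

def kirchhoff_to_wave(v, i, r0):
    """Voltage/current pairs -> incident and reflected voltage waves for
    port resistance r0 (standard wave digital definition)."""
    return v + r0 * i, v - r0 * i

def wave_to_kirchhoff(a, b, r0):
    """Inverse mapping back to the Kirchhoff domain."""
    return 0.5 * (a + b), 0.5 * (a - b) / r0

# Hypothetical Kirchhoff-domain dataset (standing in for simulated tube data).
rng = np.random.default_rng(6)
v = rng.uniform(-100.0, 100.0, size=1000)
i = 1e-3 * np.tanh(v / 50.0)        # toy nonlinear element, not a tube model
r0 = 1e3                            # assumed port resistance in ohms

a, b = kirchhoff_to_wave(v, i, r0)
v2, i2 = wave_to_kirchhoff(a, b, r0)
```

Because the mapping is exactly invertible, a network trained to predict reflected from incident waves can be evaluated against the original Kirchhoff-domain measurements.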
Reservoir Computing: a powerful Framework for Nonlinear Audio Processing This paper proposes reservoir computing as a general framework for nonlinear audio processing. Reservoir computing is a novel approach to recurrent neural network training with the advantage of a very simple and linear learning algorithm. It can in theory approximate arbitrary nonlinear dynamical systems with arbitrary precision, has an inherent temporal processing capability, and is therefore well suited to many nonlinear audio processing problems. Whenever nonlinear relationships are present in the data and temporal information is crucial, reservoir computing can be applied. Examples from three application areas are presented: nonlinear system identification of a tube amplifier emulator algorithm; nonlinear audio prediction, as necessary in wireless transmission of audio where dropouts may occur; and automatic melody transcription from a polyphonic audio stream, as one example from the broad field of music information retrieval. Reservoir computing was able to outperform state-of-the-art alternative models in all studied tasks.
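A minimal echo state network makes the "very simple and linear learning algorithm" concrete: only the readout is fitted, by ridge regression, while the input and reservoir weights stay random. The task below (a short-memory static nonlinearity) and all parameter choices are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(7)
n_res, n_in = 100, 1

# Random input and reservoir weights; rescale the reservoir so its spectral
# radius is below 1 (a common sufficient condition for the echo state property).
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    states = np.zeros((len(u), n_res))
    x = np.zeros(n_res)
    for n in range(len(u)):
        x = np.tanh(W_in @ u[n:n+1] + W @ x)
        states[n] = x
    return states

# Toy task: identify a memoryful nonlinearity y[n] = tanh(u[n] + 0.5*u[n-1]).
u = rng.uniform(-1, 1, size=2000)
y = np.tanh(u + 0.5 * np.r_[0.0, u[:-1]])

S = run_reservoir(u)
washout = 100                       # discard initial transient states
A = np.hstack([S[washout:], np.ones((len(u) - washout, 1))])

# Linear readout by ridge regression: the only trained part of the model.
reg = 1e-6
w_out = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ y[washout:])
nmse = np.mean((A @ w_out - y[washout:]) ** 2) / np.var(y[washout:])
```

The normalized error comes out small despite the recurrent weights never being trained, which is the central appeal of the framework for nonlinear audio tasks.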