Differentiable IIR Filters for Machine Learning Applications

In this paper we present an approach to using traditional digital IIR filter structures inside deep-learning networks trained via backpropagation. We establish the link between such structures and recurrent neural networks. Three different differentiable IIR filter topologies are presented and compared against each other and an established baseline. Additionally, a simple Wiener-Hammerstein model using differentiable IIRs as its filtering component is presented and trained on a guitar signal played through a Boss DS-1 guitar pedal.
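The RNN connection is easiest to see in code. Below is a minimal sketch, assuming PyTorch, of a first-order IIR filter whose coefficients are `nn.Parameter`s: the sample-by-sample recurrence is exactly an RNN cell, so autograd backpropagates through time with no hand-derived gradients. The class name and the toy fitting target are illustrative, not from the paper.

```python
import torch

class DifferentiableFirstOrderIIR(torch.nn.Module):
    """y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1], written as an unrolled recurrence."""
    def __init__(self):
        super().__init__()
        self.b0 = torch.nn.Parameter(torch.tensor(0.5))
        self.b1 = torch.nn.Parameter(torch.tensor(0.0))
        self.a1 = torch.nn.Parameter(torch.tensor(-0.3))

    def forward(self, x):
        # x: (batch, time). The feedback path makes this an RNN cell;
        # autograd handles backpropagation through time automatically.
        x_prev = torch.zeros(x.shape[0])
        y_prev = torch.zeros(x.shape[0])
        out = []
        for n in range(x.shape[1]):
            y = self.b0 * x[:, n] + self.b1 * x_prev - self.a1 * y_prev
            out.append(y)
            x_prev, y_prev = x[:, n], y
        return torch.stack(out, dim=1)

# Fit the filter coefficients to a target response by gradient descent.
model = DifferentiableFirstOrderIIR()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randn(4, 256)
target = 0.3 * x                      # toy target: a pure gain
for step in range(200):
    loss = torch.mean((model(x) - target) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
```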
A Differentiable Digital Moog Filter For Machine Learning Applications

In this project, a digital ladder filter has been investigated and expanded. This structure is a simplified digital model of the well-known analog Moog ladder filter. The goal of this paper is to derive the derivative expressions of this filter with respect to its control parameters in order to integrate it into machine learning systems. The derivation of the backpropagation method is described in this work; it can be generalized to a Moog filter, or a similar filter, with any number of stages. Subsequently, the example of an adaptive Moog filter is provided. Finally, a machine learning application example is shown in which the filter is integrated into a deep learning framework.
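To make the setup concrete, here is a hedged sketch of a simplified four-stage digital ladder in PyTorch. The discretisation (a cascade of one-pole lowpass stages with global feedback) is an illustrative simplification, and it leans on autograd rather than the hand-derived backpropagation expressions that are the paper's contribution; `g` (cutoff coefficient) and `k` (resonance) are the control parameters being differentiated.

```python
import torch

class DifferentiableMoogLadder(torch.nn.Module):
    """Simplified digital Moog ladder: four cascaded one-pole lowpass stages
    with global feedback. g and k are learnable control parameters, so
    autograd supplies their derivatives for any number of stages."""
    def __init__(self, n_stages=4):
        super().__init__()
        self.n_stages = n_stages
        self.g = torch.nn.Parameter(torch.tensor(0.5))   # cutoff coefficient, 0 < g < 1
        self.k = torch.nn.Parameter(torch.tensor(1.0))   # resonance feedback gain

    def forward(self, x):
        # x: (time,) mono signal
        s = torch.zeros(self.n_stages)            # stage states
        out = []
        for n in range(x.shape[0]):
            u = x[n] - self.k * s[-1]             # feedback from the last stage
            states = []
            for i in range(self.n_stages):        # cascade of one-pole lowpasses
                u = s[i] + self.g * (u - s[i])
                states.append(u)
            s = torch.stack(states)
            out.append(s[-1])
        return torch.stack(out)

# Both g and k receive gradients when a loss on the output is backpropagated.
y = DifferentiableMoogLadder()(torch.randn(512))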
Polyphonic music analysis by signal processing and support vector machines

In this paper, an original system for the analysis of harmony in polyphonic music is introduced. The system is based on signal processing and machine learning. A new fast, multi-resolution analysis method is devised to extract the time-frequency energy spectrum at the signal processing stage, while a support vector machine serves as the machine learning component. Aiming at the analysis of fairly general audio content, experiments are carried out on a large set of recorded samples, using 19 musical instruments alone or in combination, at different degrees of polyphony. Experimental results show that fundamental frequencies are detected with a remarkable success rate and that the method can provide excellent results in general cases.
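A minimal sketch of the two stages, assuming scipy and scikit-learn: spectra computed at several window lengths are concatenated into one feature vector, and an SVM is trained on top. The toy detection task and all sizes below are invented for illustration; the paper's actual features and 19-instrument corpus are far richer.

```python
import numpy as np
from scipy.signal import stft
from sklearn.svm import SVC

def multires_features(x, fs=16000, windows=(256, 1024, 4096)):
    """Concatenate mean log-energy spectra computed at several resolutions."""
    feats = []
    for n in windows:
        _, _, Z = stft(x, fs=fs, nperseg=n)
        feats.append(np.log1p(np.abs(Z)).mean(axis=1))
    return np.concatenate(feats)

# Toy task: detect whether a 440 Hz partial is present in a noisy mixture.
rng = np.random.default_rng(0)
fs, dur = 16000, 0.5
t = np.arange(int(fs * dur)) / fs
X, y = [], []
for _ in range(100):
    present = rng.integers(0, 2)
    x = 0.3 * rng.standard_normal(t.size)
    if present:
        x += np.sin(2 * np.pi * 440 * t)
    X.append(multires_features(x, fs))
    y.append(present)

clf = SVC(kernel="rbf").fit(X[:80], y[:80])
print("accuracy:", clf.score(X[80:], y[80:]))
```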
Introducing Deep Machine Learning for Parameter Estimation in Physical Modelling

One of the most challenging tasks in physically-informed sound synthesis is the estimation of model parameters to produce a desired timbre. Automatic parameter estimation procedures have been developed in the past for some specific parameters or application scenarios but, up to now, no approach has proved applicable to a wide variety of use cases. This paper provides a general solution to the parameter estimation problem, based on a supervised convolutional machine learning paradigm. The described approach can be classified as "end-to-end" and thus requires no specific knowledge of the model itself. Furthermore, parameters are learned from data generated by the model, requiring no effort in the preparation and labeling of the training dataset. To provide a qualitative and quantitative analysis of its performance, the method is applied to a patented digital waveguide pipe organ model, yielding very promising results.
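The training scheme can be sketched in a few lines, assuming PyTorch: sample parameters at random, render audio with the model, and train a convolutional network to regress the parameters back, so labels cost nothing. The decaying-sinusoid `toy_synth` below is a stand-in for the patented pipe organ model, and the network shape is arbitrary.

```python
import torch

def toy_synth(freq, decay, n=2048, fs=16000):
    """Stand-in for the physical model: a decaying sinusoid, two parameters."""
    t = torch.arange(n) / fs
    return torch.exp(-decay * t) * torch.sin(2 * torch.pi * freq * t)

class ParamEstimator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv1d(1, 16, 64, stride=8), torch.nn.ReLU(),
            torch.nn.Conv1d(16, 32, 16, stride=4), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool1d(1), torch.nn.Flatten(),
            torch.nn.Linear(32, 2))            # -> (freq_norm, decay_norm)
    def forward(self, x):
        return self.net(x)

model = ParamEstimator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    # Labels come for free: sample parameters, render audio with the model.
    p = torch.rand(16, 2)                      # normalised params in [0, 1]
    audio = torch.stack([toy_synth(200 + 800 * f, 1 + 20 * d)
                         for f, d in p]).unsqueeze(1)
    loss = torch.mean((model(audio) - p) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
```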
Re-targeting Expressive Musical Style Using a Machine-Learning Method

Expressive musical performing style involves more than what is simply represented on the score. Performers imprint their personal style on each performance based on their musical understanding. Expressive performing style makes the music come alive by shaping it through continuous variation. It is observed that musical style can be represented by appropriate numerical parameters, most of which are related to dynamics. It is also observed that performers tend to perform music sections and motives of similar shape in similar ways, where sections and motives can be identified by an automatic phrasing algorithm. An experiment is proposed for producing expressive music from raw quantized music files using machine-learning methods such as Support Vector Machines. Experimental results show that it is possible to induce some of a performer's style by using the music parameters extracted from audio recordings of their real performances.
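As a rough illustration of the regression step, the sketch below, assuming scikit-learn, trains a support vector regressor to map simple per-note score features to a dynamics parameter. The features and the synthetic "performer" are invented; the paper derives its features from phrases found by the automatic phrasing algorithm.

```python
import numpy as np
from sklearn.svm import SVR

# Toy stand-in: per-note score features -> a dynamics parameter (relative
# loudness), as a simplification of the paper's phrase-based feature set.
rng = np.random.default_rng(1)
n = 400
pos = rng.uniform(0, 1, n)               # normalised position within the phrase
pitch = rng.integers(48, 84, n)          # MIDI pitch
X = np.column_stack([pos, pitch / 127.0])
# A synthetic "performer": louder mid-phrase, slightly louder when higher.
loudness = 0.6 * np.sin(np.pi * pos) + 0.3 * X[:, 1] + 0.05 * rng.standard_normal(n)

model = SVR(kernel="rbf", C=10.0).fit(X[:300], loudness[:300])
pred = model.predict(X[300:])            # predicted dynamics for unseen notes
print("MAE:", np.mean(np.abs(pred - loudness[300:])))
```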
SCHAEFFER: A Dataset of Human-Annotated Sound Objects for Machine Learning Applications

Machine learning for sound generation is rapidly expanding within the computer music community. However, most datasets used to train models are built from field recordings, foley sounds, instrumental notes, or commercial music. This presents a significant limitation for composers working in acousmatic and electroacoustic music, who require datasets tailored to their creative processes. To address this gap, we introduce the SCHAEFFER Dataset (Spectromorphological Corpus of Human-annotated Audio with Electroacoustic Features For Experimental Research), a curated collection of 1000 sound objects designed and annotated by composers and students of electroacoustic composition. The dataset, distributed under Creative Commons licenses, features annotations combining technical and poetic descriptions, alongside classifications based on pre-defined spectromorphological categories.
Grey-Box Modelling of Dynamic Range Compression

This paper explores the digital emulation of analog dynamic range compressors, proposing a grey-box model that uses a combination of traditional signal processing techniques and machine learning. The main idea is to use the structure of a traditional digital compressor in a machine learning framework, so it can be trained end-to-end to create a virtual analog model of a compressor from data. The complexity of the model can be adjusted, allowing a trade-off between the model accuracy and computational cost. The proposed model has interpretable components, so its behaviour can be controlled more readily after training in comparison to a black-box model. The result is a model that achieves similar accuracy to a black-box baseline, whilst requiring less than 10% of the number of operations per sample at runtime.
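A minimal sketch of such a grey-box structure, assuming PyTorch: the blocks of a textbook feed-forward compressor (envelope detector, static gain computer) are kept, but their parameters become `nn.Parameter`s trained end-to-end. This illustrates the idea rather than reproducing the paper's model, and the loop-based envelope follower is not an efficient implementation.

```python
import torch

class GreyBoxCompressor(torch.nn.Module):
    """Feed-forward compressor skeleton with learnable threshold, ratio,
    and ballistics; every block stays interpretable after training."""
    def __init__(self):
        super().__init__()
        self.threshold_db = torch.nn.Parameter(torch.tensor(-20.0))
        self.ratio = torch.nn.Parameter(torch.tensor(4.0))
        self.alpha_logit = torch.nn.Parameter(torch.tensor(2.0))  # smoothing

    def forward(self, x):
        # Level detection: one-pole smoothing of the squared signal.
        alpha = torch.sigmoid(self.alpha_logit)
        env, envs = torch.zeros(x.shape[0]), []
        for n in range(x.shape[1]):
            env = alpha * env + (1 - alpha) * x[:, n] ** 2
            envs.append(env)
        level_db = 10 * torch.log10(torch.stack(envs, dim=1) + 1e-8)
        # Static gain computer: attenuate everything above the threshold.
        over = torch.relu(level_db - self.threshold_db)
        gain_db = -over * (1 - 1 / self.ratio)
        return x * 10 ** (gain_db / 20)

# Trained end-to-end against recordings of the analog device.
y = GreyBoxCompressor()(torch.randn(2, 4096))
```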
Unsupervised Feature Learning for Speech and Music Detection in Radio Broadcasts

Detecting speech and music is an elementary step in extracting information from radio broadcasts. Existing solutions either rely on general-purpose audio features, or build on features specifically engineered for the task. Interpreting spectrograms as images, we can apply unsupervised feature learning methods from computer vision instead. In this work, we show that features learned by a mean-covariance Restricted Boltzmann Machine partly resemble engineered features, but outperform three hand-crafted feature sets in speech and music detection on a large corpus of radio recordings. Our results demonstrate that unsupervised learning is a powerful alternative to knowledge engineering.
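The pipeline can be approximated with off-the-shelf pieces, assuming scikit-learn and substituting a plain Bernoulli RBM for the mean-covariance RBM used in the paper (scikit-learn ships no mcRBM): unsupervised features learned from spectrogram excerpts feed a simple classifier. The array shapes and random placeholder data below are purely illustrative.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# X would hold spectrogram excerpts scaled to [0, 1], one flattened
# 40-band x 15-frame patch per row; y marks speech (0) vs. music (1).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 40 * 15))   # placeholder patches
y = rng.integers(0, 2, size=500)

model = Pipeline([
    ("rbm", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=20)),
    ("clf", LogisticRegression(max_iter=1000)),  # classifier on learned features
])
model.fit(X, y)
```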
Network Bending of Diffusion Models for Audio-Visual Generation

In this paper we present the first steps towards the creation of a tool which enables artists to create music visualizations using pretrained, generative, machine learning models. First, we investigate the application of network bending, the process of applying transforms within the layers of a generative network, to image generation diffusion models by utilizing a range of point-wise, tensor-wise, and morphological operators. We identify a number of visual effects that result from various operators, including some that are not easily recreated with standard image editing tools. We find that this process allows for continuous, fine-grained control of image generation, which can be helpful for creative applications. Next, we generate music-reactive videos using Stable Diffusion by passing audio features as parameters to network bending operators. Finally, we comment on certain transforms which radically shift the image, and on the possibility of learning more about the latent space of Stable Diffusion from these transforms.
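In PyTorch, the bending mechanism itself is just a forward hook that rewrites a layer's activations. The sketch below applies a point-wise operator scaled by an audio feature to a toy network standing in for the diffusion model's UNet; the operator and feature are invented examples, not the paper's specific transforms.

```python
import torch

# Toy network standing in for the diffusion UNet; the hook mechanism is
# identical for any torch.nn.Module layer.
net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(8, 3, 3, padding=1))

audio_feature = {"rms": 0.5}   # updated per video frame from audio analysis

def bend(module, inputs, output):
    # Point-wise operator: squash activations, scaled by the audio feature.
    return torch.tanh(output * (1.0 + 4.0 * audio_feature["rms"]))

handle = net[0].register_forward_hook(bend)   # bend inside the first layer
frame = net(torch.randn(1, 3, 64, 64))        # generation now reacts to audio
handle.remove()
```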
Autoencoding Neural Networks as Musical Audio Synthesizers

A method for musical audio synthesis using autoencoding neural networks is proposed. The autoencoder is trained to compress and reconstruct magnitude short-time Fourier transform frames. The autoencoder produces a spectrogram by activating its smallest hidden layer, and a phase response is calculated using real-time phase gradient heap integration. Taking an inverse short-time Fourier transform produces the audio signal. Our algorithm is lightweight compared to current state-of-the-art audio-producing machine learning algorithms. We outline our design process, report metrics, and detail an open-source Python implementation of our model.
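A compact sketch of the pipeline, assuming PyTorch and librosa: an autoencoder is trained to reconstruct magnitude STFT frames, synthesis drives the decoder from latent activations, and phase is then recovered for the inverse STFT. Griffin-Lim stands in here for the paper's real-time phase gradient heap integration; sizes and placeholder data are illustrative.

```python
import torch
import librosa

class STFTAutoencoder(torch.nn.Module):
    """Autoencoder over magnitude STFT frames; the smallest hidden layer
    doubles as the synthesis control space."""
    def __init__(self, n_bins=513, latent=8):
        super().__init__()
        self.enc = torch.nn.Sequential(
            torch.nn.Linear(n_bins, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, latent), torch.nn.ReLU())
        self.dec = torch.nn.Sequential(
            torch.nn.Linear(latent, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, n_bins), torch.nn.ReLU())
    def forward(self, frames):
        return self.dec(self.enc(frames))

# Train to reconstruct magnitude frames (random placeholders here).
model = STFTAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.rand(64, 513)              # stand-in for real |STFT| frames
for _ in range(100):
    loss = torch.mean((model(frames) - frames) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()

# Synthesis: activate the latent layer, decode a spectrogram, recover phase.
z = torch.rand(100, 8)                    # latent trajectory over 100 frames
mag = model.dec(z).detach().numpy().T     # (n_bins, n_frames)
audio = librosa.griffinlim(mag, hop_length=256)   # offline stand-in for RTPGHI
```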