DAFx Paper Archive - Search for machine learning, page 2 of 32

Towards Neural Emulation of Voltage-Controlled Oscillators

DAFx-2025 - Ancona

Machine learning models have become ubiquitous in modeling analog audio devices. Expanding on this line of research, our study focuses on Voltage-Controlled Oscillators of analog synthesizers. We employ black box autoregressive artificial neural networks to model the typical analog waveshapes, including triangle, square, and sawtooth. The models can be conditioned on wave frequency and type, enabling the generation of pitch envelopes and morphing across waveshapes. We conduct evaluations on both synthetic and analog datasets to assess the accuracy of various architectural variants. The LSTM variant performed better, although lower frequency ranges present particular challenges.

Download

Audio-Based Gesture Extraction on the ESITAR Controller

Ajay Kapur; George Tzanetakis; Peter Driessen

DAFx-2004 - Naples

Using sensors to extract gestural information for control parameters of digital audio effects is common practice. There has also been research using machine learning techniques to classify specific gestures based on audio feature analysis. In this paper, we will describe our experiments in training a computer to map the appropriate audio-based features to look like sensor data, in order to potentially eliminate the need for sensors. Specifically, we will show our experiments using the ESitar, a digitally enhanced sensor based controller modeled after the traditional North Indian sitar. We utilize multivariate linear regression to map continuous audio features to continuous gestural data.

Download

LTFATPY: Towards Making a Wide Range of Time-Frequency Representations Available in Python

Clara Hollomey; Nicki Holighaus

DAFx-2024 - Guildford

LTFATPY is a software package for accessing the Large Time Frequency Analysis Toolbox (LTFAT) from Python. Dedicated to time-frequency analysis, LTFAT comprises a large number of linear transforms for Fourier, Gabor, and wavelet analysis along with their associated operators. Its filter bank module is a collection of computational routines for finite impulse response and band-limited filters, allowing for the specification of constant-Q and auditory-inspired transforms. While LTFAT has originally been written in MATLAB/GNU Octave, the recent popularity of the Python programming language in related fields, such as signal processing and machine learning, makes it desirable to have LTFAT available in Python as well. We introduce LTFATPY, describe its main features, and outline further developments.

Download

Music Emotion Classification: Dataset Acquisition And Comparative Analysis

Renato Panda; Rui Pedro Paiva

DAFx-2012 - York

In this paper we present an approach to emotion classification in audio music. The process is conducted with a dataset of 903 clips and mood labels, collected from Allmusic1 database, organized in five clusters similar to the dataset used in the MIREX2 Mood Classification Task. Three different audio frameworks – Marsyas, MIR Toolbox and Psysound, were used to extract several features. These audio features and annotations are used with supervised learning techniques to train and test various classifiers based on support vector machines. To access the importance of each feature several different combinations of features, obtained with feature selection algorithms or manually selected were tested. The performance of the solution was measured with 20 repetitions of 10-fold cross validation, achieving a F-measure of 47.2% with precision of 46.8% and recall of 47.6%.

Download

Towards an Invertible Rhythm Representation

Aggelos Gkiokas; Stefan Lattner; Vassilis Katsouros; Arthur Flexer; George Carayanni

DAFx-2015 - Trondheim

This paper investigates the development of a rhythm representation of music audio signals, that (i) is able to tackle rhythm related tasks and, (ii) is invertible, i.e. is suitable to reconstruct audio from it with the corresponding rhythm content being preserved. A conventional front-end processing schema is applied to the audio signal to extract time varying characteristics (accent features) of the signal. Next, a periodicity analysis method is proposed that is capable of reconstructing the accent features. Afterwards, a network consisting of Restricted Boltzmann Machines is applied to the periodicity function to learn a latent representation. This latent representation is finally used to tackle two distinct rhythm tasks, namely dance style classification and meter estimation. The results are promising for both input signal reconstruction and rhythm classification performance. Moreover, the proposed method is extended to generate random samples from the corresponding classes.

Download

Physical Modeling Using Recurrent Neural Networks with Fast Convolutional Layers

Julian D. Parker; Sebastian J. Schlecht; Rudolf Rabenstein; Maximilian Schäfer

DAFx-2022 - Vienna

Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied the techniques of machine-learning to construct such models automatically from data for the case of systems which have lumped states described by scalar values, such as electrical circuits. In this work, we examine how similar techniques are able to construct models of systems which have spatially distributed rather than lumped states. We describe several novel recurrent neural network structures, and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained with this data to reproduce the behavior of these systems.

Download

Differentiable grey-box modelling of phaser effects using frame-based spectral processing

Alistair Carson; Cassia Valentini-Botinhao; Simon King; Stefan Bilbao

DAFx-2023 - Copenhagen

Machine learning approaches to modelling analog audio effects have seen intensive investigation in recent years, particularly in the context of non-linear time-invariant effects such as guitar amplifiers. For modulation effects such as phasers, however, new challenges emerge due to the presence of the low-frequency oscillator which controls the slowly time-varying nature of the effect. Existing approaches have either required foreknowledge of this control signal, or have been non-causal in implementation. This work presents a differentiable digital signal processing approach to modelling phaser effects in which the underlying control signal and time-varying spectral response of the effect are jointly learned. The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain, with a transfer function based on typical analog phaser circuit topology. We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters. The frame duration is an important hyper-parameter of the proposed model, so an investigation was carried out into its effect on model accuracy. The optimal frame length depends on both the rate and transient decay-time of the target effect, but the frame length can be altered at inference time without a significant change in accuracy.

Download

Automatic Music Detection in Television Productions

Klaus Seyerlehner; Tim Pohle; Markus Schedl; Gerhard Widmer

DAFx-2007 - Bordeaux

This paper presents methods for the automatic detection of music within audio streams, in the fore- or background. The problem occurs in the context of a real-world application, namely, the analysis of TV productions w.r.t. the use of music. In contrast to plain speech/music discrimination, the problem of detecting music in TV productions is extremely difficult, since music is often used to accentuate scenes while concurrently speech and any kind of noise signals might be present. We present results of extensive experiments with a set of standard machine learning algorithms and standard features, investigate the difference between frame-level and clip-level features, and demonstrate the importance of the application of smoothing functions as a post-processing step. Finally, we propose a new feature, called Continuous Frequency Activation (CFA), especially designed for music detection, and show experimentally that this feature is more precise than the other approaches in identifying segments with music in audio streams.

Download

Beat-Marker Location using a Probabilistic Framework and Linear Discriminant Analysis

Geoffroy Peeters

DAFx-2009 - Como

This paper deals with the problem of beat-tracking in an audiofile. Considering time-variable tempo and meter estimation as input, we study two beat-tracking approaches. The first one is based on an adaptation of a method used in speech processing for locating the Glottal Closure Instants. The results obtained with this first approach allow us to derive a set of requirements for a robust approach. This second approach is based on a probabilistic framework. In this approach the beat-tracking problem is formulated as an “inverse” Viterbi decoding problem in which we decode times over beat-numbers according to observation and transition probabilities. A beat-template is used to derive the observation probabilities from the signal. For this task, we propose the use of a machine-learning method, the Linear Discriminant Analysis, to estimate the most discriminative beat-template. We finally propose a set of measures to evaluate the performances of a beattracking algorithm and perform a large-scale evaluation of the two approaches on four different test-sets.

Download

Partiels – Exploring, Analyzing and Understanding Sounds

Pierre Guillot

DAFx-2025 - Ancona

This article presents Partiels, an open-source application developed at IRCAM to analyze digital audio files and explore sound characteristics. The application uses Vamp plug-ins to extract various information on different aspects of the sound, such as spectrum, partials, pitch, tempo, text, and chords. Partiels is the successor to AudioSculpt, offering a modern, flexible interface for visualizing, editing, and exporting analysis results, addressing a wide range of issues from musicological practice to sound creation and signal processing research. The article describes Partiels’ key features, including analysis organization, audio file management, results visualization and editing, as well as data export and sharing options, and its interoperability with other software such as Max and Pure Data. In addition, it highlights the numerous analysis plug-ins developed at IRCAM, based in particular on machine learning models, as well as the IRCAM Vamp extension, which overcomes certain limitations of the original Vamp format.

Download

Proceedings of the International Conference on Digital Audio Effects (DAFx)

Proc. Int. Conf. Digital Audio Effects (DAFx)

Paper Archive

Years

Authors