Hyper Recurrent Neural Network: Condition Mechanisms for Black-Box Audio Effect Modeling Recurrent neural networks (RNNs) have demonstrated impressive results for virtual analog modeling of audio effects. These networks process time-domain audio signals using a series of matrix multiplications and nonlinear activation functions to emulate the behavior of the target device accurately. To additionally model the effect of the knobs in an RNN-based model, existing approaches integrate control parameters by concatenating them channel-wise with some intermediate representation of the input signal. While this method is parameter-efficient, there is room to further improve the quality of the generated audio, because the concatenation-based conditioning method has limited capacity for modulating signals. In this paper, we propose three novel conditioning mechanisms for RNNs, tailored for black-box virtual analog modeling. These advanced conditioning mechanisms modulate the model based on control parameters, yielding superior results to existing RNN- and CNN-based architectures across various evaluation metrics.
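As a rough sketch of the two flavours of conditioning this abstract contrasts, the NumPy snippet below compares channel-wise concatenation with a FiLM-style affine modulation for a single time step. The layer sizes, weights, and knob values are illustrative placeholders, not the paper's actual architecture or proposed mechanisms.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, cond_size = 8, 2

h = rng.standard_normal(hidden_size)  # intermediate RNN state, one time step
c = np.array([0.3, 0.7])              # normalized knob settings (placeholders)

# Concatenation-based conditioning: knobs are appended channel-wise, so
# they can only shift the pre-activation additively through one matrix.
W_cat = rng.standard_normal((hidden_size, hidden_size + cond_size))
h_cat = np.tanh(W_cat @ np.concatenate([h, c]))

# FiLM-style conditioning (one possible higher-capacity alternative): the
# knobs generate a per-channel scale and shift, so they can modulate the
# signal multiplicatively as well as additively.
W_h = rng.standard_normal((hidden_size, hidden_size))
gamma = rng.standard_normal((hidden_size, cond_size)) @ c
beta = rng.standard_normal((hidden_size, cond_size)) @ c
h_film = np.tanh(gamma * (W_h @ h) + beta)

print(h_cat.shape, h_film.shape)  # (8,) (8,)
```

The multiplicative path is what gives modulation-based conditioning more expressive control over the signal than concatenation alone.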
Differentiable All-Pole Filters for Time-Varying Audio Systems Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, a time-varying subtractive synthesiser, and a feed-forward compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin.
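The recursive structure in question can be illustrated with a direct-form time-varying all-pole filter. The NumPy sketch below is a naive reference implementation, with a sample-by-sample feedback loop of exactly the kind that is slow to unroll in autodiff frameworks; it is not the paper's optimised backpropagation.

```python
import numpy as np

def time_varying_all_pole(x, a):
    """Direct-form time-varying all-pole filter.

    x : (N,) input signal
    a : (N, K) denominator coefficients per sample; a[n, k] multiplies y[n-1-k].
    Each output sample depends recursively on past outputs.
    """
    N, K = a.shape
    y = np.zeros(N)
    for n in range(N):
        acc = x[n]
        for k in range(K):
            if n - 1 - k >= 0:
                acc -= a[n, k] * y[n - 1 - k]
        y[n] = acc
    return y

# Sanity check with a fixed one-pole lowpass: y[n] = x[n] + 0.5 * y[n-1],
# whose impulse response is 0.5**n.
x = np.zeros(8)
x[0] = 1.0
a = np.full((8, 1), -0.5)
y = time_varying_all_pole(x, a)
print(y)  # 1.0, 0.5, 0.25, ...
```

Because `y[n]` feeds back into every later sample, naive reverse-mode differentiation must traverse this chain step by step, which is the bottleneck the paper's custom backward pass avoids.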
Sparse Decomposition, Clustering and Noise for Fire Texture Sound Re-Synthesis In this paper we introduce a framework that represents environmental texture sounds as a linear superposition of independent foreground and background layers that roughly correspond to entities in the physical production of the sound. Sound samples are decomposed into a sparse representation with the matching pursuit algorithm and a dictionary of Daubechies wavelet atoms. An agglomerative clustering procedure groups atoms into short transient molecules. A foreground layer is generated by sampling these sound molecules from a distribution, whose parameters are estimated from the input sample. The residual signal is modelled by an LPC-based source-filter model, synthesizing the background sound layer. The capability of the system is demonstrated with a set of fire sounds.
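A minimal sketch of the matching pursuit step, assuming an orthonormal toy dictionary in place of the Daubechies wavelet atoms used in the paper:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse decomposition: at each step pick the unit-norm atom
    most correlated with the residual and subtract its contribution."""
    residual = signal.astype(float).copy()
    picks = []  # (atom index, coefficient) pairs
    for _ in range(n_atoms):
        corr = dictionary @ residual          # correlation with every atom
        i = int(np.argmax(np.abs(corr)))
        picks.append((i, corr[i]))
        residual = residual - corr[i] * dictionary[i]
    return picks, residual

# Toy dictionary whose rows are orthonormal atoms (via QR); a wavelet
# dictionary would be overcomplete, but the greedy loop is identical.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
D = Q.T
x = 2.0 * D[3] - 1.5 * D[7]                  # signal built from two atoms
picks, res = matching_pursuit(x, D, 2)
print([i for i, _ in picks])                 # [3, 7]
```

In the paper's pipeline, the selected atoms would then be grouped into transient molecules by agglomerative clustering, while the residual feeds the LPC background model.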
Vocal Tract Area Estimation by Gradient Descent Articulatory features can provide interpretable and flexible controls for the synthesis of human vocalizations by allowing the user to directly modify parameters like vocal strain or lip position. To make this manipulation through resynthesis possible, we need to estimate the features that result in a desired vocalization directly from audio recordings. In this work, we propose a white-box optimization technique for estimating glottal source parameters and vocal tract shapes from audio recordings of human vowels. The approach is based on inverse filtering and optimizing the frequency response of a waveguide model of the vocal tract with gradient descent, propagating error gradients through the mapping of articulatory features to the vocal tract area function. We apply this method to the task of matching the sound of the Pink Trombone, an interactive articulatory synthesizer, to a given vocalization. We find that our method accurately recovers control functions for audio generated by the Pink Trombone itself. We then compare our technique against evolutionary optimization algorithms and a neural network trained to predict control parameters from audio. A subjective evaluation finds that our approach outperforms these black-box optimization baselines on the task of reproducing human vocalizations.
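The core idea of fitting a filter's frequency response by gradient descent can be illustrated with a toy one-pole filter standing in for the waveguide model. The hand-derived gradient and log-magnitude loss below are illustrative assumptions, not the paper's articulatory parameterisation.

```python
import numpy as np

w = np.linspace(0.1, np.pi - 0.1, 64)   # evaluation frequencies (rad/sample)

def power_response(a):
    # |H(e^{jw})|^2 for the one-pole filter H(z) = 1 / (1 - a z^-1)
    return 1.0 / (1.0 - 2.0 * a * np.cos(w) + a * a)

target = power_response(0.6)            # "recorded" spectrum, hidden pole a* = 0.6

a, lr = 0.1, 1e-3                       # initial guess and step size
for _ in range(2000):
    p = power_response(a)
    err = np.log(p) - np.log(target)    # log-magnitude spectral error
    # Hand-derived gradient: d log|H|^2 / da = (2 cos w - 2a) * |H|^2
    grad = np.sum(2.0 * err * (2.0 * np.cos(w) - 2.0 * a) * p)
    a -= lr * grad

print(round(a, 3))                      # close to the hidden pole 0.6
```

In the paper, the analogous gradients flow through the full mapping from articulatory features to the vocal tract area function rather than a single pole.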
Improving Singing Language Identification through i-Vector Extraction Automatic language identification for singing is a topic that has not received much attention in recent years. Possible application scenarios include searching for musical pieces in a certain language, improvement of similarity search algorithms for music, and improvement of regional music classification and genre classification. It could also serve to mitigate the "glass ceiling" effect. Most existing approaches employ PPRLM processing (Parallel Phone Recognition followed by Language Modeling). We present a new approach for singing language identification. PLP, MFCC, and SDC features are extracted from audio files and then passed through an i-vector extractor. This algorithm reduces the training data for each sample to a single 450-dimensional feature vector. We then train Neural Networks and Support Vector Machines on these feature vectors. Due to the reduced data, the training process is very fast. The results are comparable to the state of the art, reaching accuracies of 83% on a large speech corpus and 78% on a cappella singing. In contrast to PPRLM approaches, our algorithm does not require phoneme-wise annotations and is easier to implement.
Adaptive Harmonization and Pitch Correction of Polyphonic Audio Using Spectral Clustering There are several well-known harmonization and pitch correction techniques that can be applied to monophonic sound sources. They are based on automatic pitch detection and frequency shifting without time stretching. In many applications it is desired to apply such effects to the dominant melodic instrument of a polyphonic audio mixture. However, applying them directly to the mixture results in artifacts, and automatic pitch detection becomes unreliable. In this paper we describe how a dominant melody separation method based on spectral clustering of sinusoidal peaks can be used for adaptive harmonization and pitch correction in mono polyphonic audio mixtures. Motivating examples from a violin tutoring perspective, as well as modifying the saxophone melody of an old jazz mono recording, are presented.
Making Sounds with Numbers: A Tutorial on Music Software Dedicated to Digital Audio A (partial) taxonomy of software applications devoted to sounds is presented. For each category of software applications, an abstract model is proposed and actual implementations are evaluated with respect to this model.
Analysis-and-manipulation approach to pitch and duration of musical instrument sounds without distorting timbral characteristics This paper presents an analysis-manipulation method that can generate musical instrument sounds with arbitrary pitches and durations from the sound of a given musical instrument (called a seed) without distorting its timbral characteristics. Based on psychoacoustical knowledge of the auditory effects of timbres, we defined timbral features based on the spectrogram of the sound of a musical instrument as (i) the relative amplitudes of the harmonic peaks, (ii) the distribution of the inharmonic component, and (iii) temporal envelopes. First, to analyze the timbral features of a seed, it was separated into harmonic and inharmonic components using Itoyama's integrated model. For pitch manipulation, we took into account the pitch-dependency of features (i) and (ii). We predicted the values of each feature by using a cubic polynomial that approximated the distribution of these features over pitches. To manipulate duration, we focused on preserving feature (iii) in the attack and decay durations of a seed. Therefore, only steady durations were expanded or shrunk. In addition, we propose a method for reproducing the properties of vibrato. Experimental results demonstrated the quality of the synthesized sounds produced using our method. The spectral and MFCC distances between the synthesized sounds and actual sounds of 32 instruments were reduced by 64.70% and 32.31%, respectively.
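The pitch-dependency step, approximating a feature's distribution over pitch with a cubic polynomial, can be sketched with NumPy. The pitch grid and feature values below are invented placeholders, not measurements from the paper.

```python
import numpy as np

# Invented example: relative amplitude of one harmonic peak measured at
# several seed pitches (MIDI note numbers).
pitches = np.array([40.0, 48.0, 55.0, 64.0, 72.0, 79.0])
feature = np.array([0.92, 0.85, 0.80, 0.71, 0.66, 0.60])

# Approximate the distribution of the feature over pitch with a cubic
# polynomial, then predict the feature value at an unseen target pitch.
coeffs = np.polyfit(pitches, feature, deg=3)
predict = np.poly1d(coeffs)

value_at_60 = float(predict(60.0))
print(round(value_at_60, 3))
```

The predicted value then replaces the seed's measured feature when synthesizing the sound at the target pitch.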
Novel Methods in Information Management for Advanced Audio Workflows This paper discusses architectural aspects of a software library for unified metadata management in audio processing applications. The data incorporates editorial, production, acoustical and musicological features for a variety of use cases, ranging from adaptive audio effects to alternative metadata-based visualisation. Our system is designed to capture information prescribed by modular ontology schemas. This supports the development of intelligent user interfaces and advanced media workflows in music production environments. In an effort to reach these goals, we argue for the need for modularity and interoperable semantics in representing information. We discuss the advantages of extensible Semantic Web ontologies as opposed to using specialised but disharmonious metadata formats. Concepts and techniques permitting seamless integration with existing audio production software are described in detail.
Interacting With Digital Audio Effects Through a Haptic Knob With Programmable Resistance Live music performances and music production often involve the manipulation of several parameters during sound generation, processing, and mixing. In hardware layouts, those parameters are usually controlled using knobs, sliders, and buttons. When these layouts are virtualized, the use of physical (e.g. MIDI) controllers can make interaction easier and reduce the cognitive load associated with sound manipulation. The addition of haptic feedback can further improve such interaction by facilitating the detection of the nature (continuous/discrete) and value of a parameter. To this end, we have realized an endless-knob controller prototype with programmable resistance to rotation, able to render various haptic effects. Ten subjects assessed the effectiveness of the provided haptic feedback in a target-matching task where either visual-only or visual-haptic feedback was provided; the experiment showed significantly lower errors in the presence of haptic feedback. Finally, the knob was configured as a multi-parametric controller for real-time audio effect software written in Python, simulating the voltage-controlled filter aboard the EMS VCS3. The integration of the sound algorithm and the haptic knob is discussed, together with various haptic feedback effects in response to control actions.