Modal Spring Reverb Based on Discretisation of the Thin Helical Spring Model

The distributed nature of coupling in helical springs presents specific challenges in obtaining efficient computational structures for accurate spring reverb simulation. For direct simulation approaches, such as finite-difference methods, this typically manifests as significant numerical dispersion within the hearing range. Building on a recent study of a simpler spring model, this paper presents an alternative discretisation approach that employs higher-order spatial approximations and applies centred stencils at the boundaries to address the underlying linear-system eigenvalue problem. Temporal discretisation is then applied to the resultant uncoupled mode system, yielding an efficient and flexible modal reverb structure. Dispersion analysis shows that numerical dispersion errors can be kept extremely small across the hearing range for a relatively low number of system nodes. Analysis of an impulse response simulated using model parameters calculated from a measured spring geometry confirms that the model captures an enhanced set of spring characteristics.
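An uncoupled mode system of this kind lends itself to a simple resonator-bank implementation. The sketch below illustrates the general modal-reverb idea, not the paper's specific discretisation: it assumes per-mode frequencies, T60 decay times, and gains (all hypothetical arguments here) have already been obtained from the eigenvalue analysis.

```python
import numpy as np

def modal_reverb_ir(freqs_hz, t60s, gains, fs=48000, dur=2.0):
    """Sum of exponentially decaying sinusoids, one per spring mode.

    freqs_hz, t60s, gains are hypothetical per-mode parameters that
    would come from solving the spring model's eigenvalue problem.
    """
    t = np.arange(int(dur * fs)) / fs
    ir = np.zeros_like(t)
    for f, t60, g in zip(freqs_hz, t60s, gains):
        sigma = np.log(1000.0) / t60  # decay rate giving -60 dB at t = t60
        ir += g * np.exp(-sigma * t) * np.sin(2 * np.pi * f * t)
    return ir
```

Because each mode is independent, the same structure can also run as a bank of recursive two-pole resonators for real-time use.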
Transition-Aware: A More Robust Approach for Piano Transcription

Piano transcription is a classic problem in music information retrieval, and numerous deep-learning-based transcription methods have been proposed in recent years. In 2019, Google Brain published a large-scale piano transcription dataset, MAESTRO. On this dataset, the Onsets and Frames transcription approach proposed by Hawthorne et al. achieved a stunning onset F1 score of 94.73%. Unlike the annotation method of Onsets and Frames, the Transition-Aware model presented in this paper annotates the attack process of piano signals, called the attack transition, across multiple frames instead of marking only the onset frame. In this way, the piano signals around the onset time are taken into account, making onset detection more stable and robust. Transition-Aware achieves a higher transcription F1 score than Onsets and Frames on the MAESTRO and MAPS datasets, substantially reducing spurious note detections. This indicates that the Transition-Aware approach generalizes better across datasets.
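As a rough illustration of the labelling idea (the exact window width and label values used by Transition-Aware are not specified here and are assumptions), the sketch below spreads each onset annotation over several neighbouring frames instead of a single frame:

```python
import numpy as np

def attack_transition_labels(onset_times, n_frames, hop_s, width=3):
    """Mark a span of frames around each onset (hypothetical scheme).

    Unlike single-frame onset targets, all frames within `width`
    frames of the onset are labelled 1, exposing the whole attack
    process to the network.
    """
    labels = np.zeros(n_frames, dtype=np.float32)
    for t in onset_times:
        center = int(round(t / hop_s))
        lo, hi = max(0, center - width), min(n_frames, center + width + 1)
        labels[lo:hi] = 1.0
    return labels
```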
Interpretable timbre synthesis using variational autoencoders regularized on timbre descriptors

Controllable timbre synthesis has been a subject of research for several decades, and deep neural networks have been the most successful in this area. Deep generative models such as Variational Autoencoders (VAEs) have the ability to generate a high-level representation of audio while providing a structured latent space. Despite their advantages, the interpretability of these latent spaces in terms of human perception is often limited. To address this limitation and enhance the control over timbre generation, we propose a regularized VAE-based latent space that incorporates timbre descriptors. Moreover, we suggest a more concise representation of sound by utilizing its harmonic content, in order to minimize the dimensionality of the latent space.
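One common way to realize such a regularization, shown below as a sketch rather than the paper's exact objective, is to add a term that ties the first latent dimensions to normalized timbre descriptors (e.g., spectral centroid); the weights beta and gamma are illustrative:

```python
import torch
import torch.nn.functional as F

def regularized_vae_loss(x, x_hat, mu, logvar, descriptors,
                         beta=1.0, gamma=10.0):
    """VAE loss with a timbre-descriptor regularizer (illustrative).

    `descriptors` holds normalized timbre descriptors aligned with the
    first latent dimensions, so each of those dimensions is pushed to
    encode one descriptor.
    """
    recon = F.mse_loss(x_hat, x)
    # KL divergence to the unit Gaussian prior, averaged over elements
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    d = descriptors.shape[1]
    reg = F.mse_loss(mu[:, :d], descriptors)  # tie latent dims to descriptors
    return recon + beta * kl + gamma * reg
```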
On the Estimation of Sinusoidal Parameters via Parabolic Interpolation of Scaled Magnitude Spectra

Sinusoids are widely used to represent the oscillatory modes of music and speech. The estimation of the sinusoidal parameters directly affects the quality of the representation. A parabolic interpolation of the peaks of the log-magnitude spectrum is commonly used to obtain more accurate estimates of the frequencies and amplitudes of the sinusoids at a relatively low computational cost. Recently, Werner and Germain proposed an improved sinusoidal estimator that performs parabolic interpolation of the peaks of a power-scaled magnitude spectrum. For each analysis window type and size, a power-scaling factor p is pre-calculated via a computationally demanding heuristic. Consequently, the power-scaling estimation method is currently constrained to a few tabulated power-scaling factors for pre-selected window sizes, limiting its practical applications. In this article, we propose a method to obtain the power-scaling factor p for any window size from the tabulated values. Additionally, we investigate the impact of zero-padding on the estimation accuracy of the power-scaled sinusoidal parameter estimator.
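For reference, the core estimator is compact. The sketch below applies parabolic interpolation to a power-scaled magnitude spectrum; the formulas are the standard three-point parabola fit, while the value of p is only a placeholder, not one of the tabulated window-dependent factors:

```python
import numpy as np

def parabolic_peak(spec_mag, k, p=0.23):
    """Parabolic interpolation of a peak on a power-scaled spectrum.

    spec_mag: magnitude spectrum |X|; k: index of a local maximum;
    p: power-scaling factor (window-dependent; the default here is a
    placeholder for illustration only).
    """
    a, b, c = spec_mag[k - 1] ** p, spec_mag[k] ** p, spec_mag[k + 1] ** p
    delta = 0.5 * (a - c) / (a - 2 * b + c)   # fractional bin offset
    peak_bin = k + delta                      # interpolated frequency (bins)
    amp_scaled = b - 0.25 * (a - c) * delta   # parabola value at the vertex
    return peak_bin, amp_scaled ** (1.0 / p)  # undo the power scaling
```

Setting p to a log mapping recovers the classic log-magnitude interpolator; the power scaling trades that familiar form for lower bias.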
A Hierarchical Deep Learning Approach for Minority Instrument Detection

Identifying instrument activities within audio excerpts is vital in music information retrieval, with significant implications for music cataloging and discovery. Prior deep learning endeavors in musical instrument recognition have predominantly emphasized instrument classes with ample data availability. Recent studies have demonstrated the applicability of hierarchical classification in detecting instrument activities in orchestral music, even with limited fine-grained annotations at the instrument level. Based on the Hornbostel-Sachs classification, such a hierarchical classification system is evaluated using the MedleyDB dataset, renowned for its diversity and richness concerning various instruments and music genres. This work presents various strategies to integrate hierarchical structures into models and tests a new class of models for hierarchical music prediction. This study showcases more reliable coarse-level instrument detection by bridging the gap between detailed instrument identification and group-level recognition, paving the way for further advancements in this domain.
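One possible way to couple the two levels, sketched below under the assumption of multi-label activity outputs and a known fine-to-family mapping matrix (both illustrative, not the paper's exact model), is a noisy-OR aggregation of fine-grained probabilities into their Hornbostel-Sachs family:

```python
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    """Fine (instrument) head on a shared embedding, with coarse
    (family) activity derived from it. `fine_to_coarse` is a binary
    matrix: M[i, j] = 1 if fine class j belongs to family i.
    """
    def __init__(self, dim, n_fine, fine_to_coarse):
        super().__init__()
        self.fine = nn.Linear(dim, n_fine)
        self.register_buffer("M", fine_to_coarse)

    def forward(self, h):
        fine_prob = torch.sigmoid(self.fine(h))  # multi-label activity
        # noisy-OR: a family is active if any of its members is active
        coarse_prob = 1 - torch.prod(
            1 - fine_prob.unsqueeze(1) * self.M, dim=-1
        )
        return fine_prob, coarse_prob
```

Deriving the coarse prediction from the fine one keeps the two levels consistent by construction, which matters when fine-grained labels are scarce.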
Vocal Tract Area Estimation by Gradient Descent

Articulatory features can provide interpretable and flexible controls for the synthesis of human vocalizations by allowing the user to directly modify parameters like vocal strain or lip position. To make this manipulation through resynthesis possible, we need to estimate the features that result in a desired vocalization directly from audio recordings. In this work, we propose a white-box optimization technique for estimating glottal source parameters and vocal tract shapes from audio recordings of human vowels. The approach is based on inverse filtering and optimizing the frequency response of a waveguide model of the vocal tract with gradient descent, propagating error gradients through the mapping of articulatory features to the vocal tract area function. We apply this method to the task of matching the sound of the Pink Trombone, an interactive articulatory synthesizer, to a given vocalization. We find that our method accurately recovers control functions for audio generated by the Pink Trombone itself. We then compare our technique against evolutionary optimization algorithms and a neural network trained to predict control parameters from audio. A subjective evaluation finds that our approach outperforms these black-box optimization baselines on the task of reproducing human vocalizations.
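To make the optimization loop concrete, the sketch below substitutes a classic lossless-tube (Kelly-Lochbaum) model for the paper's waveguide: tube areas map to reflection coefficients, the step-up recursion yields an all-pole polynomial, and gradient descent matches its log-magnitude response to a target. All names, hyperparameters, and the 8-section resolution are illustrative assumptions:

```python
import math
import torch

def areas_to_lpc(areas):
    """Reflection coefficients from tube section areas, then the
    step-up recursion to an all-pole polynomial (a sketch of the
    classic lossless-tube model, not the paper's exact waveguide)."""
    k = (areas[:-1] - areas[1:]) / (areas[:-1] + areas[1:])
    a = torch.ones(1, dtype=areas.dtype)
    for ki in k:
        padded = torch.cat([a, torch.zeros(1, dtype=a.dtype)])
        a = padded + ki * padded.flip(0)  # a_j <- a_j + k * a_{i-j}
    return a

def fit_areas(target_logmag, n_sections=8, steps=500, lr=0.05):
    raw = torch.zeros(n_sections + 1, requires_grad=True)  # unconstrained
    opt = torch.optim.Adam([raw], lr=lr)
    w = torch.linspace(0, math.pi, target_logmag.numel())
    for _ in range(steps):
        areas = torch.exp(raw)                 # keep areas positive
        a = areas_to_lpc(areas)
        n = torch.arange(a.numel(), dtype=w.dtype)
        A = torch.exp(-1j * w[:, None] * n) @ a.to(torch.complex64)
        logmag = -torch.log(A.abs() + 1e-8)    # all-pole response 1/|A|
        loss = torch.mean((logmag - target_logmag) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.exp(raw).detach()
```

The key point the sketch shares with the paper is that the error gradient propagates through the area-to-filter mapping, so the optimizer works directly on the articulatory representation.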
ICGAN: An Implicit Conditioning Method for Interpretable Feature Control of Neural Audio Synthesis

Neural audio synthesis methods can achieve high-fidelity and realistic sound generation by utilizing deep generative models. Such models typically rely on external labels, which are often discrete, as conditioning information to achieve guided sound generation. However, it remains difficult to control subtle changes in sounds without appropriate and descriptive labels, especially given a limited dataset. This paper proposes an implicit conditioning method for neural audio synthesis using generative adversarial networks that allows for interpretable control of the acoustic features of synthesized sounds. Our technique creates a continuous conditioning space that enables timbre manipulation without relying on explicit labels. We further introduce an evaluation metric to explore controllability and demonstrate that our approach is effective in enabling a degree of controlled variation across different synthesized sound effects, for both in-domain and cross-domain sounds.
Improving Synthesizer Programming From Variational Autoencoders Latent Space

Deep neural networks have recently been applied to the task of automatic synthesizer programming, i.e., finding optimal values of sound synthesis parameters in order to reproduce a given input sound. This paper focuses on generative models, which can infer parameters as well as generate new sets of parameters or perform smooth morphing effects between sounds. We introduce new models to ensure scalability and to increase performance by using heterogeneous representations of parameters as numerical and categorical random variables. Moreover, a spectral variational autoencoder architecture with multi-channel input is proposed in order to improve inference of parameters related to the pitch and intensity of input sounds. Model performance was evaluated according to several criteria, such as parameter estimation error and audio reconstruction accuracy. Training and evaluation were performed using a dataset of 30k presets, which is published with this paper. Results demonstrate significant improvements in parameter inference and audio accuracy, and show that the presented models can be used with subsets or full sets of synthesizer parameters.
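A minimal sketch of the heterogeneous-parameter idea (not the paper's exact objective): numerical parameters are treated as continuous regression targets, while each categorical parameter, such as an oscillator waveform selector, gets its own cross-entropy term:

```python
import torch
import torch.nn.functional as F

def preset_loss(pred_num, true_num, pred_cat_logits, true_cat):
    """Hybrid loss for synthesizer presets (illustrative).

    pred_num / true_num: tensors of continuous parameter values;
    pred_cat_logits / true_cat: per-parameter logits and class indices
    for the categorical parameters.
    """
    loss = F.mse_loss(pred_num, true_num)
    for logits, target in zip(pred_cat_logits, true_cat):
        loss = loss + F.cross_entropy(logits, target)
    return loss
```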
Practical Virtual Analog Modeling Using Möbius Transforms

Möbius transforms define a family of one-step discretization methods that offers a framework for alleviating well-known limitations of common one-step methods, such as the trapezoidal method, at no cost in model compactness or complexity. In this paper, we extend the existing theory around these methods and show how it can be applied to common frameworks used to structure virtual analog models. We then propose practical strategies to tune the transform parameters for best simulation results. Finally, we show how such strategies enable us to formulate much-improved non-oversampled virtual analog models for several historical audio circuits.
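To illustrate the family concretely, the sketch below discretizes the analog one-pole H(s) = 1/(1 + s/w0) under a generic Möbius substitution s = (a*z + b)/(c*z + d). The default parameters recover the trapezoidal (bilinear) rule; the tuning strategies the paper proposes would select other values:

```python
import numpy as np

def mobius_one_pole(w0, fs, a=None, b=None, c=1.0, d=1.0):
    """Discretize H(s) = 1 / (1 + s/w0) via s = (a*z + b) / (c*z + d).

    Defaults reproduce the trapezoidal rule, s = 2*fs*(z - 1)/(z + 1);
    other (a, b, c, d) values give the wider one-step family.
    """
    if a is None:
        a = 2.0 * fs
    if b is None:
        b = -2.0 * fs
    # Substituting yields H(z) = w0*(c*z + d) / ((w0*c + a)*z + (w0*d + b))
    b_dig = np.array([w0 * c, w0 * d])
    a_dig = np.array([w0 * c + a, w0 * d + b])
    return b_dig / a_dig[0], a_dig / a_dig[0]  # normalized (b, a) coefficients
```

With (a, b, c, d) = (2*fs, -2*fs, 1, 1) this is the standard bilinear one-pole; varying the four parameters shifts the method's frequency warping and damping without changing its first-order structure, which is the "no cost in compactness" property the abstract refers to.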
A Study of Control Methods for Percussive Sound Synthesis Based on GANs

The process of creating drum sounds has seen significant evolution in the past decades. The development of analogue drum synthesizers, such as the TR-808, and of modern sound design tools in Digital Audio Workstations led to a variety of drum timbres that defined entire musical genres. Recently, drum synthesis research has been revived with a new focus on training generative neural networks to create drum sounds. Different interfaces have previously been proposed to control the generative process, from low-level latent space navigation to high-level semantic feature parameterisation, but no comprehensive analysis has been presented to evaluate how each approach relates to the creative process. We aim to evaluate how different interfaces support creative control over drum generation by conducting a user study based on the Creative Support Index. We experiment with both a supervised method that decodes semantic latent space directions and an unsupervised Closed-Form Factorization approach from the computer vision literature to parameterise the generation process, and demonstrate that the latter is the preferred means of controlling a drum synthesizer based on the StyleGAN2 network architecture.
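As a sketch of the unsupervised approach (assuming W is a style-modulation weight matrix taken from the trained StyleGAN2 generator; the variable names are illustrative), Closed-Form Factorization reduces to a singular value decomposition:

```python
import numpy as np

def closed_form_directions(W, n_dirs=5):
    """SeFa-style Closed-Form Factorization: interpretable latent
    directions are the top right-singular vectors of a generator
    weight matrix W of shape (out_features, latent_dim).
    """
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    return Vt[:n_dirs]  # each row is a unit-norm latent direction

# Editing a latent code: z_edit = z + alpha * direction,
# where alpha is a scalar strength chosen by the user.
```

Because the directions come straight from the trained weights, no labelled data or auxiliary classifier is needed, which is what makes the method attractive as a control interface.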