The REACTION System: Automatic Sound Segmentation and Word Spotting for Verbal Reaction Time Tests
Reaction tests are typical tests in psychological research and communication science in which a test person is presented with a stimulus such as a photo, a sound, or written words. The individual has to evaluate the stimulus as fast as possible in a predefined manner and react by presenting the result of the evaluation: by pushing a button in simple reaction tests, or by saying an answer in verbal reaction tests. The reaction time between the onset of the stimulus and the onset of the response can be used as a measure of the difficulty of the given evaluation. Compared to simple reaction tests, verbal reaction tests are very powerful, since the individual can simply say the answer, which is the most natural way of answering. Their drawback is that the reaction times still have to be determined manually: a person has to listen through all audio recordings taken during test sessions and mark stimulus times and word beginnings one by one, which is time-consuming and labor-intensive. To replace this manual evaluation, this article presents the REACTION (Reaction Time Determination) system, which can automatically determine the reaction times of a test session by analyzing the session's audio recording. The system automatically detects the onsets of stimuli as well as the onsets of answers. The recording is furthermore segmented into parts, each containing one stimulus and the following reaction, which further facilitates the transcription of the spoken words for a semantic evaluation.
High-level musical control paradigms for Digital Signal Processing
No matter how complex DSP algorithms are and how rich the sonic processes they produce, the issue of their control immediately arises when they are used by musicians, regardless of their knowledge of the underlying mathematics or their degree of familiarity with the design of digital instruments. This text analyzes the problem of controlling DSP modules from a compositional standpoint. An implementation of some of these paradigms in a Lisp-based environment (omChroma) is also concisely discussed.
GMM supervector for Content Based Music Similarity
Timbral modeling is fundamental in content-based music similarity systems. It is usually achieved by modeling short-term features with a Gaussian Model (GM) or Gaussian Mixture Models (GMM). In this article we propose to achieve this goal using the GMM-supervector approach. This method makes it possible to represent complex statistical models by a Euclidean vector. Experiments performed on the music similarity task showed that this model outperforms state-of-the-art approaches. Moreover, it reduces the similarity search time by a factor of ≈ 100 compared to state-of-the-art GM modeling. Furthermore, we propose a new supervector normalization which makes the GMM-supervector approach perform better on the music similarity task. The proposed normalization can be applied to other Euclidean models.
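A toy illustration of the supervector idea described above (not the authors' implementation): adapt the means of a shared background model toward a track's frame statistics, stack them into one Euclidean vector, and compare tracks by Euclidean distance. The hard assignment, the two-component model, and the relevance factor are all simplifying assumptions.

```python
# Toy GMM-supervector sketch: MAP-adapt background-model means per track
# (hard assignment for simplicity) and concatenate into one vector.
import math

UBM_MEANS = [[0.0, 0.0], [1.0, 1.0]]  # two-component background model (toy)

def nearest(frame):
    """Index of the background component closest to a feature frame."""
    return min(range(len(UBM_MEANS)),
               key=lambda k: sum((f - m) ** 2 for f, m in zip(frame, UBM_MEANS[k])))

def supervector(frames, relevance=4.0):
    """Adapt each component mean toward its assigned frames, then stack."""
    sv = []
    for k, mean in enumerate(UBM_MEANS):
        assigned = [f for f in frames if nearest(f) == k]
        n = len(assigned)
        alpha = n / (n + relevance)  # adaptation weight grows with evidence
        for d, m in enumerate(mean):
            emp = sum(f[d] for f in assigned) / n if n else m
            sv.append(alpha * emp + (1 - alpha) * m)
    return sv

def distance(sv_a, sv_b):
    """Plain Euclidean distance between two supervectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sv_a, sv_b)))
```

Because each track reduces to one fixed-length vector, similarity search becomes a nearest-neighbor query in Euclidean space, which is the source of the speed-up the abstract reports.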
Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection, where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm to replace manual parameter tuning, followed by the computation of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods of the Aubio library, using a dataset of monophonic acoustic guitar recordings. Results show that the proposed solution is effective in reducing the human effort required in the optimization process: it replaced more than two days of manual parameter tuning with 13 hours and 34 minutes of automated computation. Moreover, the resulting performance was comparable to that obtained by manual optimization.
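The Pareto-frontier step mentioned above can be sketched in a few lines: given candidate parameter settings scored on two objectives, keep only those not dominated by any other candidate. The objective names and example values are illustrative, not from the paper.

```python
# Minimal multi-objective selection sketch: candidates are scored as
# (f_measure, latency_ms); higher F-measure is better, lower latency is
# better. A candidate survives if nothing dominates it on both objectives.
def pareto_frontier(candidates):
    """Return the Pareto-optimal subset of (f_measure, latency_ms) pairs."""
    def dominates(a, b):
        # a is at least as good as b everywhere and strictly better somewhere
        return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Illustrative detector settings found by an evolutionary search:
settings = [(0.90, 12.0), (0.85, 5.0), (0.80, 20.0), (0.92, 30.0), (0.85, 9.0)]
frontier = pareto_frontier(settings)
# (0.80, 20.0) is dominated by (0.90, 12.0); (0.85, 9.0) by (0.85, 5.0).
```

The frontier then lets a user pick the accuracy/latency trade-off appropriate for their scenario, rather than the algorithm committing to a single weighted score.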
A Model for Adaptive Reduced-Dimensionality Equalisation
We present a method for mapping between the input space of a parametric equaliser and a lower-dimensional representation, whilst preserving the effect’s dependency on the incoming audio signal. The model consists of a parameter weighting stage, in which the parameters are scaled to spectral features of the audio signal, followed by a mapping process, in which the equaliser’s 13 inputs are converted to (x, y) coordinates. The model is trained with parameter space data representing two timbral adjectives (warm and bright), measured across a range of musical instrument samples, allowing users to impose a semantically meaningful timbral modification using the lower-dimensional interface. We test 10 mapping techniques, comprising dimensionality reduction and reconstruction methods, and show that a stacked autoencoder algorithm exhibits the lowest parameter reconstruction variance, thus providing an accurate map between the input and output space. We demonstrate that the model provides an intuitive method for controlling the audio effect’s parameter space, whilst accurately reconstructing the trajectories of each parameter and adapting to the incoming audio spectrum.
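A hedged sketch of the parameter weighting stage described above: measure a spectral feature of the incoming audio (here the spectral centroid, via a naive DFT) and scale equaliser gains accordingly. The weighting rule, pivot frequency, and function names are illustrative assumptions, not the paper's exact model.

```python
# Toy signal-adaptive weighting: compute the spectral centroid of a frame
# and scale EQ gain parameters by it. Naive O(n^2) DFT for clarity.
import cmath, math

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency of the frame's spectrum."""
    n = len(samples)
    mags = [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]
    freqs = [k * sample_rate / n for k in range(n // 2)]
    total = sum(mags) or 1.0
    return sum(f * m for f, m in zip(freqs, mags)) / total

def weight_gains(gains_db, centroid_hz, pivot_hz=1000.0):
    """Toy rule: shrink all boosts/cuts for dull (low-centroid) signals."""
    scale = min(centroid_hz / pivot_hz, 1.0)
    return [g * scale for g in gains_db]
```

In the full model this weighted parameter vector, not the raw one, is what gets compressed to the (x, y) interface, which is how the same low-dimensional setting adapts to different input spectra.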
Rumbator: a Flamenco Rumba Cover Version Generator Based on Audio Processing at Note Level
In this article, a scheme to automatically generate polyphonic flamenco rumba versions from monophonic melodies is presented. First, we provide an analysis of the parameters that define the flamenco rumba, and then we propose a method for transforming a generic monophonic audio signal into that style. Our method first transcribes the monophonic audio signal into a symbolic representation; a set of note-level audio transformations based on music theory is then applied to the audio signal in order to transform it into the polyphonic flamenco rumba style. Some audio examples produced by this transformation software are also provided.
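The transcription step above maps estimated fundamental frequencies to symbolic notes, which the note-level transformations then operate on. A minimal sketch of that mapping (standard equal-temperament conversion; function names are illustrative):

```python
# f0 -> symbolic note: round to the nearest equal-tempered MIDI note
# number (A4 = 440 Hz = MIDI 69), then render a readable note name.
import math

def hz_to_midi(f0_hz):
    """Nearest equal-tempered MIDI note for a fundamental frequency."""
    return round(69 + 12 * math.log2(f0_hz / 440.0))

def midi_to_name(midi):
    """Human-readable pitch name, e.g. 60 -> 'C4'."""
    names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    return f"{names[midi % 12]}{midi // 12 - 1}"
```

Once notes are symbolic, style-specific operations (harmonization, rhythmic re-accentuation) become music-theoretic manipulations of note numbers and durations rather than raw signal processing.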
A Direct Microdynamics Adjusting Processor with Matching Paradigm and Differentiable Implementation
In this paper, we propose a new processor capable of directly changing the microdynamics of an audio signal, primarily via a single dedicated user-facing parameter. The novelty of our processor is that it has built into it a measure of relative level, a short-term signal strength measurement which is robust to changes in signal macrodynamics. The consequent dynamic range processing is signal-level-independent in nature, and attempts to directly alter the observed relative level measurements. The inclusion of such a meter within the proposed processor also gives rise to a natural solution to the dynamics matching problem, in which we attempt to transfer the microdynamic characteristics of one audio recording to another by estimating appropriate settings for the processor. We suggest a means of providing a reasonable initial guess for the processor settings, followed by an efficient iterative algorithm to refine these estimates. Additionally, we implement the processor as a differentiable recurrent layer and show its effectiveness when wrapped around a gradient descent optimizer within a deep learning framework. Moreover, we illustrate that the proposed processor has more favorable gradient characteristics than a conventional dynamic range compressor. Throughout, we consider extensions of the processor, the matching algorithm, and the differentiable implementation to the multiband case.
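An illustrative sketch (not the paper's actual meter) of what a macrodynamics-robust relative level measure can look like: short-term RMS expressed in dB relative to long-term RMS, so that scaling the whole recording up or down leaves the measurement unchanged. Window sizes and names are assumed values.

```python
# Toy relative level meter: short-term loudness in dB relative to the
# long-term average, invariant to overall (macro) gain changes.
import math

def rms(samples):
    """Root-mean-square of a list of samples (0.0 for an empty list)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0

def relative_level_db(samples, pos, short_win=256, long_win=4096):
    """Short-window RMS at `pos`, in dB relative to long-window RMS."""
    short = rms(samples[max(0, pos - short_win):pos])
    long_ = rms(samples[max(0, pos - long_win):pos])
    if short == 0 or long_ == 0:
        return 0.0
    return 20 * math.log10(short / long_)
```

A processor driven by such a measure reacts to how loud a moment is relative to its context, not to absolute level, which is what makes the resulting dynamics processing level-independent.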
Controlling a Non Linear Friction Model for Evocative Sound Synthesis Applications
In this paper, a flexible strategy to control a synthesis model of sounds produced by nonlinear friction phenomena is proposed for guidance or musical purposes. It makes it possible to synthesize different types of sounds, such as a creaky door, a singing glass, or a squeaking wet plate. This approach is based on the action/object paradigm, which leads to a synthesis strategy using classical linear filtering techniques (a source/resonance approach) that provides an efficient implementation. Within this paradigm, a sound can be considered the result of an action (e.g., impacting, rubbing) on an object (plate, bowl, etc.). However, in the case of nonlinear friction phenomena, simulating the physical coupling between the action and the object with a completely decoupled source/resonance model is a real and relevant challenge. To meet this challenge, we propose a synthesis model of the source that is tuned on recorded sounds according to physical and spectral observations. This model can synthesize many types of nonlinear behaviors. A control strategy for the model is then proposed by defining a flexible, physically informed mapping between a descriptor and the nonlinear synthesis behavior. Finally, potential applications to the remediation of motor diseases are presented. For all sections, video and audio materials are available at the following URL: http://www.lma.cnrs-mrs.fr/~kronland/thoretDAFx2013/
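The source/resonance scheme the abstract builds on can be sketched as a source signal (the "action") driving a linear resonant filter (one mode of the "object"). The two-pole resonator below follows the standard digital resonator recurrence; the 440 Hz mode, pole radius, and names are illustrative assumptions.

```python
# Source/resonance sketch: excite a two-pole digital resonator
# y[n] = x[n] + 2 r cos(w) y[n-1] - r^2 y[n-2]  (poles at r e^{+-iw})
# with a short noise burst, and let the "object" ring.
import math, random

def resonator(source, freq_hz, sample_rate, r=0.995):
    """Filter `source` through a two-pole resonator tuned to freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    a1, a2 = 2 * r * math.cos(w), -r * r
    y1 = y2 = 0.0
    out = []
    for x in source:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# "Action": a 64-sample noise burst; "object": a 440 Hz mode at 44.1 kHz.
random.seed(0)
burst = [random.uniform(-1, 1) for _ in range(64)] + [0.0] * 1936
ringing = resonator(burst, 440.0, 44100.0)
```

The paper's contribution sits upstream of this filter: shaping the source signal itself so that nonlinear friction behaviors emerge even though the filtering stage stays linear and decoupled.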
Blind Arbitrary Reverb Matching
Reverb provides psychoacoustic cues that convey information about relative locations within an acoustic space. The need often arises in audio production to impart on an audio track an acoustic context that resembles a reference track. One way to make audio tracks appear to have been recorded in the same space is to apply to a dry track a reverb similar to that of a wet one. This paper presents a model for the task of “reverb matching,” where we attempt to automatically add artificial reverb to a track, making it sound as if it was recorded in the same space as a reference track. We propose a model architecture for performing reverb matching and provide subjective experimental results suggesting that the reverb matching model can perform as well as a human. We also provide open-source software for generating training data using an arbitrary Virtual Studio Technology plug-in.
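Once a matching reverb has been chosen, applying it reduces to convolving the dry track with an impulse response. A minimal sketch of that final step (direct convolution for clarity; the toy signals are illustrative, and a real system would use an FFT-based method and an actual plug-in):

```python
# Direct convolution of a dry signal with a room impulse response (IR):
# wet[n] = sum_k dry[k] * ir[n - k]. O(N*M), fine for a toy example.
def convolve(dry, impulse_response):
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

dry = [1.0, 0.0, 0.0, 0.0]   # a unit impulse as the "dry" signal
ir = [1.0, 0.5, 0.25]        # toy exponentially decaying IR
wet = convolve(dry, ir)      # the impulse simply reproduces the IR
```

The hard part the paper addresses is estimating which reverb to apply from the wet reference alone; the convolution itself is the easy, well-understood tail of the pipeline.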