The REACTION System: Automatic Sound Segmentation and Word Spotting for Verbal Reaction Time Tests
Reaction tests are typical tests in psychological research and communication science in which a test person is presented with a stimulus such as a photo, a sound, or written words. The individual has to evaluate the stimulus as fast as possible in a predefined manner and react by presenting the result of the evaluation: by pushing a button in simple reaction tests, or by saying an answer in verbal reaction tests. The reaction time between the onset of the stimulus and the onset of the response can be used as a measure of the difficulty of the given evaluation. Compared to simple reaction tests, verbal reaction tests are very powerful, since the individual can simply say the answer, which is the most natural way of answering. Their drawback is that the reaction times still have to be determined manually: a person has to listen through all audio recordings taken during test sessions and mark stimulus times and word beginnings one by one, which is time-consuming and labor-intensive. To replace this manual evaluation, this article presents the REACTION (Reaction Time Determination) system, which can automatically determine the reaction times of a test session by analyzing the session's audio recording. The system automatically detects the onsets of stimuli as well as the onsets of answers. The recording is furthermore segmented into parts, each containing one stimulus and the following reaction, which further facilitates the transcription of the spoken words for a semantic evaluation.
High-level musical control paradigms for Digital Signal Processing
No matter how complex DSP algorithms are and how rich the sonic processes they produce, the issue of their control immediately arises when they are used by musicians, regardless of their knowledge of the underlying mathematics or their degree of familiarity with the design of digital instruments. This text analyzes the problem of controlling DSP modules from a compositional standpoint. An implementation of some of these paradigms in a Lisp-based environment (omChroma) is also concisely discussed.
GMM supervector for Content Based Music Similarity
Timbral modeling is fundamental in content-based music similarity systems. It is usually achieved by modeling short-term features with a Gaussian Model (GM) or Gaussian Mixture Models (GMM). In this article we propose to achieve this goal using the GMM-supervector approach. This method makes it possible to represent complex statistical models by a Euclidean vector. Experiments performed on the music similarity task showed that this model outperforms state-of-the-art approaches. Moreover, it reduces the similarity search time by a factor of ≈ 100 compared to state-of-the-art GM modeling. Furthermore, we propose a new supervector normalization which makes the GMM-supervector approach perform better on the music similarity task. The proposed normalization can be applied to other Euclidean models.
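A toy illustration of the supervector idea described above (not the authors' implementation): adapt the means of a shared background model toward a track's frame statistics, stack them into one Euclidean vector, and compare tracks by Euclidean distance. The hard assignment, the two-component model, and the relevance factor are all simplifying assumptions.

```python
# Toy GMM-supervector sketch: MAP-adapt background-model means per track
# (hard assignment for simplicity) and concatenate into one vector.
import math

UBM_MEANS = [[0.0, 0.0], [1.0, 1.0]]  # two-component background model (toy)

def nearest(frame):
    """Index of the background component closest to a feature frame."""
    return min(range(len(UBM_MEANS)),
               key=lambda k: sum((f - m) ** 2 for f, m in zip(frame, UBM_MEANS[k])))

def supervector(frames, relevance=4.0):
    """Adapt each component mean toward its assigned frames, then stack."""
    sv = []
    for k, mean in enumerate(UBM_MEANS):
        assigned = [f for f in frames if nearest(f) == k]
        n = len(assigned)
        alpha = n / (n + relevance)  # adaptation weight grows with evidence
        for d, m in enumerate(mean):
            emp = sum(f[d] for f in assigned) / n if n else m
            sv.append(alpha * emp + (1 - alpha) * m)
    return sv

def distance(sv_a, sv_b):
    """Plain Euclidean distance between two supervectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sv_a, sv_b)))
```

Because each track reduces to one fixed-length vector, similarity search becomes a nearest-neighbor query in Euclidean space, which is the source of the speed-up the abstract reports.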
Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection, where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm to replace manual parameter tuning, followed by the computation of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods of the Aubio library, using a dataset of monophonic acoustic guitar recordings. Results show that the proposed solution is effective in reducing the human effort required in the optimization process: it replaced more than two days of manual parameter tuning with 13 hours and 34 minutes of automated computation. Moreover, the resulting performance was comparable to that obtained by manual optimization.
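The Pareto-frontier step mentioned above can be sketched in a few lines: given candidate parameter settings scored on two objectives, keep only those not dominated by any other candidate. The objective names and example values are illustrative, not from the paper.

```python
# Minimal multi-objective selection sketch: candidates are scored as
# (f_measure, latency_ms); higher F-measure is better, lower latency is
# better. A candidate survives if nothing dominates it on both objectives.
def pareto_frontier(candidates):
    """Return the Pareto-optimal subset of (f_measure, latency_ms) pairs."""
    def dominates(a, b):
        # a is at least as good as b everywhere and strictly better somewhere
        return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Illustrative detector settings found by an evolutionary search:
settings = [(0.90, 12.0), (0.85, 5.0), (0.80, 20.0), (0.92, 30.0), (0.85, 9.0)]
frontier = pareto_frontier(settings)
# (0.80, 20.0) is dominated by (0.90, 12.0); (0.85, 9.0) by (0.85, 5.0).
```

The frontier then lets a user pick the accuracy/latency trade-off appropriate for their scenario, rather than the algorithm committing to a single weighted score.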
A Model for Adaptive Reduced-Dimensionality Equalisation
We present a method for mapping between the input space of a parametric equaliser and a lower-dimensional representation, whilst preserving the effect’s dependency on the incoming audio signal. The model consists of a parameter weighting stage, in which the parameters are scaled to spectral features of the audio signal, followed by a mapping process, in which the equaliser’s 13 inputs are converted to (x, y) coordinates. The model is trained with parameter space data representing two timbral adjectives (warm and bright), measured across a range of musical instrument samples, allowing users to impose a semantically meaningful timbral modification using the lower-dimensional interface. We test 10 mapping techniques, comprising dimensionality reduction and reconstruction methods, and show that a stacked autoencoder algorithm exhibits the lowest parameter reconstruction variance, thus providing an accurate map between the input and output space. We demonstrate that the model provides an intuitive method for controlling the audio effect’s parameter space, whilst accurately reconstructing the trajectories of each parameter and adapting to the incoming audio spectrum.
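A hedged sketch of the parameter weighting stage described above: measure a spectral feature of the incoming audio (here the spectral centroid, via a naive DFT) and scale equaliser gains accordingly. The weighting rule, pivot frequency, and function names are illustrative assumptions, not the paper's exact model.

```python
# Toy signal-adaptive weighting: compute the spectral centroid of a frame
# and scale EQ gain parameters by it. Naive O(n^2) DFT for clarity.
import cmath, math

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency of the frame's spectrum."""
    n = len(samples)
    mags = [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]
    freqs = [k * sample_rate / n for k in range(n // 2)]
    total = sum(mags) or 1.0
    return sum(f * m for f, m in zip(freqs, mags)) / total

def weight_gains(gains_db, centroid_hz, pivot_hz=1000.0):
    """Toy rule: shrink all boosts/cuts for dull (low-centroid) signals."""
    scale = min(centroid_hz / pivot_hz, 1.0)
    return [g * scale for g in gains_db]
```

In the full model this weighted parameter vector, not the raw one, is what gets compressed to the (x, y) interface, which is how the same low-dimensional setting adapts to different input spectra.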
Rumbator: a Flamenco Rumba Cover Version Generator Based on Audio Processing at Note Level
In this article, a scheme to automatically generate polyphonic flamenco rumba versions from monophonic melodies is presented. First, we provide an analysis of the parameters that define the flamenco rumba, and then we propose a method for transforming a generic monophonic audio signal into that style. Our method first transcribes the monophonic audio signal into a symbolic representation; a set of note-level audio transformations based on music theory is then applied to the audio signal in order to transform it into the polyphonic flamenco rumba style. Some audio examples produced by this transformation software are also provided.
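The transcription step above maps estimated fundamental frequencies to symbolic notes, which the note-level transformations then operate on. A minimal sketch of that mapping (standard equal-temperament conversion; function names are illustrative):

```python
# f0 -> symbolic note: round to the nearest equal-tempered MIDI note
# number (A4 = 440 Hz = MIDI 69), then render a readable note name.
import math

def hz_to_midi(f0_hz):
    """Nearest equal-tempered MIDI note for a fundamental frequency."""
    return round(69 + 12 * math.log2(f0_hz / 440.0))

def midi_to_name(midi):
    """Human-readable pitch name, e.g. 60 -> 'C4'."""
    names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    return f"{names[midi % 12]}{midi // 12 - 1}"
```

Once notes are symbolic, style-specific operations (harmonization, rhythmic re-accentuation) become music-theoretic manipulations of note numbers and durations rather than raw signal processing.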
A Direct Microdynamics Adjusting Processor with Matching Paradigm and Differentiable Implementation
In this paper, we propose a new processor capable of directly changing the microdynamics of an audio signal, primarily via a single dedicated user-facing parameter. The novelty of our processor is that it has built into it a measure of relative level, a short-term signal strength measurement which is robust to changes in signal macrodynamics. The consequent dynamic range processing is signal-level-independent in nature, and attempts to directly alter the observed relative level measurements. The inclusion of such a meter within the proposed processor also gives rise to a natural solution to the dynamics matching problem, in which we attempt to transfer the microdynamic characteristics of one audio recording to another by estimating appropriate settings for the processor. We suggest a means of providing a reasonable initial guess for the processor settings, followed by an efficient iterative algorithm to refine these estimates. Additionally, we implement the processor as a differentiable recurrent layer and show its effectiveness when wrapped around a gradient descent optimizer within a deep learning framework. Moreover, we illustrate that the proposed processor has more favorable gradient characteristics than a conventional dynamic range compressor. Throughout, we consider extensions of the processor, the matching algorithm, and the differentiable implementation to the multiband case.
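An illustrative sketch (not the paper's actual meter) of what a macrodynamics-robust relative level measure can look like: short-term RMS expressed in dB relative to long-term RMS, so that scaling the whole recording up or down leaves the measurement unchanged. Window sizes and names are assumed values.

```python
# Toy relative level meter: short-term loudness in dB relative to the
# long-term average, invariant to overall (macro) gain changes.
import math

def rms(samples):
    """Root-mean-square of a list of samples (0.0 for an empty list)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0

def relative_level_db(samples, pos, short_win=256, long_win=4096):
    """Short-window RMS at `pos`, in dB relative to long-window RMS."""
    short = rms(samples[max(0, pos - short_win):pos])
    long_ = rms(samples[max(0, pos - long_win):pos])
    if short == 0 or long_ == 0:
        return 0.0
    return 20 * math.log10(short / long_)
```

A processor driven by such a measure reacts to how loud a moment is relative to its context, not to absolute level, which is what makes the resulting dynamics processing level-independent.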
Controlling a Non Linear Friction Model for Evocative Sound Synthesis Applications
In this paper, a flexible strategy to control a synthesis model of sounds produced by nonlinear friction phenomena is proposed for guidance or musical purposes. It makes it possible to synthesize different types of sounds, such as a creaky door, a singing glass, or a squeaking wet plate. This approach is based on the action/object paradigm, which leads to a synthesis strategy using classical linear filtering techniques (a source/resonance approach) that provides an efficient implementation. Within this paradigm, a sound can be considered the result of an action (e.g., impacting, rubbing) on an object (plate, bowl, etc.). However, in the case of nonlinear friction phenomena, simulating the physical coupling between the action and the object with a completely decoupled source/resonance model is a real and relevant challenge. To meet this challenge, we propose a synthesis model of the source that is tuned on recorded sounds according to physical and spectral observations. This model can synthesize many types of nonlinear behaviors. A control strategy for the model is then proposed by defining a flexible, physically informed mapping between a descriptor and the nonlinear synthesis behavior. Finally, potential applications to the remediation of motor diseases are presented. For all sections, video and audio materials are available at the following URL: http://www.lma.cnrs-mrs.fr/~kronland/thoretDAFx2013/
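The source/resonance scheme the abstract builds on can be sketched as a source signal (the "action") driving a linear resonant filter (one mode of the "object"). The two-pole resonator below follows the standard digital resonator recurrence; the 440 Hz mode, pole radius, and names are illustrative assumptions.

```python
# Source/resonance sketch: excite a two-pole digital resonator
# y[n] = x[n] + 2 r cos(w) y[n-1] - r^2 y[n-2]  (poles at r e^{+-iw})
# with a short noise burst, and let the "object" ring.
import math, random

def resonator(source, freq_hz, sample_rate, r=0.995):
    """Filter `source` through a two-pole resonator tuned to freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    a1, a2 = 2 * r * math.cos(w), -r * r
    y1 = y2 = 0.0
    out = []
    for x in source:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# "Action": a 64-sample noise burst; "object": a 440 Hz mode at 44.1 kHz.
random.seed(0)
burst = [random.uniform(-1, 1) for _ in range(64)] + [0.0] * 1936
ringing = resonator(burst, 440.0, 44100.0)
```

The paper's contribution sits upstream of this filter: shaping the source signal itself so that nonlinear friction behaviors emerge even though the filtering stage stays linear and decoupled.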
Blind Arbitrary Reverb Matching
Reverb provides psychoacoustic cues that convey information about relative locations within an acoustic space. The need often arises in audio production to impart on an audio track an acoustic context that resembles a reference track. One way to make audio tracks appear to have been recorded in the same space is to apply to a dry track a reverb similar to that of a wet one. This paper presents a model for the task of “reverb matching,” where we attempt to automatically add artificial reverb to a track, making it sound as if it was recorded in the same space as a reference track. We propose a model architecture for performing reverb matching and provide subjective experimental results suggesting that the reverb matching model can perform as well as a human. We also provide open-source software for generating training data using an arbitrary Virtual Studio Technology plug-in.
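Once a matching reverb has been chosen, applying it reduces to convolving the dry track with an impulse response. A minimal sketch of that final step (direct convolution for clarity; the toy signals are illustrative, and a real system would use an FFT-based method and an actual plug-in):

```python
# Direct convolution of a dry signal with a room impulse response (IR):
# wet[n] = sum_k dry[k] * ir[n - k]. O(N*M), fine for a toy example.
def convolve(dry, impulse_response):
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

dry = [1.0, 0.0, 0.0, 0.0]   # a unit impulse as the "dry" signal
ir = [1.0, 0.5, 0.25]        # toy exponentially decaying IR
wet = convolve(dry, ir)      # the impulse simply reproduces the IR
```

The hard part the paper addresses is estimating which reverb to apply from the wet reference alone; the convolution itself is the easy, well-understood tail of the pipeline.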