Quality Diversity for Synthesizer Sound Matching
It is difficult to adjust the parameters of a complex synthesizer to create a desired sound. As such, sound matching, the estimation of synthesis parameters that can replicate a given sound, is a task that has often been researched using optimization methods such as genetic algorithms (GA). In this paper, we introduce a novelty-based objective for GA-based sound matching. Our contribution is two-fold. First, we show that the novelty objective improves the quality of sound matching by maintaining phenotypic diversity in the population. Second, we introduce a quality diversity approach to the problem of sound matching, aiming to find a diverse set of matching sounds. We show that the novelty objective is effective in producing high-performing solutions that are diverse in terms of specified audio features. This approach allows for a new way of discovering sounds and exploring the capabilities of a synthesizer.
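As a rough illustration of the novelty objective described above, the sketch below computes a standard k-nearest-neighbour novelty score over audio feature vectors for a population and an archive of past solutions; the function names, the distance metric, and the value of k are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def novelty_scores(population_features, archive_features, k=15):
    """Novelty of each individual: mean distance to its k nearest
    neighbours among the current population plus an archive."""
    reference = np.vstack([population_features, archive_features])
    scores = []
    for feats in population_features:
        dists = np.linalg.norm(reference - feats, axis=1)
        dists.sort()
        # dists[0] is the zero distance to itself; average the k nearest others
        scores.append(dists[1:k + 1].mean())
    return np.asarray(scores)
```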
Event-Synchronous Music Synthesis
This work presents a novel framework for music synthesis based on the perceptual structure analysis of pre-existing musical signals, for example taken from a personal MP3 database. We raise the important issue of grounding music analysis in perception, and propose a bottom-up approach to music analysis, modeling, and synthesis. A segmentation model for polyphonic signals is described and qualitatively validated through several artifact-free music resynthesis experiments, e.g., reversing the ordering of sound events (notes) without reversing their waveforms. Then a compact “timbre” structure analysis and a method for describing a song as an “audio DNA” sequence are presented. Finally, we propose novel applications, such as music cross-synthesis and time-domain audio compression, enabled by simple sound similarity measures and clustering.
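The paper's own segmentation model is perceptually motivated; as a stand-in, the sketch below uses a generic onset detector (librosa) to cut a signal into sound events and then reverses their ordering without reversing their waveforms, in the spirit of the resynthesis experiment mentioned above. File names are placeholders.

```python
import librosa
import numpy as np
import soundfile as sf

y, sr = librosa.load("song.mp3", sr=None, mono=True)

# Locate event onsets and cut the signal into event-synchronous segments.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples", backtrack=True)
bounds = np.concatenate([[0], onsets, [len(y)]])
segments = [y[s:e] for s, e in zip(bounds[:-1], bounds[1:])]

# Reverse the ordering of the events, not the waveform inside each event.
sf.write("reversed_events.wav", np.concatenate(segments[::-1]), sr)
```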
Frequency estimation of the first pinna notch in Head-Related Transfer Functions with a linear anthropometric model
The relation between anthropometric parameters and Head-Related Transfer Function (HRTF) features, especially those due to the pinna, is not yet fully understood. In this paper we apply signal processing techniques to extract the frequencies of the main pinna notches (known as N1, N2, and N3) in the frontal part of the median plane, and build a model relating them to 13 different anthropometric parameters of the pinna, some of which depend on the elevation angle of the sound source. Results show that while the considered anthropometric parameters cannot approximate either the N2 or the N3 frequency with sufficient accuracy, eight of them are sufficient for modeling the frequency of N1 within a psychoacoustically acceptable margin of error. In particular, distances between the ear canal and the outer helix border are the most important parameters for predicting N1.
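A minimal sketch of the modeling step, assuming a plain least-squares linear model from the eight retained pinna parameters to the N1 frequency; the file names, data layout, and log-frequency error metric are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# X: one row per (subject, elevation) observation, columns holding the eight
# retained pinna parameters (e.g., ear-canal-to-helix distances).
# y: the N1 notch frequency in Hz extracted from the corresponding HRTF.
X = np.load("pinna_params.npy")   # shape (n_observations, 8)
y = np.load("n1_freqs.npy")       # shape (n_observations,)

model = LinearRegression().fit(X, y)
n1_pred = model.predict(X)

# Judge the fit on a log-frequency scale, closer to how notch shifts are heard.
err_octaves = np.abs(np.log2(n1_pred / y))
print(f"mean absolute error: {err_octaves.mean():.3f} octaves")
```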
Digital Audio Effects on Mobile Platforms
This paper discusses the development of digital audio effect applications on mobile platforms. It introduces the Mobile Csound Platform (MCP) as an agile development kit for audio programming in such environments. The paper starts by exploring the basic technology employed: the Csound Application Programming Interface (API), the target systems (iOS and Android), and their support for real-time audio. CsoundObj, the fundamental class in the MCP toolkit, is introduced and explored in some detail. This is followed by a discussion of its implementation in Objective-C for iOS and Java for Android. A number of application scenarios are explored, and the paper concludes with a general discussion of the technology and its potential impact on audio effects development.
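The MCP code itself is in Objective-C and Java; as a stand-in, the sketch below walks through the same host-API life cycle that CsoundObj wraps (compile an orchestra, read a score, perform) using the desktop Python binding ctcsound. The one-oscillator orchestra is a placeholder, not an example from the paper.

```python
import ctcsound

cs = ctcsound.Csound()
cs.setOption("-odac")  # render to the default audio output
cs.compileOrc("""
sr = 44100
ksmps = 64
nchnls = 2
0dbfs = 1
instr 1
  aSig oscili 0.3, p4   ; simple sine at the frequency given in p4
  outs aSig, aSig
endin
""")
cs.readScore("i 1 0 2 440")    # play instrument 1 for 2 s at 440 Hz
cs.start()
while cs.performKsmps() == 0:  # render one control block per iteration
    pass
cs.cleanup()
```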
A Comparison of Player Performance in a Gamified Localisation Task Between Spatial Loudspeaker Systems
This paper presents an experiment comparing player performance in a gamified localisation task across three loudspeaker configurations: stereo, 7.1 surround sound, and an equidistantly spaced octagonal array. The test was designed as a step towards determining whether spatialised game audio can improve player performance in a video game and thereby influence their overall experience. The game required players to find as many sound sources as possible in a 3D virtual game environment using only sonic cues. Results from 24 participants suggest that the task was significantly easier when listening over the 7.1 surround-sound system, which was also the most preferred of the three listening conditions. This result was not entirely expected, in that the octagonal array did not outperform 7.1. For the given stimuli, this may be a consequence of the octagonal array sacrificing an optimal front stereo pair for more consistent imaging around the listening space.
j-DAFx - Digital Audio Effects in Java
This paper describes an attempt to provide an online learning platform for digital audio effects. After a comprehensive study of the different technologies for presenting multimedia content that reacts dynamically to user input, we decided to use Java applets. We then investigate implementation issues, especially the processing and visualization of audio data, and present a general framework used in our department. Recent and future digital audio effects implemented in this framework can be found on our website.
Audio Content Transmission
Content description has become a topic of interest for many researchers in the audiovisual field [1][2]. While manual annotation has been used for many years in different applications, the focus now is on finding automatic content-extraction and content-navigation tools. An increasing number of projects, in some of which we are actively involved, focus on the extraction of meaningful features from an audio signal. Meanwhile, standards such as MPEG-7 [3] are trying to find a convenient way of describing audiovisual content. Nevertheless, content description is usually thought of as an additional information stream attached to the ‘actual content’, and the only envisioned scenario is that of a search-and-retrieval framework. In this article, however, it will be argued that given a suitable content description, the actual content itself may no longer be needed, and we can concentrate on transmitting only its description. The receiver should then be able to interpret the information that, in the form of metadata, is available at its inputs, and synthesize new content relying only on this description. It is possibly in the music field where this last step is most developed, a fact that allows us to imagine such a transmission scheme becoming available in the near future.
The REACTION System: Automatic Sound Segmentation and Word Spotting for Verbal Reaction Time Tests
Reaction tests are typical tests from the fields of psychological research and communication science, in which a test person is presented with a stimulus such as a photo, a sound, or written words. The individual has to evaluate the stimulus as fast as possible in a predefined manner and react by presenting the result of the evaluation: by pushing a button in simple reaction tests, or by saying an answer in verbal reaction tests. The reaction time between the onset of the stimulus and the onset of the response can be used as a measure of the difficulty of the given evaluation. Compared to simple reaction tests, verbal reaction tests are very powerful, since the individual can simply say the answer, which is the most natural way of answering. The drawback of verbal reaction tests is that the reaction times still have to be determined manually: a person has to listen through all audio recordings taken during test sessions and mark stimulus times and word beginnings one by one, which is very time-consuming and labor-intensive. To replace this manual evaluation, this article presents the REACTION (Reaction Time Determination) system, which can automatically determine the reaction times of a test session by analyzing the audio recording of the session. The system automatically detects the onsets of stimuli as well as the onsets of answers. The recording is furthermore segmented into parts, each containing one stimulus and the following reaction, which further facilitates the transcription of the spoken words for a semantic evaluation.
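A minimal sketch of the onset-detection idea behind such a system, assuming a mono session recording in which stimuli and verbal responses strictly alternate; the energy threshold, frame sizes, and alternation assumption are illustrative, not the REACTION system's actual algorithm.

```python
import numpy as np
import soundfile as sf

def onset_times(x, sr, frame=1024, hop=512, threshold=0.02, min_gap=0.5):
    """Very rough onset detector: report the times where short-term RMS
    energy rises above a threshold after at least `min_gap` s of quiet."""
    onsets, last = [], -min_gap
    for start in range(0, len(x) - frame, hop):
        t = start / sr
        if np.sqrt(np.mean(x[start:start + frame] ** 2)) > threshold:
            if t - last >= min_gap:
                onsets.append(t)
            last = t
    return onsets

x, sr = sf.read("session.wav")   # assumed mono
events = onset_times(x, sr)
# Pair alternating onsets as (stimulus, response) and report reaction times.
for stim, resp in zip(events[0::2], events[1::2]):
    print(f"stimulus at {stim:.2f} s -> reaction time {resp - stim:.3f} s")
```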
GMM supervector for Content Based Music Similarity
Timbral modeling is fundamental in content-based music similarity systems. It is usually achieved by modeling short-term features with a Gaussian Model (GM) or a Gaussian Mixture Model (GMM). In this article we propose to achieve this goal using the GMM-supervector approach, which allows complex statistical models to be represented by a Euclidean vector. Experiments performed for the music similarity task show that this model outperforms state-of-the-art approaches. Moreover, it reduces the similarity search time by a factor of ≈ 100 compared to state-of-the-art GM modeling. Furthermore, we propose a new supervector normalization which makes the GMM-supervector approach more effective for the music similarity task. The proposed normalization can be applied to other Euclidean models.
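A sketch of the general GMM-supervector construction, assuming the common recipe of MAP-adapting a universal background model's means to one track and stacking them, scaled by component weight and variance; the relevance factor and this particular scaling are generic conventions, not necessarily the normalization proposed in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the UBM means to one track's feature frames and stack
    them into a single Euclidean vector."""
    post = ubm.predict_proba(frames)          # (n_frames, n_components)
    n_k = post.sum(axis=0)                    # soft occupation counts
    f_k = post.T @ frames                     # first-order statistics
    alpha = (n_k / (n_k + relevance))[:, None]
    means = alpha * (f_k / np.maximum(n_k, 1e-9)[:, None]) \
            + (1.0 - alpha) * ubm.means_
    # Scale by sqrt(weight)/std-dev so Euclidean distance between
    # supervectors approximates a divergence between the adapted GMMs.
    scale = np.sqrt(ubm.weights_)[:, None] / np.sqrt(ubm.covariances_)
    return (scale * means).ravel()

# ubm = GaussianMixture(64, covariance_type="diag").fit(background_frames)
# sv = supervector(ubm, mfcc_frames_of_one_track)
```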
Performance-Driven Control for Sample-Based Singing Voice Synthesis
In this paper we address the expressive control of singing voice synthesis. Singing Voice Synthesizers (SVS) traditionally require two types of input: a musical score and lyrics. The musical expression is then typically either generated automatically, by applying a model of a certain type of expression to a high-level musical score, or achieved by manually editing low-level synthesizer parameters. We propose an alternative method in which the expression control is derived from a singing performance. In a first step, an analysis module extracts expressive information from the input voice signal, which is then adapted and mapped to the internal synthesizer controls. The presented implementation works off-line, processing user input voice signals and lyrics with a phonetic segmentation module. The main contribution of this approach is to offer a direct way of controlling the expression of an SVS. The next step is to run the system in real time, and the last section of this paper addresses a possible strategy for real-time operation.
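A sketch of the analysis step, extracting a pitch contour and a dynamics envelope from a recorded performance with librosa's pYIN implementation; the file name and the simple 0..1 dynamics mapping are illustrative, and the paper's actual adaptation and mapping to internal SVS controls is not reproduced here.

```python
import librosa
import numpy as np

y, sr = librosa.load("performance.wav", sr=None, mono=True)

# Expressive information: fundamental frequency (pitch contour) and
# a frame-wise RMS energy (dynamics) envelope.
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"), sr=sr)
rms = librosa.feature.rms(y=y)[0]

# Map to synthesizer-style controls: MIDI pitch where voiced, 0..1 dynamics.
pitch_ctrl = librosa.hz_to_midi(np.where(voiced, f0, np.nan))
dyn_ctrl = rms / (rms.max() + 1e-12)
```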