Novel Methods in Information Management for Advanced Audio Workflows

This paper discusses architectural aspects of a software library for unified metadata management in audio processing applications. The data incorporates editorial, production, acoustical and musicological features for a variety of use cases, ranging from adaptive audio effects to alternative metadata-based visualisation. Our system is designed to capture information prescribed by a modular ontology schema. This facilitates the development of intelligent user interfaces and advanced media workflows in music production environments. To reach these goals, we argue for the need for modularity and interoperable semantics in representing information. We discuss the advantages of extensible Semantic Web ontologies over specialised but mutually incompatible metadata formats. Concepts and techniques permitting seamless integration with existing audio production software are described in detail.
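As a hedged illustration of the ontology-based capture the paper argues for, the sketch below uses the Python rdflib package with the public Music Ontology namespace to describe an audio item; the item URI and property choices are illustrative assumptions, not the authors' library.

```python
# Minimal sketch of ontology-based audio metadata using rdflib.
# The Music Ontology namespace is real; the item URI and the
# specific properties chosen below are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

MO = Namespace("http://purl.org/ontology/mo/")    # Music Ontology
DC = Namespace("http://purl.org/dc/elements/1.1/")  # Dublin Core

g = Graph()
g.bind("mo", MO)
g.bind("dc", DC)

track = URIRef("http://example.org/session/take-07")  # hypothetical URI
g.add((track, RDF.type, MO.Track))
g.add((track, DC.title, Literal("Take 7, rhythm guitar")))
g.add((track, DC.creator, Literal("Example Artist")))

# Serialise as Turtle: an interoperable description other tools can consume.
print(g.serialize(format="turtle"))
```

Because the schema is modular, additional vocabularies (production, acoustical, musicological) can be bound into the same graph without breaking consumers that only understand a subset.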
Quality Diversity for Synthesizer Sound Matching

It is difficult to adjust the parameters of a complex synthesizer to create a desired sound. As such, sound matching, the estimation of synthesis parameters that can replicate a given sound, is a task that has often been researched using optimization methods such as the genetic algorithm (GA). In this paper, we introduce a novelty-based objective for GA-based sound matching. Our contribution is two-fold. First, we show that the novelty objective improves the quality of sound matching by maintaining phenotypic diversity in the population. Second, we introduce a quality diversity approach to the problem of sound matching, aiming to find a diverse set of matching sounds. We show that the novelty objective is effective in producing high-performing solutions that are diverse in terms of specified audio features. This approach allows for a new way of discovering sounds and exploring the capabilities of a synthesizer.
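As a rough illustration of a novelty objective of the kind described above (not the authors' implementation), the sketch below scores each individual by its mean distance to the k nearest neighbours in audio-feature space, computed over the current population plus a novelty archive; the audio-feature extraction itself is assumed given.

```python
import numpy as np

def novelty_scores(features: np.ndarray, archive: np.ndarray, k: int = 15) -> np.ndarray:
    """Mean distance from each individual's audio-feature vector (rows of
    `features`) to its k nearest neighbours in the population plus archive."""
    pool = np.vstack([features, archive]) if archive.size else features
    # Pairwise Euclidean distances: population rows vs. pool rows.
    d = np.linalg.norm(features[:, None, :] - pool[None, :, :], axis=-1)
    d.sort(axis=1)
    k = min(k, pool.shape[0] - 1)  # guard against small populations
    # Column 0 is each individual's zero distance to itself in the pool.
    return d[:, 1:k + 1].mean(axis=1)
```

In a GA loop, such a score can be combined with the distance-to-target fitness so that selection keeps sound-alikes that occupy different regions of feature space, which is the phenotypic diversity the abstract refers to.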
Event-Synchronous Music Synthesis

This work presents a novel framework for music synthesis based on the perceptual structure analysis of pre-existing musical signals, for example taken from a personal MP3 database. We raise the important issue of grounding music analysis in perception, and propose a bottom-up approach to music analysis, modeling, and synthesis. A segmentation model for polyphonic signals is described and qualitatively validated through several artifact-free music resynthesis experiments, e.g., reversing the ordering of sound events (notes) without reversing their waveforms. Then, a compact “timbre” structure analysis and a method for song description in the form of an “audio DNA” sequence are presented. Finally, we propose novel applications, such as music cross-synthesis and time-domain audio compression, enabled through simple sound similarity measures and clustering.
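To make the event-reversal experiment concrete, here is a minimal sketch in which librosa's onset detector stands in for the paper's perceptual segmentation model (an assumption, not the authors' method): events are sliced at detected onsets and concatenated in reverse order, while each event's waveform is left intact.

```python
import librosa
import numpy as np
import soundfile as sf

# "song.mp3" is a placeholder input file.
y, sr = librosa.load("song.mp3", sr=None, mono=True)

# Onset positions in samples, used here as crude event boundaries.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples")
bounds = np.concatenate([[0], onsets, [len(y)]])

# Slice into events and concatenate in reverse order; each event's
# waveform stays intact, so timbre is preserved while structure flips.
events = [y[s:e] for s, e in zip(bounds[:-1], bounds[1:])]
sf.write("reversed_events.wav", np.concatenate(events[::-1]), sr)
```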
Digital Audio Effects on Mobile Platforms

This paper discusses the development of digital audio effect applications on mobile platforms. It introduces the Mobile Csound Platform (MCP) as an agile development kit for audio programming in such environments. The paper starts by exploring the basic technology employed: the Csound Application Programming Interface (API), the target systems (iOS and Android), and their support for real-time audio. CsoundObj, the fundamental class in the MCP toolkit, is introduced and explored in some detail. This is followed by a discussion of its implementation in Objective-C for iOS and Java for Android. A number of application scenarios are explored, and the paper concludes with a general discussion of the technology and its potential impact on audio effects development.
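CsoundObj wraps the host lifecycle of the underlying Csound API. As a rough sketch of that lifecycle (compile, start, performance loop, cleanup), the snippet below uses the desktop ctcsound Python bindings; the .csd file name is hypothetical, and MCP's actual Objective-C and Java method names differ from these.

```python
import ctcsound

cs = ctcsound.Csound()
# Compile a Csound document (orchestra + score); "effect.csd" is a
# hypothetical file implementing the audio effect.
if cs.compileCsd("effect.csd") == 0:
    cs.start()
    # Performance loop: render one control period (ksmps samples) at a
    # time; this is where a host app would exchange audio buffers and
    # control-channel values with the engine.
    while not cs.performKsmps():
        pass
cs.cleanup()
```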
A Comparison of Player Performance in a Gamified Localisation Task Between Spatial Loudspeaker Systems

This paper presents an experiment comparing player performance in a gamified localisation task across three loudspeaker configurations: stereo, 7.1 surround sound, and an equidistantly spaced octagonal array. The test was designed as a step towards determining whether spatialised game audio can improve player performance in a video game, thus influencing their overall experience. The game required players to find as many sound sources as possible, using only sonic cues, in a 3D virtual game environment. Based on feedback from 24 participants, results suggest that the task was significantly easier when listening over the 7.1 surround-sound system, which was also the most preferred of the three listening conditions. This result was not entirely expected, in that the octagonal array did not outperform 7.1. For the given stimuli, this may be a consequence of the octagonal array sacrificing an optimal front stereo pair for more consistent imaging all around the listening space.
Frequency Estimation of the First Pinna Notch in Head-Related Transfer Functions with a Linear Anthropometric Model

The relation between anthropometric parameters and Head-Related Transfer Function (HRTF) features, especially those due to the pinna, is not yet fully understood. In this paper we apply signal processing techniques to extract the frequencies of the main pinna notches (known as N1, N2, and N3) in the frontal part of the median plane, and build a model relating them to 13 different anthropometric parameters of the pinna, some of which depend on the elevation angle of the sound source. Results show that while the considered anthropometric parameters cannot approximate either the N2 or the N3 frequency with sufficient accuracy, eight of them are sufficient for modeling the frequency of N1 within a psychoacoustically acceptable margin of error. In particular, distances between the ear canal and the outer helix border are the most important parameters for predicting N1.
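A minimal sketch of fitting such a linear model by ordinary least squares, assuming a matrix X holding the eight retained pinna measurements per observation and a vector f of extracted N1 frequencies; the variable names and data handling are hypothetical, not the authors' code.

```python
import numpy as np

# X: (n_observations, 8) anthropometric parameters, one row per
# subject/elevation; f: (n_observations,) extracted N1 frequencies in Hz.
def fit_n1_model(X: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Least-squares fit of f ~ X @ w + b, via an appended intercept column."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X_aug, f, rcond=None)
    return w

def predict_n1(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predict N1 frequencies for new anthropometric measurements."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return X_aug @ w
```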
j-DAFx - Digital Audio Effects in Java

This paper describes an attempt to provide an online learning platform for digital audio effects. After a comprehensive study of different technologies for presenting multimedia content that reacts dynamically to user input, we decided to use Java Applets. We then investigate implementation issues, especially the processing and visualization of audio data, and present a general framework used in our department. Recent and future digital effects implemented in this framework can be found on our web site.
Analysis-and-Manipulation Approach to Pitch and Duration of Musical Instrument Sounds Without Distorting Timbral Characteristics

This paper presents an analysis-and-manipulation method that can generate musical instrument sounds with arbitrary pitches and durations from the sound of a given musical instrument (called a seed) without distorting its timbral characteristics. Based on psychoacoustical knowledge of the auditory effects of timbre, we define timbral features on the spectrogram of a musical instrument sound as (i) the relative amplitudes of the harmonic peaks, (ii) the distribution of the inharmonic component, and (iii) the temporal envelopes. First, to analyze the timbral features of a seed, it is separated into harmonic and inharmonic components using Itoyama’s integrated model. For pitch manipulation, we take into account the pitch-dependency of features (i) and (ii), predicting the value of each feature with a cubic polynomial that approximates the distribution of these features over pitch. To manipulate duration, we focus on preserving feature (iii) in the attack and decay portions of a seed; therefore, only steady-state segments are expanded or shrunk. In addition, we propose a method for reproducing the properties of vibrato. Experimental results demonstrated the quality of the synthesized sounds produced using our method: the spectral and MFCC distances between the synthesized sounds and actual sounds of 32 instruments were reduced by 64.70% and 32.31%, respectively.
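For the pitch-dependent features (i) and (ii), the cubic-polynomial prediction step can be sketched with numpy as follows; the pitch grid and feature values below are illustrative placeholders, not data from the paper.

```python
import numpy as np

# Pitches (MIDI note numbers) at which a feature, e.g. the relative
# amplitude of one harmonic peak, was measured on recorded seeds.
pitches = np.array([40, 45, 50, 55, 60, 65, 70], dtype=float)  # illustrative
feature = np.array([0.9, 0.8, 0.7, 0.65, 0.5, 0.45, 0.4])      # illustrative

# Cubic polynomial approximating the feature's distribution over pitch.
coeffs = np.polyfit(pitches, feature, deg=3)

# Predict the feature value at a target pitch absent from the seeds,
# as needed when synthesizing a sound at an arbitrary pitch.
predicted = np.polyval(coeffs, 57.0)
```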
Audio Content Transmission

Content description has become a topic of interest for many researchers in the audiovisual field [1][2]. While manual annotation has been used for many years in different applications, the focus now is on finding automatic content-extraction and content-navigation tools. An increasing number of projects, in some of which we are actively involved, focus on the extraction of meaningful features from an audio signal. Meanwhile, standards like MPEG-7 [3] are trying to find a convenient way of describing audiovisual content. Nevertheless, content description is usually thought of as an additional information stream attached to the ‘actual content’, and the only envisioned scenario is that of a search-and-retrieval framework. In this article, however, it will be argued that given a suitable content description, the actual content itself may no longer be needed, and we can concentrate on transmitting only its description. The receiver should then be able to interpret the information that, in the form of metadata, is available at its inputs, and synthesize new content relying only on this description. It is possibly in the music field where this last step has been developed furthest, which allows us to envisage such a transmission scheme becoming available in the near future.
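A toy sketch of the proposed scheme: the sender transmits only a symbolic description (here, note events serialized as JSON), and the receiver interprets that metadata and synthesizes new content from it alone. The event format and the sine-tone synthesis are illustrative assumptions, not the article's encoding.

```python
import json
import numpy as np

# Sender side: the content is reduced to a description (metadata only).
description = json.dumps([
    {"onset": 0.0, "dur": 0.5, "f0": 440.0},  # illustrative note events
    {"onset": 0.5, "dur": 0.5, "f0": 660.0},
])

# Receiver side: interpret the metadata and synthesize new content.
sr = 44100
events = json.loads(description)
total = max(e["onset"] + e["dur"] for e in events)
out = np.zeros(int(total * sr))
for e in events:
    t = np.arange(int(e["dur"] * sr)) / sr
    start = int(e["onset"] * sr)
    out[start:start + t.size] += 0.3 * np.sin(2 * np.pi * e["f0"] * t)
```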
Performance-Driven Control for Sample-Based Singing Voice Synthesis

In this paper we address the expressive control of singing voice synthesis. Singing Voice Synthesizers (SVS) traditionally require two types of input: a musical score and lyrics. The musical expression is then typically either generated automatically, by applying a model of a certain type of expression to a high-level musical score, or achieved by manually editing low-level synthesizer parameters. We propose an alternative method in which the expression control is derived from a singing performance. In a first step, an analysis module extracts expressive information from the input voice signal, which is then adapted and mapped to the internal synthesizer controls. The presented implementation works off-line, processing user input voice signals and lyrics using a phonetic segmentation module. The main contribution of this approach is to offer a direct way of controlling the expression of an SVS. A further step is to run the system in real time, and the last section of this paper addresses a possible strategy for real-time operation.
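A minimal sketch of the analysis step, with librosa's pyin pitch tracker standing in for the paper's analysis module (an assumption): pitch and energy envelopes are extracted from the input performance and mapped to hypothetical internal synthesizer controls.

```python
import librosa
import numpy as np

# "performance.wav" is a placeholder for the user's input singing voice.
y, sr = librosa.load("performance.wav", sr=None, mono=True)

# Frame-wise pitch contour (Hz) and voicing decisions from the performance.
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"), sr=sr)

# Frame-wise energy envelope, a rough proxy for dynamics/expression.
rms = librosa.feature.rms(y=y)[0]

# Map to hypothetical internal synthesizer controls: pitch as MIDI note
# numbers (NaN where unvoiced) and a normalised dynamics curve.
pitch_ctrl = librosa.hz_to_midi(np.where(voiced, f0, np.nan))
dyn_ctrl = rms / (rms.max() or 1.0)
```

These control curves would then be time-aligned with the lyrics via the phonetic segmentation module before driving the synthesizer.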