Audio-Based Gesture Extraction on the ESITAR Controller
Using sensors to extract gestural information for control parameters of digital audio effects is common practice. There has also been research using machine learning techniques to classify specific gestures based on audio feature analysis. In this paper, we describe our experiments in training a computer to map audio-based features onto the corresponding sensor data, potentially eliminating the need for sensors altogether. Specifically, we present experiments using the ESitar, a digitally enhanced, sensor-based controller modeled after the traditional North Indian sitar. We use multivariate linear regression to map continuous audio features to continuous gestural data.
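As a rough illustration of the regression step (a sketch of ours, not the authors' code), such a mapping can be fit by gradient descent on a squared-error cost; the feature vectors and gesture values are hypothetical placeholders:

```haskell
-- Minimal multivariate linear regression sketch: fit weights w so that
-- w `dot` features approximates a continuous gesture/sensor value.
-- To model a bias term, append a constant 1.0 to every feature vector.
dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)

-- One gradient-descent step on the mean squared error over all samples.
step :: Double -> [([Double], Double)] -> [Double] -> [Double]
step rate samples w = zipWith (\wi gi -> wi - rate * gi) w grad
  where
    n    = fromIntegral (length samples)
    grad = foldr1 (zipWith (+))
             [ map (* ((dot w x - y) / n)) x | (x, y) <- samples ]

-- Fit by iterating a fixed number of steps from zero weights.
fit :: [([Double], Double)] -> [Double]
fit samples = iterate (step 0.05 samples) w0 !! 2000
  where w0 = replicate (length (fst (head samples))) 0

-- Predict a gestural value (e.g., a fret position) from audio features.
predict :: [Double] -> [Double] -> Double
predict = dot
```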
Musical Instrument Identification in Continuous Recordings
Recognition of musical instruments in multi-instrumental, polyphonic music is a difficult challenge that is still far from solved. Successful instrument recognition techniques in solos (monophonic or polyphonic recordings of single instruments) can help with this task. We introduce an instrument recognition process for solo recordings of a set of instruments (bassoon, clarinet, flute, guitar, piano, cello and violin) that yields a high recognition rate. A large and very diverse solo database (108 different solos, each by a different performer) is used in order to encompass the different sound possibilities of each instrument and to evaluate the generalization abilities of the classification process. We first present classification results using a very extensive collection of features (62 different feature types), and then use our GDE feature selection algorithm to select a smaller feature set with a relatively short computation time, which allows us to perform instrument recognition in solos in real time with only a slight decrease in recognition rate. We demonstrate that our real-time solo classifier can also be useful for instrument recognition in duet performances, and that it can be improved using simple “source reduction”.
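For intuition only (this is not the paper's classifier nor its GDE algorithm), a frame-level classifier over extracted feature vectors can be as simple as k-nearest neighbours; feature extraction itself is assumed to happen elsewhere:

```haskell
import Data.List (group, maximumBy, sort, sortBy)
import Data.Ord (comparing)

type Features = [Double]

-- Squared Euclidean distance between two feature vectors.
dist :: Features -> Features -> Double
dist a b = sum [ (x - y) ^ (2 :: Int) | (x, y) <- zip a b ]

-- Label one frame by majority vote among its k nearest training frames.
knn :: Int -> [(Features, String)] -> Features -> String
knn k train query = mostCommon votes
  where
    votes      = map snd (take k (sortBy (comparing (dist query . fst)) train))
    mostCommon = head . maximumBy (comparing length) . group . sort
```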
The Sounding Gesture: An Overview
Sound control by gesture is a distinctive topic in human-computer interaction: many different approaches are available, each focusing on a different perspective. Our point of view is interdisciplinary: taking into account technical considerations from control theory and sound processing, we explore the world of expressiveness, which lies closer to psychological theories. Starting from a state of the art that outlines two main approaches to the problem of “making sound with gestures”, we delve into psychological theories of expressiveness, describing in particular possible applications dealing with intermodality and mixed-reality environments related to Gestalt theory. HCI design can benefit from this kind of approach because quantitative methods can be applied to measure expressiveness. Interfaces can be used to convey expressiveness, an additional layer of information that can help in interacting with the machine; as Gestalt theory states, this kind of information can be encoded as spatio-temporal schemes.
Multimodal Interfaces for Expressive Sound Control
This paper introduces research issues in multimodal interaction and interfaces for expressive sound control. We introduce Multisensory Integrated Expressive Environments (MIEEs) as a framework for mixed-reality applications in the performing arts. Paradigmatic contexts for applications of MIEEs are multimedia concerts; interactive dance, music, and video installations; interactive museum exhibitions; and distributed cooperative environments for theatre and artistic expression. MIEEs are user-centred systems able to interpret the high-level information conveyed by performers through their expressive gestures and to establish an effective multisensory experience that takes expressive, emotional, and affective content into account. The lecture discusses the main issues for MIEEs and presents the EyesWeb (www.eyesweb.org) open software platform, which has recently been redesigned (version 4) to better address MIEE requirements. Short live demonstrations are also presented.
Semi-automatic Ambience Generation
Ambiences are background recordings used in audiovisual productions to make listeners feel they are in a particular place, such as a pub or a farm. Accessing commercially available atmosphere libraries is a convenient alternative to sending teams out to record ambiences, yet such libraries limit creation in several ways. First, they are already mixed, which reduces the flexibility to add or remove individual sounds or change their panning. Second, the number of available ambiences is limited. We propose a semi-automatic system for ambience generation. Given a text query, the system creates ambiences on demand by fetching relevant sounds from a large sound-effect database and importing them into a multitrack sequencer project. Ambiences of diverse nature can be created easily, and several controls are provided for users to refine the type of samples and the sound arrangement.
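A minimal sketch of the retrieval step, under assumed data types (the actual system queries a large sound-effect database): rank sounds by how many query terms appear among their tags, then lay the top hits onto separate sequencer tracks so each sound stays individually editable, unlike a pre-mixed library ambience:

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

data Sound = Sound { path :: FilePath, tags :: [String] } deriving Show

-- Relevance of a sound to a text query: number of matching tags.
score :: [String] -> Sound -> Int
score query s = length (filter (`elem` tags s) query)

-- The n most relevant sounds for the query.
retrieve :: Int -> [String] -> [Sound] -> [Sound]
retrieve n query = take n . sortBy (comparing (Down . score query))

-- Assign each retrieved sound its own track and a staggered start offset.
arrange :: [Sound] -> [(Int, Double, FilePath)]   -- (track, start time, file)
arrange ss = [ (i, fromIntegral i * 0.5, path s) | (i, s) <- zip [0 ..] ss ]
```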
Event-Synchronous Music Synthesis
This work presents a novel framework for music synthesis based on the perceptual structure analysis of pre-existing musical signals, for example taken from a personal MP3 database. We raise the important issue of grounding music analysis in perception, and propose a bottom-up approach to music analysis, modeling, and synthesis. A segmentation model for polyphonic signals is described and qualitatively validated through several artifact-free music resynthesis experiments, e.g., reversing the ordering of sound events (notes) without reversing their waveforms. Then, a compact “timbre” structure analysis and a method for describing songs as an “audio DNA” sequence are presented. Finally, we propose novel applications, such as music cross-synthesis and time-domain audio compression, enabled through simple sound-similarity measures and clustering.
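The event-reordering idea is simple to state in code; the sketch below is ours, with onset detection assumed done elsewhere: cut the signal at given event boundaries and reverse the order of the events while leaving each event's waveform intact:

```haskell
-- Split a signal at (ascending) onset positions given in samples.
segments :: [Int] -> [Double] -> [[Double]]
segments []       xs = [xs]
segments (n : ns) xs = pre : segments (map (subtract n) ns) post
  where (pre, post) = splitAt n xs

-- Reverse the ordering of sound events without reversing their waveforms.
reverseEvents :: [Int] -> [Double] -> [Double]
reverseEvents onsets = concat . reverse . segments onsets
```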
On Finding Melodic Lines in Audio Recordings
The paper presents our approach to the problem of finding melodic line(s) in polyphonic audio recordings. The approach consists of two stages, partially rooted in psychoacoustic theories of music perception: the first stage finds regions with strong and stable pitch (melodic fragments), while the second stage groups these fragments according to their properties (pitch, loudness, ...) into clusters that represent the melodic lines of the piece. An Expectation-Maximization algorithm is used in both stages: to find the dominant pitch in a region, and to train the Gaussian Mixture Models that group fragments into melodies. The paper presents the entire process in more detail and provides some initial results.
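As a toy stand-in for the clustering stage (hard k-means on two descriptors, rather than the paper's EM-trained Gaussian Mixture Models), one can group fragments by mean pitch and loudness so that each cluster approximates one melodic line:

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Each melodic fragment summarized by (mean pitch, mean loudness).
type Point = (Double, Double)

dist :: Point -> Point -> Double
dist (a, b) (c, d) = (a - c) ^ (2 :: Int) + (b - d) ^ (2 :: Int)

-- Index of the nearest cluster centre.
assign :: [Point] -> Point -> Int
assign cs p =
  snd (minimumBy (comparing fst) [ (dist c p, i) | (c, i) <- zip cs [0 ..] ])

-- One k-means update: move each centre to the mean of its fragments
-- (keeping the old centre if no fragment was assigned to it).
update :: [Point] -> [Point] -> [Point]
update ps cs =
  [ if null ms then c else centroid ms
  | (c, i) <- zip cs [0 ..]
  , let ms = [ p | p <- ps, assign cs p == i ] ]
  where
    centroid qs = (avg (map fst qs), avg (map snd qs))
    avg xs = sum xs / fromIntegral (length xs)
```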
Sound Texture Modeling and Time-Frequency LPC
This paper presents a method to model and synthesize the textures of sounds such as fire, footsteps, and typewriters using time- and frequency-domain linear predictive coding (TFLPC). The common characteristic of this class of sounds is that they have a background “din” and a foreground transient sequence. By using LPC filters in both the time and frequency domains, together with a statistical representation of the transient sequence, the perceptual quality of the sound textures can be largely preserved, and the model can be used to manipulate and extend the sounds.
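The time-domain half of the analysis is classic LPC; a bare-bones sketch of ours (the paper additionally runs the same machinery across spectral frames) computes coefficients by the Levinson-Durbin recursion and derives a prediction residual:

```haskell
-- Autocorrelation values r0..rp of one frame.
autocorr :: Int -> [Double] -> [Double]
autocorr p xs = [ sum (zipWith (*) xs (drop k xs)) | k <- [0 .. p] ]

-- Levinson-Durbin recursion: prediction coefficients a1..ap from r0..rp
-- (assumes a non-degenerate frame, so the prediction error stays nonzero).
levinson :: [Double] -> [Double]
levinson r = go 1 [] (head r)
  where
    p = length r - 1
    go i as err
      | i > p     = as
      | otherwise =
          let acc = sum (zipWith (*) as (reverse (take (i - 1) (tail r))))
              k   = (r !! i - acc) / err
              as' = [ a - k * b | (a, b) <- zip as (reverse as) ] ++ [k]
          in  go (i + 1) as' ((1 - k * k) * err)

-- Prediction residual e[n] = x[n] - sum_j a_j * x[n-j].
residual :: [Double] -> [Double] -> [Double]
residual as xs =
  [ xs !! n - sum [ a * xs !! (n - j) | (a, j) <- zip as [1 ..] ]
  | n <- [length as .. length xs - 1] ]
```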
Spatial Auditory Displays - A study on the use of virtual audio environments as interfaces for users with visual disabilities
This paper presents work on a prototype spatial auditory display. Using high-definition audio rendering, a sample application was presented to a mixed group of users with visual disabilities and normally sighted users. The evaluation of the prototype provided insights into how effective the spatial presentation of sound can be in terms of human-computer interaction (HCI). It showed that typical applications with the most common interaction tasks, such as menus, text input, and dialogs, can be presented very effectively using spatial audio. It also revealed no significant difference in effectiveness between normally sighted and visually impaired users. We believe that spatial auditory displays can give visually impaired and blind users access to modern information technologies more efficiently than common technologies, and that they will be indispensable for multimodal displays in future applications.
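For intuition, spatial placement of, say, menu items can be approximated with interaural time and level differences alone; the sketch below is a crude stand-in for the prototype's high-definition (HRTF-based) rendering, with rough textbook constants:

```haskell
-- Place a mono signal at azimuth `az` (radians; 0 = front, positive = right)
-- by delaying and attenuating the far ear. sr is the sample rate in Hz.
spatialize :: Double -> Double -> [Double] -> ([Double], [Double])
spatialize sr az mono = (left, right)
  where
    itd   = round (sr * 0.00066 * sin az) :: Int  -- up to ~0.66 ms head delay
    gainL = 0.5 * (1 - 0.3 * sin az)
    gainR = 0.5 * (1 + 0.3 * sin az)
    delay n xs = replicate n 0 ++ xs
    (left, right) =
      if itd >= 0  -- source on the right: the left ear hears it later
        then (delay itd (map (* gainL) mono), map (* gainR) mono)
        else (map (* gainL) mono, delay (negate itd) (map (* gainR) mono))
```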
Audio Processing Using Haskell
The software for most of today's applications, including signal processing applications, is written in imperative languages. Imperative programs are fast because they are designed close to the architecture of widespread computers, but they do not match the structure of signal processing very well. In contrast, functional programming, and especially lazy evaluation, perfectly models many common operations on signals. Haskell is a statically typed, lazy functional programming language which allows for a very elegant and concise programming style. We sketch how to process signals, how to improve safety through the use of physical units, and how to compose music using this language.
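In that spirit, a minimal sketch (our names, not the authors' library) of signals as lazy lists: infinite oscillators compose with ordinary list functions, and only the prefix that is actually played ever gets computed:

```haskell
type Signal = [Double]

sampleRate :: Double
sampleRate = 44100

-- An infinite sine oscillator at frequency f (Hz).
osc :: Double -> Signal
osc f = [ sin (2 * pi * f * t / sampleRate) | t <- [0 ..] ]

-- Mixing and amplification are ordinary pointwise list operations.
mix :: Signal -> Signal -> Signal
mix = zipWith (+)

amp :: Double -> Signal -> Signal
amp g = map (g *)

-- Laziness: the chord is conceptually infinite; `take` forces one second.
chord :: Signal
chord = take 44100 (amp 0.3 (osc 440 `mix` osc 550 `mix` osc 660))
```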