A New Paradigm for Sound Design

A sound scene can be defined as any “environmental” sound that has a consistent background texture and one or more potentially recurring foreground events. We describe a data-driven framework for analyzing, transforming, and synthesizing high-quality sound scenes, with flexible control over the components of the synthesized sound. Given one or more sound scenes, we provide well-defined means to: (1) identify points of interest in the sound and extract them into reusable templates, (2) transform sound components independently of the background or other events, (3) continually re-synthesize the background texture in a perceptually convincing manner, and (4) controllably place event templates over the background, varying key parameters such as density, periodicity, relative loudness, and spatial positioning. Contributions include techniques and paradigms for template selection and extraction, independent sound transformation, and flexible re-synthesis; extensions to wavelet-based background analysis/synthesis; and user interfaces to facilitate the various phases. Given this framework, it is possible to completely transform an existing sound scene, dynamically generate sound scenes of unlimited length, and construct new sound scenes by combining elements from different sound scenes.

URL: http://taps.cs.princeton.edu/
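As a rough illustration of point (4) above, the following Python sketch scatters a pre-extracted event template over a background signal with user-controlled density, relative loudness, and timing jitter. It is a minimal stand-in, not the TAPS implementation; the function and parameter names (place_events, density_per_sec, jitter) are hypothetical.

```python
# Hypothetical sketch (not the TAPS API): mixing copies of an extracted
# event template onto a re-synthesized background, with controllable
# density, relative loudness, and randomized inter-onset timing.
import numpy as np

SR = 44100  # sample rate (Hz)

def place_events(background, template, density_per_sec=0.5,
                 relative_gain_db=-6.0, jitter=0.3, rng=None):
    """Return `background` with event copies mixed in.

    density_per_sec : average number of event onsets per second
    relative_gain_db: event loudness relative to the background
    jitter          : randomness of inter-onset gaps (0 = strictly periodic)
    """
    rng = np.random.default_rng() if rng is None else rng
    out = background.copy()
    gain = 10.0 ** (relative_gain_db / 20.0)
    mean_gap = SR / density_per_sec            # mean inter-onset gap (samples)
    t = 0
    while t + len(template) < len(out):
        out[t:t + len(template)] += gain * template
        gap = mean_gap * (1.0 + jitter * rng.uniform(-1.0, 1.0))
        t += max(1, int(gap))
    return out

# Usage: a 10-second synthetic background with a short synthetic "event".
bg = 0.05 * np.random.randn(10 * SR)           # stand-in for a re-synthesized texture
ev = np.hanning(SR // 4) * np.sin(2 * np.pi * 880 * np.arange(SR // 4) / SR)
scene = place_events(bg, ev, density_per_sec=1.0, relative_gain_db=-3.0)
```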
Performance-Driven Control for Sample-Based Singing Voice Synthesis

In this paper we address the expressive control of singing voice synthesis. Singing Voice Synthesizers (SVS) traditionally require two types of input: a musical score and lyrics. The musical expression is then typically either generated automatically, by applying a model of a certain type of expression to a high-level musical score, or achieved by manually editing low-level synthesizer parameters. We propose an alternative method in which the expression control is derived from a singing performance. In a first step, an analysis module extracts expressive information from the input voice signal, which is then adapted and mapped to the internal synthesizer controls. The presented implementation works off-line, processing user input voice signals and lyrics with a phonetic segmentation module. The main contribution of this approach is to offer a direct way of controlling the expression of an SVS. A further step is to run the system in real time; the last section of this paper addresses a possible strategy for real-time operation.
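To make the analysis-and-mapping step concrete, here is a hedged Python sketch of the general idea: extract a pitch contour and a dynamics envelope from a recorded performance and normalize them into frame-wise synthesizer controls. It uses librosa's pYIN and RMS features as stand-ins for the paper's analysis module; none of the names below come from the paper.

```python
# Hedged sketch (not the paper's implementation): derive frame-wise
# expression controls (pitch in cents, normalized loudness) from a
# recorded singing performance.
import numpy as np
import librosa

def extract_expression(wav_path, hop=256):
    """Return pitch contour, voicing flags, RMS energy, and control rate (frames/s)."""
    y, sr = librosa.load(wav_path, sr=None, mono=True)
    f0, voiced, _ = librosa.pyin(y,
                                 fmin=librosa.note_to_hz('C2'),
                                 fmax=librosa.note_to_hz('C6'),
                                 sr=sr, hop_length=hop)
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    return f0, voiced, rms, sr / hop

def map_to_controls(f0, voiced, rms):
    """Map raw features to normalized synthesizer controls per frame."""
    n = min(len(f0), len(rms))
    # Pitch deviation in cents relative to A440; NaN on unvoiced frames.
    pitch_cents = 1200.0 * np.log2(np.where(voiced[:n], f0[:n], np.nan) / 440.0)
    loudness = rms[:n] / (rms.max() + 1e-9)     # 0..1 dynamics envelope
    return {"pitch_cents": pitch_cents, "loudness": loudness}

# Usage (hypothetical file): controls could then be adapted and fed to
# whatever internal parameters the target synthesizer exposes.
# f0, voiced, rms, rate = extract_expression("performance.wav")
# controls = map_to_controls(f0, voiced, rms)
```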
Real-Time Detection of Finger Picking Musical Structures

MIDIME is a software architecture that houses improvisational agents that react to MIDI messages from a finger-picked guitar. The agents operate in a pipeline whose first stage converts MIDI messages to a map of the state of the instrument's strings over time, and whose second stage selects rhythmic, modal, chordal, and melodic interpretations from the superposition of interpretations latent in the first stage. These interpretations are nondeterministic, not because of any arbitrary injection of randomness by an algorithm, but because guitar playing is nondeterministic: variations in timing, tuning, picking intensity, string damping, and accidental or intentional grace notes can affect the selections of this second stage. The selections open to the second stage, as well as to the third stage that matches second-stage selections against a stored library of composition fragments, reflect the superposition of possible perceptions and interpretations of a piece of music. This paper concentrates on these working analytical stages of MIDIME. It also outlines plans for using a genetic algorithm to develop improvisational agents in the final pipeline stage.
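For illustration only, the sketch below approximates what a first pipeline stage of this kind might look like: it folds a stream of MIDI messages into a per-string state map over time, assuming (as an assumption here, not a fact about MIDIME) the common MIDI-guitar convention of one channel per string. It is not the MIDIME code.

```python
# Minimal sketch (not MIDIME): build a per-string state map over time
# from guitar MIDI messages, assuming one MIDI channel per string (0-5).
from dataclasses import dataclass
from typing import Optional

@dataclass
class StringEvent:
    time: float             # seconds
    string: int             # 0 = low E ... 5 = high E
    note: Optional[int]     # MIDI note number, or None when the string is damped
    velocity: int           # picking intensity (0 on note-off)

def string_state_map(midi_messages):
    """midi_messages: iterable of (time, channel, status, note, velocity)
    tuples, where status is 'on' or 'off'. Returns a chronological list
    of StringEvent records describing which note sounds on each string."""
    events = []
    for time, channel, status, note, velocity in midi_messages:
        if channel > 5:
            continue                     # ignore non-string channels
        if status == 'on' and velocity > 0:
            events.append(StringEvent(time, channel, note, velocity))
        else:                            # note-off, or zero-velocity note-on
            events.append(StringEvent(time, channel, None, 0))
    return events

# Later stages would read such a map to propose rhythmic, modal, chordal,
# and melodic interpretations of the superposed string states.
```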