A system for data-driven concatenative sound synthesis
In speech synthesis, concatenative data-driven synthesis methods prevail. They use a database of recorded speech and a unit selection algorithm that selects the segments that best match the utterance to be synthesized. Transferring these ideas to musical sound synthesis allows a new method of high-quality sound synthesis. Usual synthesis methods are based on a model of the sound signal, but it is very difficult to build a model that preserves all the fine details of sound. Concatenative synthesis achieves this preservation by using actual recordings. This data-driven approach (as opposed to a rule-based approach) takes advantage of the information contained in the many sound recordings. For example, very natural-sounding transitions can be synthesized, since unit selection is aware of the context of the database units. The CATERPILLAR software system has been developed to perform data-driven concatenative unit-selection sound synthesis. It allows high-quality instrument synthesis with high-level control, explorative free synthesis from arbitrary sound databases, or resynthesis of a recording with sounds from the database. It is based on the software-engineering concept of component-oriented software, increasing flexibility and facilitating reuse.
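As a concrete illustration of the selection principle, the following is a minimal Python sketch of unit selection by dynamic programming, using toy descriptor vectors and Euclidean target and concatenation costs; the data, weight, and cost functions are illustrative assumptions, not CATERPILLAR's actual cost model.

```python
import numpy as np

def unit_selection(targets, database, w_concat=0.5):
    """Pick one database unit per target, minimising the sum of target
    costs (distance to the target descriptors) and concatenation costs
    (descriptor discontinuity between consecutive units) with a
    Viterbi-style dynamic program."""
    T, N = len(targets), len(database)
    tc = np.array([[np.linalg.norm(t - u) for u in database] for t in targets])
    cc = np.array([[np.linalg.norm(a - b) for b in database] for a in database])
    cost = tc[0].copy()                        # best path cost ending in unit j
    back = np.zeros((T, N), dtype=int)         # backpointers per step
    for t in range(1, T):
        total = cost[:, None] + w_concat * cc  # total[i, j]: prev unit i -> unit j
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(N)] + tc[t]
    path = [int(np.argmin(cost))]              # backtrack the optimal sequence
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

targets  = [np.array([0.2, 1.0]), np.array([0.5, 0.8]), np.array([0.9, 0.1])]
database = [np.array([0.1, 0.9]), np.array([0.6, 0.7]), np.array([1.0, 0.0])]
print(unit_selection(targets, database))       # -> [0, 1, 2]
```

Because the concatenation cost couples consecutive choices, a greedy per-target selection is not optimal in general; the dynamic program considers all paths through the database.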
The CATERPILLAR system for data-driven concatenative sound synthesis
Concatenative data-driven synthesis methods are gaining interest for musical sound synthesis and effects. They are based on a large database of sounds and a unit selection algorithm which finds the units that best match a given sequence of target units. We describe related work and our CATERPILLAR synthesis system, focusing on recent developments: the advantages of adding a relational SQL database, work on segmentation by alignment, the reformulation and extension of the unit selection algorithm using a constraint-resolution approach, and new applications for musical and speech synthesis.
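As an aside on segmentation by alignment, below is a minimal dynamic-time-warping sketch of the kind of sequence alignment such segmentation can build on; the 1-D features and local cost are illustrative assumptions, not the system's actual alignment procedure.

```python
import numpy as np

def dtw(A, B):
    """Dynamic time warping between two 1-D feature sequences; returns
    the accumulated cost and the optimal alignment path."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(A[i - 1] - B[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    i, j, path = n, m, []                      # backtrack from the end
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = min(((i - 1, j), (i, j - 1), (i - 1, j - 1)),
                   key=lambda p: D[p] if min(p) >= 0 else np.inf)
    return D[n, m], path[::-1]

cost, path = dtw(np.array([0., 1., 2., 1.]), np.array([0., 0., 1., 2., 2., 1.]))
```

The returned path pairs frames of the two sequences; segment boundaries known in one sequence can then be transferred to the other.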
GABOR, multi-representation real-time analysis/synthesis
This article describes a set of modules for Max/MSP for real-time sound analysis and synthesis combining various models, representations and timing paradigms. Gabor provides a unified framework for granular synthesis, PSOLA, the phase vocoder, additive synthesis and other STFT techniques. Gabor's processing scheme allows for the treatment of atomic sound particles at arbitrary rates and instants. Gabor is based on FTM, an extension of Max/MSP introducing complex data structures such as matrices and sequences to the Max data-flow programming paradigm. Most of the signal-processing operators of the Gabor modules handle vector and matrix representations closely related to the SDIF sound description format.
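To illustrate the idea of treating windowed sound atoms at arbitrary rates and instants, here is a toy offline granular overlap-add sketch in Python/NumPy; the real Gabor modules operate on FTM matrices inside Max/MSP, so everything below is an illustrative stand-in.

```python
import numpy as np

def granulate(source, onsets, positions, grain_len=1024):
    """Overlap-add windowed grains read at `positions` (samples in the
    source) and written at `onsets` (samples in the output)."""
    window = np.hanning(grain_len)
    out = np.zeros(int(onsets.max()) + grain_len)
    for onset, pos in zip(onsets, positions):
        grain = source[pos:pos + grain_len]
        if len(grain) < grain_len:             # zero-pad the final grain
            grain = np.pad(grain, (0, grain_len - len(grain)))
        out[onset:onset + grain_len] += grain * window
    return out

# usage: time-stretch by reading the source slower than grains are written
sr = 44100
src = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
onsets = np.arange(0, sr, 256)                 # dense output grain rate
positions = (onsets * 0.5).astype(int)         # read the source at half speed
y = granulate(src, onsets, positions)
```

Decoupling read positions from write instants in this way is what makes granular schemes subsume time-stretching, pitch-shifting and freeze effects.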
X-Micks - Interactive Real-Time Content-Based Audio Processing
In this article we present X-Micks, a real-time audio processing plug-in that allows remixing and hybridization of two beat-synchronized audio streams, and provides user interaction based on the extraction and visual rendering of information from the two real-time audio streams. In the current version, the plug-in uses the beat-grid information provided by the plug-in host and a real-time estimation of energy in chosen frequency bands to construct an interactive matrix representation, allowing for intuitive and efficient user interaction based on familiar representations such as the sonogram and the step sequencer. After attempting to formulate the constitutional qualities of a rising new generation of audio processing tools, of which we claim X-Micks to be an exemplary specimen, the article gives an overview of the application's interface, functionality and implementation.
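As an illustration of the underlying analysis, the following sketch computes the kind of band-energy matrix such a representation could be built from: one column per beat subdivision, one row per frequency band. The band edges, grid resolution, and offline STFT are assumptions for illustration, not the plug-in's implementation.

```python
import numpy as np

def band_energy_matrix(signal, sr, beat_len_s, bands, steps_per_beat=4, n_fft=2048):
    """Return a (n_bands, n_steps) matrix of spectral energies, one
    column per beat subdivision, one row per frequency band."""
    hop = int(sr * beat_len_s / steps_per_beat)    # one hop per grid cell
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    n_steps = max(0, (len(signal) - n_fft) // hop + 1)
    M = np.zeros((len(bands), n_steps))
    for s in range(n_steps):
        frame = signal[s * hop : s * hop + n_fft] * np.hanning(n_fft)
        spec = np.abs(np.fft.rfft(frame)) ** 2     # power spectrum of the cell
        for b, (lo, hi) in enumerate(bands):
            M[b, s] = spec[(freqs >= lo) & (freqs < hi)].sum()
    return M

sr = 44100
sig = np.random.randn(sr * 2)                      # two seconds of test noise
bands = [(0, 200), (200, 800), (800, 3200), (3200, 12000)]
M = band_energy_matrix(sig, sr, beat_len_s=0.5, bands=bands)
```

Rendered as a grid, such a matrix reads like a coarse sonogram aligned to the beat, which is what makes step-sequencer-style interaction on it feel familiar.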
Real-Time Corpus-Based Concatenative Synthesis with CataRT
The concatenative real-time sound synthesis system CataRT plays grains from a large corpus of segmented and descriptor-analysed sounds according to their proximity to a target position in the descriptor space. This can be seen as a content-based extension of granular synthesis providing direct access to specific sound characteristics. CataRT is implemented in Max/MSP using the FTM library and an SQL database. Segmentation and MPEG-7 descriptors are loaded from SDIF files or generated on the fly. CataRT allows the user to explore the corpus interactively or via a target sequencer, to resynthesise an audio file or live input with the source sounds, or to experiment with expressive speech synthesis and gestural control.
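The core selection principle can be sketched in a few lines: given a target position in descriptor space, return the corpus grain with the closest (weighted) descriptors. The descriptor names, weights, and toy corpus below are illustrative assumptions.

```python
import numpy as np

corpus = {
    # grain id -> descriptor vector, e.g. [spectral centroid (Hz), loudness (dB)]
    "a": np.array([1200.0, -20.0]),
    "b": np.array([3400.0, -12.0]),
    "c": np.array([ 800.0, -30.0]),
}

def select_grain(target, corpus, weights=np.array([1e-3, 1.0])):
    """Return the grain whose weighted descriptors are closest to the
    target position; the weights compensate for descriptor ranges."""
    ids = list(corpus)
    dists = [np.linalg.norm((corpus[i] - target) * weights) for i in ids]
    return ids[int(np.argmin(dists))]

print(select_grain(np.array([900.0, -28.0]), corpus))   # -> "c"
```

Moving the target continuously through descriptor space and triggering the selected grain at a chosen rate yields the content-aware granular behaviour described above.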
State of the Art in Sound Texture Synthesis
The synthesis of sound textures, such as rain, wind, or crowds, is an important application for cinema, multimedia creation, games and installations. However, despite the clearly defined requirements of naturalness and flexibility, no automatic method has yet found widespread use. After clarifying the definition, terminology, and usages of sound texture synthesis, we give an overview of the many existing methods and approaches, and the few available software implementations, and classify them by the synthesis model they are based on, such as subtractive or additive synthesis, granular synthesis, corpus-based concatenative synthesis, wavelets, or physical modeling. Additionally, we give an overview of the analysis methods used for sound texture synthesis, such as segmentation, statistical modeling, timbral analysis, and modeling of transitions.
Vivos Voco: A survey of recent research on voice transformations at IRCAM
IRCAM has long experience in the analysis, synthesis and transformation of the voice. Natural voice transformations are of great interest for many applications and can be combined with text-to-speech systems, leading to a powerful creation tool. We present research conducted at IRCAM on voice transformations over the last few years. Transformations can be achieved in a global way by modifying pitch, spectral envelope, durations, etc. While this sacrifices the possibility of attaining a specific target voice, the approach allows the production of new voices of a high degree of naturalness with different gender and age, modified vocal quality, or another speech style. These transformations can be applied in real time using ircamTools TRAX. Transformations can also be done in a more specific way in order to transform a voice towards the voice of a target speaker. Finally, we present some recent research on the transformation of expressivity.
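As a much-simplified illustration of a global transformation, the sketch below transposes pitch by plain resampling, which also scales duration and shifts formants; IRCAM's transformations rely on far more sophisticated models (e.g. phase-vocoder processing with spectral-envelope preservation), so this is only a toy stand-in.

```python
import numpy as np

def transpose(signal, semitones):
    """Shift pitch by `semitones` via resampling; duration changes by
    the inverse factor and formants are NOT preserved (toy example)."""
    factor = 2 ** (semitones / 12.0)
    idx = np.arange(0, len(signal) - 1, factor)     # fractional read positions
    return np.interp(idx, np.arange(len(signal)), signal)

sr = 16000
voice = np.sin(2 * np.pi * 150 * np.arange(sr) / sr)  # 150 Hz test "voice"
higher = transpose(voice, +4)                          # up a major third
```

Decoupling pitch, duration and spectral envelope, which this naive method cannot do, is precisely what makes the transformations described above sound natural.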
Interaction-optimized Sound Database Representation
Interactive navigation within geometric, feature-based database representations allows expressive musical performances and installations. Once mapped to the feature space, the user's position in a physical interaction setup (e.g. a multitouch tablet) can be used to select elements or trigger audio events. Hence physical displacements are directly connected to the evolution of sonic characteristics — a property we call analytic sound–control correspondence. However, automatically computed representations have a complex geometry which is unlikely to fit the interaction setup optimally. After a review of related work, we present a physical model-based algorithm that redistributes the representation within a user-defined region according to a user-defined density. The algorithm is designed to preserve the analytic sound–control correspondence property as much as possible, and uses a physical analogy between the triangulated database representation and a truss structure. After preliminary pre-uniformisation steps, internal repulsive forces help spread points across the whole region until a target density is reached. We measure the algorithm's performance by its ability to produce representations corresponding to user-specified features and to preserve analytic sound–control correspondence during a standard density-uniformisation task. Quantitative measures and visual evaluation demonstrate the excellent performance of the algorithm, as well as the value of the pre-uniformisation steps.
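A much-simplified sketch of the redistribution idea is given below: points spread over a unit square under pairwise repulsive forces, with clamping to the region. The truss analogy, triangulation, pre-uniformisation, and target-density control of the actual algorithm are omitted; all parameters are assumptions.

```python
import numpy as np

def redistribute(points, n_iter=300, max_move=0.01, eps=1e-9):
    """Spread 2-D points over the unit square with pairwise 1/d
    repulsion, annealing the step size and clamping to the region."""
    P = points.copy()
    for it in range(n_iter):
        diff = P[:, None, :] - P[None, :, :]           # pairwise displacements
        dist2 = (diff ** 2).sum(axis=-1) + eps         # squared distances
        force = (diff / dist2[..., None]).sum(axis=1)  # net repulsive force
        norm = np.linalg.norm(force, axis=1, keepdims=True) + eps
        step = max_move * (1.0 - it / n_iter)          # annealed step size
        P = np.clip(P + force / norm * step, 0.0, 1.0)
    return P

rng = np.random.default_rng(0)
clustered = np.clip(rng.normal(0.5, 0.05, size=(100, 2)), 0, 1)
spread = redistribute(clustered)                       # roughly uniform layout
```

Because each point moves along its local net force, nearby points in the original layout tend to stay neighbours, which is the intuition behind preserving sound–control correspondence during redistribution.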
Concatenative Sound Texture Synthesis Methods and Evaluation
Concatenative synthesis is a practical approach to sound texture synthesis because it preserves realistic short-time signal characteristics. In this article, we investigate three concatenative synthesis methods for sound textures: concatenative synthesis with descriptor controls (CSDC), Montage synthesis (MS), and a new method called AudioTexture (AT). The respective algorithms are presented, focusing on the identification and selection of concatenation units. The evaluation demonstrates that the presented algorithms achieve comparable performance in terms of quality and similarity to the reference original sounds.