Feature-Based Delay Line Using Real-Time Concatenative Synthesis
In this paper we introduce a novel approach that uses real-time concatenative synthesis to produce a Feature-Based Delay Line (FBDL). The concept expands upon the traditional delay, whose most basic function is familiar: a dry signal is copied to an audio buffer whose read position is time-shifted, producing a delayed or "wet" signal that is remixed with the dry. In our implementation, however, the traditionally unaltered wet signal is modified: the audio delay buffer is segmented and concatenated according to specific audio features. The input audio is analyzed and segmented as it is written to the delay buffer, and the delayed segments are matched against a target feature set so that the most similar segments are selected to constitute the wet signal of the delay. Targeting methods, either manual or automated, can be used to explore the feature space of the delay-line buffer based on dry-signal feature information and relevant targeting parameters, such as delay time. This paper outlines our process, detailing important requirements such as targeting, considerations for feature extraction and concatenative synthesis, use cases, and performance evaluation, with commentary on the potential of advances to digital delay lines.
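A minimal sketch of the core idea, not the authors' implementation: the segment length, the two-feature descriptor (RMS and spectral centroid), and the search window around the nominal delay position are all invented for illustration.

```python
import numpy as np

def features(seg, sr=44100):
    """Toy per-segment features: RMS level and normalised spectral centroid."""
    mag = np.abs(np.fft.rfft(seg))
    freqs = np.fft.rfftfreq(len(seg), 1.0 / sr)
    centroid = (freqs * mag).sum() / (mag.sum() + 1e-12)
    return np.array([np.sqrt(np.mean(seg ** 2)), centroid / (sr / 2)])

def feature_delay(x, sr=44100, seg_len=1024, delay_segs=32, search=8, mix=0.5):
    """Feature-matched delay: the wet signal for each dry segment is the
    stored segment, near the nominal delay position, whose feature vector
    is most similar to the current dry segment's."""
    n_segs = len(x) // seg_len
    segs = x[:n_segs * seg_len].reshape(n_segs, seg_len)
    feats = np.array([features(s, sr) for s in segs])
    y = np.zeros(n_segs * seg_len)
    for i in range(n_segs):
        j0 = i - delay_segs                      # nominal delayed segment
        if j0 < 0:
            wet = np.zeros(seg_len)              # delay buffer not yet filled
        else:
            lo = max(0, j0 - search)
            hi = min(j0 + search + 1, i)         # only segments from the past
            j = min(range(lo, hi),
                    key=lambda k: np.linalg.norm(feats[k] - feats[i]))
            wet = segs[j]
        y[i * seg_len:(i + 1) * seg_len] = (1 - mix) * segs[i] + mix * wet
    return y
```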
MOSIEVIUS: Feature driven interactive audio mosaicing
The process of creating an audio mosaic consists of the concatenation of segments of sound. Segments are chosen to correspond best with a description of a target sound, specified by the desired features of the final mosaic. Current audio mosaicing techniques take advantage of the description of future target units in order to make more intelligent decisions when choosing individual segments. In this paper, we investigate ways to expand mosaicing techniques so that the mosaicing process becomes an interactive means of musical expression in real time. In our system, the user can interactively choose the specification of the target as well as the source signals from which the mosaic is composed. Both forms of control are incorporated into MoSievius, a framework intended for the rapid implementation of different interactive mosaicing techniques. Its integral control mechanism, the Sound Sieve, provides real-time control over the source selection process when creating an audio mosaic. We discuss a number of new real-time effects that can be achieved through use of the Sound Sieve.
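A rough offline sketch of sieve-style selection, assuming a corpus already reduced to per-unit feature vectors; the function names and the rectangular feature bounds are illustrative, and the real Sound Sieve operates in real time.

```python
import numpy as np

def sieve(corpus_feats, bounds):
    """Keep only the corpus units whose feature values fall inside the
    user-specified ranges (the 'sieve'). corpus_feats: (n_units, n_feats);
    bounds: one (lo, hi) pair per feature dimension."""
    mask = np.ones(len(corpus_feats), dtype=bool)
    for d, (lo, hi) in enumerate(bounds):
        mask &= (corpus_feats[:, d] >= lo) & (corpus_feats[:, d] <= hi)
    return np.flatnonzero(mask)

def pick_units(target_feats, corpus_feats, allowed):
    """For each target frame, choose the nearest corpus unit that
    survived the sieve."""
    sel = []
    for t in target_feats:
        dists = np.linalg.norm(corpus_feats[allowed] - t, axis=1)
        sel.append(allowed[np.argmin(dists)])
    return sel
```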
Real-Time Corpus-Based Concatenative Synthesis with CataRT
The concatenative real-time sound synthesis system CataRT plays grains from a large corpus of segmented and descriptor-analysed sounds according to proximity to a target position in the descriptor space. This can be seen as a content-based extension to granular synthesis, providing direct access to specific sound characteristics. CataRT is implemented in Max/MSP using the FTM library and an SQL database. Segmentation and MPEG-7 descriptors are loaded from SDIF files or generated on the fly. CataRT allows the user to explore the corpus interactively or via a target sequencer, to resynthesise an audio file or live input with the source sounds, or to experiment with expressive speech synthesis and gestural control.
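The selection rule reduces to a nearest-neighbour lookup in descriptor space; the sketch below shows that step plus a short grain fade, with the two-dimensional descriptor layout and fade length assumed for the example (CataRT itself runs in Max/MSP with FTM).

```python
import numpy as np

def nearest_grains(target_pos, corpus_desc, k=1):
    """Indices of the k grains closest to the target position in
    descriptor space (e.g., an (x, y) point from a mouse or sequencer)."""
    dists = np.linalg.norm(corpus_desc - target_pos, axis=1)
    return np.argsort(dists)[:k]

def play_grain(signal, onset, length, fade=64):
    """Extract one grain with short raised-cosine fades at the edges."""
    g = signal[onset:onset + length].copy()
    env = 0.5 * (1 - np.cos(np.pi * np.arange(fade) / fade))
    g[:fade] *= env
    g[-fade:] *= env[::-1]
    return g
```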
Creating Endless Sounds
This paper proposes signal processing methods to extend a stationary part of an audio signal endlessly. A frequent situation is that there is not enough audio material to build a synthesizer, and an example sound must be extended or modified for more variability. Filtering a white noise signal with a filter designed using high-order linear prediction, or concatenation of the example signal, can produce convincing, arbitrarily long sounds, such as ambient noise or musical tones, and can be interpreted as a spectral freeze technique without looping. It is shown that the random input signal pumps energy into the narrow resonances of the filter so that lively and realistic variations in the sound are generated. For real-time implementation, this paper proposes replacing the white noise with velvet noise, which reduces the number of operations by 90% or more with respect to standard convolution without affecting the sound quality, or using FFT convolution, which can be simplified to randomizing the spectral phase and taking only the inverse FFT. Examples of producing endless airplane cabin noise and piano tones from a short example recording are studied. The proposed methods lead to a new way to generate audio material for music, films, and gaming.
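The phase-randomisation variant is simple to illustrate: keep the magnitude spectrum of one stationary frame and draw new random phases for every synthesis frame. The frame size, Hann windowing, and 50% overlap below are assumptions of the sketch, not the paper's parameters.

```python
import numpy as np

def endless(block, n_frames=200, rng=None):
    """Extend a stationary block indefinitely by keeping its magnitude
    spectrum and randomising the phase of every synthesised frame,
    then overlap-adding with a Hann window (a freeze without looping)."""
    rng = np.random.default_rng(rng)
    n = len(block)
    win = np.hanning(n)
    mag = np.abs(np.fft.rfft(block * win))
    hop = n // 2                                # 50% overlap: Hann sums to ~1
    out = np.zeros(hop * n_frames + n)
    for i in range(n_frames):
        phase = rng.uniform(-np.pi, np.pi, len(mag))
        phase[0] = phase[-1] = 0.0              # keep DC and Nyquist real
        frame = np.fft.irfft(mag * np.exp(1j * phase), n=n)
        out[i * hop:i * hop + n] += frame * win
    return out
```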
Combining Zeroth and First-Order Analysis With Lagrange Polynomials to Reduce Artefacts in Live Concatenative Granulation
This paper presents a technique addressing signal discontinuity and concatenation artefacts in real-time granular processing with rectangular windowing. By combining zero-crossing synchronicity, first-order derivative analysis, and Lagrange polynomials, we can generate streams of uncorrelated and non-overlapping sonic fragments with minimal discontinuities in the low-order derivatives. The resulting open-source algorithm, implemented in the Faust language, provides versatile real-time software for dynamic looping, wavetable oscillation, and granulation with reduced artefacts from rectangular windowing and none of the artefacts of the overlap-add-to-one techniques commonly deployed in granular processing.
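A sketch of the zero-crossing-synchronous part alone (the paper additionally applies first-order derivative analysis and Lagrange polynomials, omitted here); the grain length and boundary rules are illustrative.

```python
import numpy as np

def rising_zeros(x):
    """Indices of zero crossings where the signal goes from negative
    to non-negative (value ~0, positive slope)."""
    neg = x[:-1] < 0
    pos = x[1:] >= 0
    return np.flatnonzero(neg & pos) + 1

def zc_splice(x, grain_len, n_grains, rng=None):
    """Concatenate non-overlapping grains whose boundaries all sit on
    rising zero crossings, so amplitude and slope sign are continuous
    at every junction (rectangular windows, no overlap-add)."""
    rng = np.random.default_rng(rng)
    zc = rising_zeros(x)
    starts = zc[zc < len(x) - 2 * grain_len]
    out = []
    for _ in range(n_grains):
        start = rng.choice(starts)
        # cut at the first rising zero crossing at least grain_len later
        idx = np.searchsorted(zc, start + grain_len)
        end = zc[min(idx, len(zc) - 1)]
        out.append(x[start:end])
    return np.concatenate(out)
```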
Morphing techniques for enhanced scat singing
In jazz, scat singing is a phonetic improvisation that imitates instrumental sounds. In this paper, we propose a system that aims to transform singing voice into real instrument sounds, extending the possibilities for scat singers. Analysis algorithms in the spectral domain extract voice parameters, which drive the resulting instrument sound. A small database contains real instrument samples that have been spectrally analyzed offline. Two different prototypes are introduced, producing sounds of a trumpet and a bass guitar respectively.
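A heavily simplified sketch of the control path: per-frame pitch and energy extracted from the voice drive a stand-in oscillator. The autocorrelation pitch estimator and the naive sawtooth are assumptions; the paper instead drives spectrally analysed samples of real instruments.

```python
import numpy as np

def frame_pitch(frame, sr, fmin=80.0, fmax=800.0):
    """Crude autocorrelation pitch estimate for one voiced frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, "full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    return sr / (lo + np.argmax(ac[lo:hi]))

def voice_to_instrument(voice, sr, frame_len=1024):
    """Drive a stand-in 'instrument' (a naive sawtooth) from the per-frame
    pitch and energy of the voice."""
    phase, out = 0.0, []
    for i in range(0, len(voice) - frame_len, frame_len):
        fr = voice[i:i + frame_len]
        f0 = frame_pitch(fr, sr)
        amp = np.sqrt(np.mean(fr ** 2))
        t = (phase + np.arange(frame_len) * f0 / sr) % 1.0
        phase = (phase + frame_len * f0 / sr) % 1.0
        out.append(amp * (2.0 * t - 1.0))        # phase-continuous sawtooth
    return np.concatenate(out)
```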
Event-Synchronous Music Synthesis
This work presents a novel framework for music synthesis based on the perceptual structure analysis of pre-existing musical signals, for example taken from a personal MP3 database. We raise the important issue of grounding music analysis in perception and propose a bottom-up approach to music analysis, modeling, and synthesis. A segmentation model for polyphonic signals is described and qualitatively validated through several artifact-free music resynthesis experiments, e.g., reversing the ordering of sound events (notes) without reversing their waveforms. Then, a compact “timbre” structure analysis and a method for describing a song as an “audio DNA” sequence are presented. Finally, we propose novel applications, such as music cross-synthesis and time-domain audio compression, enabled through simple sound similarity measures and clustering.
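The event-reversal experiment is easy to sketch, assuming a crude energy-flux onset detector in place of the paper's perceptual segmentation model; frame size and threshold are invented for the example.

```python
import numpy as np

def onsets(x, frame=512, thresh=2.0):
    """Very rough event onsets from positive energy flux between frames."""
    n = len(x) // frame
    e = np.array([np.sum(x[i * frame:(i + 1) * frame] ** 2) for i in range(n)])
    flux = np.maximum(np.diff(e), 0.0)
    return (np.flatnonzero(flux > thresh * flux.mean()) + 1) * frame

def reverse_events(x):
    """Reverse the ordering of sound events without reversing their
    waveforms: split at onsets, then concatenate the segments backwards."""
    b = np.concatenate(([0], onsets(x), [len(x)]))
    segments = [x[b[i]:b[i + 1]] for i in range(len(b) - 1)]
    return np.concatenate(segments[::-1])
```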
Modeling Harmonic Phases at Glottal Closure Instants
We propose a model that predicts harmonic phases at glottal closure instants. Phases are obtained from the scaled derivative of the harmonic amplitude envelope. The method generates convincing synthesis results while avoiding typical phasiness artifacts. A clear advantage of such a model is that it simplifies sample concatenation in sample-based synthesizers. In addition, it helps to improve the sound quality of voice transformations in several contexts.
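A speculative sketch of the mapping's shape only: the envelope definition (log amplitude over harmonic number), the finite-difference derivative, and the scale factor are all assumptions here, since the paper specifies them precisely.

```python
import numpy as np

def phases_from_envelope(amps, scale=1.0):
    """Illustrative only: derive one phase per harmonic from the scaled
    derivative (finite difference over harmonic number) of the
    log-amplitude envelope."""
    env = np.log(np.maximum(amps, 1e-12))
    return scale * np.gradient(env)

def frame_at_gci(amps, f0, sr, dur=0.03):
    """Resynthesise one frame with the predicted phases anchored at a
    glottal closure instant (t = 0)."""
    t = np.arange(int(sr * dur)) / sr
    ph = phases_from_envelope(amps)
    k = np.arange(1, len(amps) + 1)
    return sum(a * np.cos(2 * np.pi * kk * f0 * t + p)
               for a, kk, p in zip(amps, k, ph))
```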
On Vibrato and Frequency (De)Modulation in Musical Sounds
Vibrato is an important characteristic of human musical performance and is often uniquely characteristic of a player and/or a particular instrument. This work is motivated by the assumption (often made in the source separation literature) that vibrato aids in the identification of multiple sound sources playing in unison; it follows that its removal, the focus herein, may contribute to a more blended combination. In signals, vibrato is often modeled as an oscillatory deviation from a center pitch/frequency that presents in the sound as phase/frequency modulation. While vibrato implementation using a time-varying delay line is well known, using a delay line for its removal is less so. In this work we focus on (de)modulation of vibrato in a signal, first showing the relationship between modulation and corresponding demodulation delay functions and then suggesting a way to increase vibrato removal in the latter by ensuring sideband attenuation below the threshold of audibility. Two known methods for estimating the instantaneous frequency/phase are used to construct delay functions from both contrived and musical examples so that vibrato removal can be evaluated.
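A first-order sketch of the demodulation side: given an instantaneous-phase estimate, the delay function is the phase deviation from the mean pitch divided by the carrier frequency. The Hilbert-based estimator in the usage note is one of many choices, not necessarily the paper's.

```python
import numpy as np

def remove_vibrato(x, inst_phase, f0, sr):
    """Demodulate vibrato with a time-varying fractional delay line.
    To first order, reading the buffer at a delay equal to the phase
    deviation from the mean pitch, divided by the carrier frequency,
    cancels the modulation."""
    n = np.arange(len(x))
    dev = inst_phase - 2.0 * np.pi * f0 * n / sr   # phase deviation (rad)
    d = dev / (2.0 * np.pi * f0) * sr              # delay in samples
    read = np.clip(n - d, 0, len(x) - 1)
    i = read.astype(int)
    i1 = np.minimum(i + 1, len(x) - 1)
    frac = read - i
    return (1.0 - frac) * x[i] + frac * x[i1]      # linear-interpolated read

# usage, for a quasi-sinusoidal input (estimator choice is an assumption):
#   from scipy.signal import hilbert
#   inst_phase = np.unwrap(np.angle(hilbert(x)))
#   f0 = (inst_phase[-1] - inst_phase[0]) / (2 * np.pi * len(x) / sr)
```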
Data Augmentation for Instrument Classification Robust to Audio Effects
Reusing recorded sounds (sampling) is a key component in Electronic Music Production (EMP); it has been present since its early days and is at the core of genres like hip-hop or jungle. Commercial and non-commercial services allow users to obtain collections of sounds (sample packs) to reuse in their compositions. Automatic classification of one-shot instrumental sounds makes it possible to categorise the sounds contained in these collections automatically, allowing easier navigation and better characterisation. Automatic instrument classification has mostly targeted unprocessed isolated instrumental sounds or the detection of predominant instruments in mixed music tracks. For this classification to be useful in audio databases for EMP, it has to be robust to the audio effects applied to unprocessed sounds. In this paper we evaluate how a state-of-the-art model, trained on a large dataset of one-shot instrumental sounds, performs when classifying instruments processed with audio effects. To evaluate the robustness of the model, we use data augmentation with audio effects and evaluate how each effect influences classification accuracy.
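A minimal sketch of the augmentation loop with simple stand-in effects (gain, soft clipping, low-pass, slapback echo); the actual effect set, parameter ranges, and implementation in the paper differ.

```python
import numpy as np
from scipy.signal import butter, lfilter

def augment(x, sr, rng=None):
    """Yield effected variants of a one-shot sound for training,
    using simple stand-ins for production-grade audio effects."""
    rng = np.random.default_rng(rng)
    yield x * rng.uniform(0.3, 1.0)            # random gain
    yield np.tanh(5.0 * x)                     # soft-clipping distortion
    b, a = butter(4, rng.uniform(0.1, 0.5))    # random normalised cutoff
    yield lfilter(b, a, x)                     # low-pass filter
    d = int(0.05 * sr)                         # 50 ms echo delay
    echo = np.zeros_like(x)
    echo[d:] = 0.5 * x[:-d]
    yield x + echo                             # slapback echo
```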