A system for data-driven concatenative sound synthesis
In speech synthesis, concatenative data-driven methods prevail. They use a database of recorded speech and a unit selection algorithm that selects the segments that best match the utterance to be synthesized. Transferring these ideas to musical sound synthesis yields a new method of high-quality sound synthesis. Usual synthesis methods are based on a model of the sound signal, and it is very difficult to build a model that preserves all the fine details of sound. Concatenative synthesis sidesteps this problem by using actual recordings. This data-driven approach (as opposed to a rule-based one) takes advantage of the information contained in the many sound recordings. For example, very natural-sounding transitions can be synthesized, since unit selection is aware of the context of the database units. The CATERPILLAR software system has been developed to perform data-driven concatenative unit selection sound synthesis. It allows high-quality instrument synthesis with high-level control, explorative free synthesis from arbitrary sound databases, or resynthesis of a recording with sounds from the database. It is based on the software-engineering concept of component-oriented software, which increases flexibility and facilitates reuse.
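The unit selection idea can be sketched as a dynamic program that trades a target cost (how well a database unit matches the desired segment) against a concatenation cost (how well successive units join). This is only a minimal illustration with invented scalar features and costs, not CATERPILLAR's actual cost functions:

```python
# Toy unit selection via dynamic programming (hypothetical costs, scalar
# "features" standing in for real acoustic descriptors).

def select_units(targets, units, concat_cost):
    # targets: desired feature value per segment (e.g. pitch in Hz)
    # units:   list of (feature, id) pairs from the sound database
    n, m = len(targets), len(units)
    INF = float("inf")
    cost = [[INF] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    for j, (f, _) in enumerate(units):
        cost[0][j] = abs(targets[0] - f)            # target cost only
    for i in range(1, n):
        for j, (f, _) in enumerate(units):
            tc = abs(targets[i] - f)                # target cost
            best = min(range(m),
                       key=lambda k: cost[i - 1][k] + concat_cost(units[k], units[j]))
            cost[i][j] = cost[i - 1][best] + concat_cost(units[best], units[j]) + tc
            back[i][j] = best
    # Backtrack the cheapest unit sequence.
    j = min(range(m), key=lambda k: cost[n - 1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [units[j][1] for j in path]

db = [(100.0, "u0"), (200.0, "u1"), (210.0, "u2"), (400.0, "u3")]
chosen = select_units([195.0, 205.0], db,
                      lambda a, b: 0.01 * abs(a[0] - b[0]))
```

Because the concatenation cost penalizes feature jumps between consecutive units, the selected sequence favours smooth joins over individually closest matches.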
Interactive digital audio environments: gesture as a musical parameter
This paper presents some possible relationships between gesture and sound that may be built within an interactive digital audio environment. In a traditional musical situation, gesture usually produces sound; the relationship between the two is unique, a cause-and-effect link. In computer music, gesture can be uncoupled from sound because the computer can carry out all aspects of sound production, from composition through interpretation and performance. Real-time computing technology and the development of human gesture tracking systems may allow gesture to be reintroduced into the practice of computer music, but with a completely renewed approach. There is no longer any need to create direct cause-and-effect relationships for sound production, and gesture may be treated as another musical parameter to play with in the context of interactive musical performances.
Audio Processing Using Haskell
The software for most of today's applications, including signal processing applications, is written in imperative languages. Imperative programs are fast because they are designed close to the architecture of widespread computers, but they do not match the structure of signal processing very well. In contrast, functional programming, and especially lazy evaluation, models many common operations on signals very naturally. Haskell is a statically typed, lazy functional programming language which allows for a very elegant and concise programming style. We sketch how to process signals, how to improve safety through the use of physical units, and how to compose music in this language.
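The lazy-list view of signals that the abstract describes can be illustrated with Python generators (a stand-in for Haskell's lazy lists, purely as a sketch; this is not the paper's code):

```python
import itertools
import math

# Signals as lazy, conceptually infinite streams: only the samples actually
# demanded downstream are ever computed.

def sine(freq, rate=44100.0):
    """Infinite sine wave, one sample at a time."""
    for n in itertools.count():
        yield math.sin(2.0 * math.pi * freq * n / rate)

def gain(g, sig):
    """Pointwise amplification; also lazy."""
    return (g * x for x in sig)

def mix(a, b):
    """Sample-wise sum of two lazy signals."""
    return (x + y for x, y in zip(a, b))

# Demand just the first four samples of a mixed signal.
first = list(itertools.islice(mix(sine(440.0), gain(0.5, sine(880.0))), 4))
```

In Haskell the same structure falls out of ordinary list functions (`zipWith (+)`, `map (g*)`) applied to infinite lists, which is exactly the match between laziness and signal processing the paper exploits.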
Towards Transient Restoration in Score-informed Audio Decomposition
Our goal is to improve the perceptual quality of transient signal components extracted in the context of music source separation. Many state-of-the-art techniques are based on applying a suitable decomposition to the magnitude of the Short-Time Fourier Transform (STFT) of the mixture signal. The phase information required for the reconstruction of the individual component signals is usually taken from the mixture, resulting in a complex-valued, modified STFT (MSTFT). Different methods exist for reconstructing a time-domain signal whose STFT approximates the target MSTFT. Due to phase inconsistencies, these reconstructed signals are likely to contain artifacts such as pre-echos preceding transient components. In this paper, we propose a simple yet effective extension of the iterative signal reconstruction procedure of Griffin and Lim to remedy this problem. In a first experiment, under laboratory conditions, we show that our method considerably attenuates pre-echos while retaining convergence properties similar to those of the original approach. A second, more realistic experiment involving score-informed audio decomposition shows that the proposed method still yields improvements under non-idealized conditions, although to a lesser extent.
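The Griffin–Lim baseline that the paper extends alternates between an inverse STFT of the current complex spectrogram and re-imposing the target magnitude on the STFT of the result. A minimal NumPy sketch (window length, hop, and iteration count are illustrative choices, not the paper's):

```python
import numpy as np

N, H = 64, 32                      # window length and hop (50% overlap)
win = np.hanning(N)

def stft(x):
    frames = [win * x[i:i + N] for i in range(0, len(x) - N + 1, H)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(S, length):
    """Least-squares overlap-add inversion."""
    x = np.zeros(length)
    norm = np.zeros(length)
    for k, spec in enumerate(S):
        i = k * H
        x[i:i + N] += win * np.fft.irfft(spec, N)
        norm[i:i + N] += win ** 2
    return x / np.maximum(norm, 1e-12)

rng = np.random.default_rng(0)
mag = np.abs(stft(rng.standard_normal(512)))   # target magnitude spectrogram

S = mag.astype(complex)                        # zero-phase initialization
err0 = np.linalg.norm(np.abs(stft(istft(S, 512))) - mag) / np.linalg.norm(mag)
for _ in range(50):
    # Keep the target magnitude; update only the phase from the re-analysis.
    S = mag * np.exp(1j * np.angle(stft(istft(S, 512))))
err = np.linalg.norm(np.abs(stft(istft(S, 512))) - mag) / np.linalg.norm(mag)
```

Each iteration reduces the inconsistency between the target magnitude and what the reconstructed time-domain signal can actually produce; the paper's contribution is a modification of this loop aimed specifically at suppressing pre-echos.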
A Nonlinear Method for Manipulating Warmth and Brightness
In musical timbre, two of the most commonly used perceptual dimensions are warmth and brightness. In this study, we develop a model capable of accurately controlling the warmth and brightness of an audio source using a single parameter. To do this, we first identify the most salient audio features associated with the chosen descriptors by applying dimensionality reduction to a dataset of annotated timbral transformations. Here, strong positive correlations are found between the centroid of various spectral representations and the most salient principal components. From this, we build a system designed to manipulate the audio features directly using a combination of linear and nonlinear processing modules. To validate the model, we conduct a series of subjective listening tests, and show that up to 80% of participants are able to allocate the correct term, or synonyms thereof, to a set of processed audio samples. Objectively, we show low Mahalanobis distances between the processed samples and clusters of the same timbral adjective in the low-dimensional timbre space.
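The spectral centroid mentioned above is straightforward to compute, and a simple high-frequency emphasis demonstrably raises it. The first-difference filter below is only an illustrative brightness manipulation, not the paper's combination of linear and nonlinear modules:

```python
import numpy as np

def centroid(x, rate=44100.0):
    """Spectral centroid in Hz: magnitude-weighted mean frequency."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / rate)
    return float(np.sum(freqs * mag) / np.sum(mag))

rate = 44100.0
t = np.arange(2048) / rate
# Harmonic tone: 220 Hz fundamental plus two weaker overtones.
x = (np.sin(2 * np.pi * 220 * t)
     + 0.5 * np.sin(2 * np.pi * 440 * t)
     + 0.25 * np.sin(2 * np.pi * 880 * t))

# "Brighter" version: a first difference acts as a high-pass emphasis,
# shifting spectral weight upward and hence raising the centroid.
bright = np.append(np.diff(x), 0.0)
c0, c1 = centroid(x, rate), centroid(bright, rate)
```

A model like the paper's can target such a feature directly: map a single user parameter to processing that moves the centroid (and its correlates in other spectral representations) up for brightness or down for warmth.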
A Feedback Canceling Reverberator
A real-time auralization system is described in which room sounds are reverberated and presented over loudspeakers. Room microphones are used to capture room sound sources, with their outputs processed in a canceler to remove the synthetic reverberation also present in the room. Doing so suppresses feedback and gives precise control over the auralization. It also allows freedom of movement and creates a more dynamic acoustic environment for performers or participants in music, theater, gaming, and virtual reality applications. Canceler design methods are discussed, including techniques for handling varying loudspeaker-microphone transfer functions such as would be present in the context of a performance or installation. Tests in a listening room and recital hall show in excess of 20 dB of feedback suppression.
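The core idea can be sketched in a few lines: if the loudspeaker-to-microphone impulse response is known, convolving the loudspeaker feed with it and subtracting the result from the microphone signal removes the synthetic reverberation. The impulse response and signals below are invented for illustration; the paper's canceler design handles measured, time-varying responses:

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([0.0, 0.6, 0.3, 0.1])        # assumed loudspeaker->mic response
src = rng.standard_normal(256)            # room source as heard at the mic
feed = rng.standard_normal(256)           # loudspeaker feed (synthetic reverb)

mic = src + np.convolve(feed, h)[:256]    # microphone captures both

# With a perfect model of h, cancellation is exact (up to rounding).
cleaned_exact = mic - np.convolve(feed, h)[:256]

# With a 1% gain error in the estimated response, the residual feedback is
# 1% of the original, i.e. 40 dB of suppression.
cleaned = mic - np.convolve(feed, 0.99 * h)[:256]
before = np.linalg.norm(mic - src)
after = np.linalg.norm(cleaned - src)
suppression_db = 20.0 * np.log10(before / after)
```

This makes the reported figure concrete: the 20 dB of suppression measured in the paper's listening room and recital hall tests corresponds to modeling the (moving, time-varying) transfer functions to within roughly 10% accuracy.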
An Efficient Phasiness Reduction Technique for Moderate Audio Time-Scale Modification
Phase vocoder approaches to time-scale modification of audio introduce a reverberant/phasy artifact into the time-scaled output, due to a loss of phase coherence between short-time Fourier transform (STFT) bins. Recent improvements to the phase vocoder have reduced the presence of this artifact; however, it remains a problem. A time-scaling method is presented that further reduces phasiness, for moderate time-scale factors, by taking advantage of flexibility that exists in the choice of the phase required to maintain horizontal phase coherence between related STFT bins. Furthermore, the approach reduces the computational load within the range of time-scaling factors for which phasiness is reduced.
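For reference, the baseline phase vocoder being improved on works as follows: analyze with hop Ha, propagate each bin's phase using its estimated instantaneous frequency, and resynthesize with hop Hs = r·Ha. A bare-bones sketch (all parameters illustrative, no normalization or refinements):

```python
import numpy as np

def stretch(x, r, N=1024, Ha=256):
    """Time-stretch x by factor r with a basic phase vocoder."""
    Hs = int(round(r * Ha))                          # synthesis hop
    win = np.hanning(N)
    w = 2 * np.pi * np.arange(N // 2 + 1) * Ha / N   # nominal phase advance/bin
    frames = [np.fft.rfft(win * x[i:i + N])
              for i in range(0, len(x) - N, Ha)]
    out = np.zeros(Hs * (len(frames) - 1) + N)
    phase = np.angle(frames[0])
    for k, F in enumerate(frames):
        if k > 0:
            # Deviation from the nominal advance, wrapped to (-pi, pi]:
            d = np.angle(F) - np.angle(frames[k - 1]) - w
            d -= 2 * np.pi * np.round(d / (2 * np.pi))
            # Rescale the true per-hop advance to the synthesis hop
            # (this is the horizontal phase coherence step).
            phase = phase + (w + d) * Hs / Ha
        out[k * Hs:k * Hs + N] += win * np.fft.irfft(np.abs(F) * np.exp(1j * phase), N)
    return out

rate = 8192.0
t = np.arange(8192) / rate
y = stretch(np.sin(2 * np.pi * 440.0 * t), 1.5)

# The 440 Hz tone should survive stretching at the same pitch.
mid = y[len(y) // 2 - 2048: len(y) // 2 + 2048]
peak_hz = np.argmax(np.abs(np.fft.rfft(mid))) * rate / len(mid)
```

Because each bin's synthesis phase is fully determined by this propagation rule, vertical (cross-bin) coherence is lost, which is the source of phasiness; the paper exploits the slack in how that phase may be chosen.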
VisualAudio-Design – Towards a Graphical Sounddesign
VisualAudio-Design (VAD) is a spectral-node-based approach to visually designing audio collages and sounds. The spectrogram, as a visualization of the frequency domain, can be manipulated intuitively with tools known from image processing. This enables a more comprehensible sound design than the common abstract interfaces to DSP algorithms, which still rely on direct value inputs, sliders, or knobs. In addition to interaction in the time domain of audio and conventional analysis and restoration tasks, many new possibilities arise for the spectral manipulation of audio material. Here, affine transformations and two-dimensional convolution filters are proposed.
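Treating the spectrogram as an image makes two-dimensional filtering natural: a smoothing kernel blurs energy across both time (columns) and frequency (rows), exactly as it would blur pixels. A toy sketch with invented sizes, not VAD's implementation:

```python
import numpy as np

def conv2d(img, k):
    """Valid-mode 2-D convolution (correlation form) by direct summation."""
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

spec = np.zeros((16, 16))          # toy magnitude spectrogram
spec[8, 8] = 1.0                   # a single time-frequency "pixel"
kernel = np.ones((3, 3)) / 9.0     # mean-blur kernel
blurred = conv2d(spec, kernel)     # energy spread over a 3x3 neighbourhood
```

The same machinery supports sharpening or directional kernels, which on a spectrogram translate into transient emphasis or time/frequency smearing.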
Nicht-negative Matrix Faktorisierung nutzendes Klangsynthesen System (NiMFKS): Extensions of NMF-based Concatenative Sound Synthesis
Concatenative sound synthesis (CSS) entails synthesising a “target” sound with other sounds collected in a “corpus.” Recent work explores CSS using non-negative matrix factorisation (NMF) to approximate a target sonogram by the product of a corpus sonogram and an activation matrix. In this paper, we propose a number of extensions of NMF-based CSS and present an open MATLAB implementation in a GUI-based application we name NiMFKS. In particular, we consider the following extensions: 1) we extend the NMF framework by implementing update rules based on the generalised β-divergence; 2) we add an optional monotonic algorithm for sparse NMF; 3) we tackle the computational challenges of scaling to big corpora by implementing a corpus-pruning preprocessing step; 4) we generalise the constraints that may be applied to the shape of the activation matrix; and 5) we implement new modes of interacting with the procedure by enabling sketching and modification of the activation matrix. NiMFKS and its source code can be downloaded from https://code.soundsoftware.ac.uk/projects/nimfks.
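The core of NMF-based CSS is that the corpus matrix W stays fixed and only the activations H are learned. A minimal sketch with the standard multiplicative update for the Euclidean cost (the β = 2 member of the β-divergence family NiMFKS generalises; matrix sizes and data are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((64, 10))            # corpus sonogram: 10 templates, held fixed
V = W @ rng.random((10, 20))        # synthetic target that W can represent
H = rng.random((10, 20))            # activation matrix, to be estimated

rel0 = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
eps = 1e-12                          # guards against division by zero
for _ in range(1000):
    # Multiplicative update for min ||V - WH||^2 with H >= 0, W fixed:
    H *= (W.T @ V) / (W.T @ (W @ H) + eps)

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the update is multiplicative, H stays non-negative throughout; synthesis then amounts to recombining the corpus sounds according to H, and the paper's sketching interface edits H directly.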
Two polarisation finite difference model of bowed strings with nonlinear contact and friction forces
Recent bowed string sound synthesis has relied on physical modelling techniques; the achievable realism and flexibility of gestural control are appealing, and the heavier computational cost becomes less significant as technology improves. A bowed string is simulated in two polarisations by discretising the partial differential equations governing its behaviour using the finite difference method; a globally energy-balanced scheme is used as a guarantee of numerical stability under highly nonlinear conditions. In one polarisation, a nonlinear contact model is used for the normal forces exerted by the dynamic bow hair, the left-hand fingers, and the fingerboard. In the other polarisation, a force-velocity friction curve is used for the resulting tangential forces. The scheme update requires the solution of two nonlinear vector equations. Sound examples and video demonstrations are presented.
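Underneath the full model sits the explicit finite difference update for the 1-D wave equation. The sketch below shows only that core (no bow, fingers, stiffness, or loss, and all parameters invented), with the grid spacing chosen so the Courant number c·dt/dx stays below 1 as explicit-scheme stability requires:

```python
import numpy as np

c = 200.0                     # wave speed (illustrative)
dt = 1.0 / 8000.0             # time step
dx = 0.03                     # grid spacing; c*dt/dx ~ 0.83 < 1 (stable)
M = int(round(1.0 / dx))      # grid intervals on a unit-length string
lam2 = (c * dt / dx) ** 2

u_prev = np.zeros(M + 1)      # displacement at time step n-1
u = np.zeros(M + 1)           # displacement at time step n
u[M // 2] = 0.001             # localized initial displacement ("pluck")

for _ in range(100):
    # Leapfrog update of u_tt = c^2 u_xx; ends stay clamped at zero.
    u_next = np.zeros(M + 1)
    u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                    + lam2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
    u_prev, u = u, u_next

peak = float(np.max(np.abs(u)))
```

The paper runs two such polarisations coupled through nonlinear contact and friction forces, and replaces the simple Courant argument with a discrete energy balance, which is what guarantees stability once those nonlinearities are added.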