Application of non-negative matrix factorization to signal-adaptive audio effects This paper proposes novel audio effects based on manipulating an audio signal in a representation domain provided by non-negative matrix factorization (NMF). Critical-band magnitude spectrograms Y of sounds are first factorized into a product of two lower-rank matrices so that Y ≈ BG. The parameter matrices B and G are then processed to achieve the desired effect. Three classes of effects were investigated: 1) dynamic range compression (or expansion) of the component spectra or gains, 2) effects based on rank-ordering the components (columns of B and the corresponding rows of G) according to acoustic features extracted from them, and then weighting each component according to its rank, and 3) distortion effects based on controlling the number of components (and thus the reconstruction error) in the above linear approximation. The subjective quality of the effects was assessed in a listening test.
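The factorization Y ≈ BG and two of the effect classes can be illustrated with a minimal sketch. It assumes the standard multiplicative updates for the Euclidean cost (the abstract does not say which NMF algorithm or cost the authors used), a synthetic 24-band "spectrogram" in place of real audio, and hypothetical helper names (`nmf`, `compress_gains`, `keep_components`).

```python
import numpy as np

def nmf(Y, rank, n_iter=300, eps=1e-9, seed=0):
    """Factorize a non-negative matrix Y (bands x frames) as B @ G.

    Uses multiplicative updates for the Euclidean cost; this is an
    assumed choice, not necessarily the one used in the paper.
    """
    rng = np.random.default_rng(seed)
    n_bands, n_frames = Y.shape
    B = rng.random((n_bands, rank)) + eps   # component spectra (columns)
    G = rng.random((rank, n_frames)) + eps  # component gains (rows)
    for _ in range(n_iter):
        G *= (B.T @ Y) / (B.T @ B @ G + eps)
        B *= (Y @ G.T) / (B @ G @ G.T + eps)
    return B, G

def compress_gains(G, exponent=0.5):
    """Hypothetical dynamics effect: power-law compression of the gains."""
    return G ** exponent

def keep_components(B, G, n_keep):
    """Hypothetical distortion effect: keep only the n_keep strongest components."""
    energy = B.sum(axis=0) * G.sum(axis=1)
    order = np.argsort(energy)[::-1][:n_keep]
    return B[:, order], G[order, :]

# Synthetic rank-3 stand-in for a critical-band magnitude spectrogram.
rng = np.random.default_rng(1)
Y = rng.random((24, 3)) @ rng.random((3, 50))

B, G = nmf(Y, rank=3)
rel_err = np.linalg.norm(Y - B @ G) / np.linalg.norm(Y)

Y_compressed = B @ compress_gains(G)   # effect class 1
B2, G2 = keep_components(B, G, 2)      # effect class 3
Y_distorted = B2 @ G2
```

Resynthesis to audio (mapping the modified spectrogram back to a waveform) is left out, since the abstract does not describe that step.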
DHM and FDTD based Hardware Sound Field Simulation Acceleration Sound field simulation is widely used for acoustic design; however, it requires substantial computational resources. Meanwhile, FPGAs have become a mainstream platform for acceleration. To take advantage of FPGA hardware acceleration, a hardware-oriented algorithm that consumes a small number of gates and little memory is necessary. This paper addresses hardware acceleration of sound field simulation using an FPGA. An improved Digital Huygens Model (DHM) for hardware is implemented and its speed-up ratio is examined. For 2D simulation, the implemented accelerator is 1,170 times faster than software simulation. For 3D simulation, it is shown that an FDTD-based method is suitable for hardware implementation, and the required hardware resources are estimated.
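For orientation, the kind of software baseline that such an accelerator competes with can be sketched as a plain 2D FDTD update of the scalar wave equation. This is a generic textbook leapfrog scheme, not the paper's DHM or its hardware design; the grid size, Courant number, and pressure-release (zero) boundaries are assumptions made for the example.

```python
import numpy as np

def fdtd_2d(n=64, steps=40, courant=0.5):
    """Leapfrog FDTD for the 2D scalar wave equation on an n x n grid.

    courant = c * dt / dx must not exceed 1/sqrt(2) for stability in 2D.
    Boundaries are held at zero pressure (an assumed boundary condition).
    """
    p = np.zeros((n, n))
    p[n // 2, n // 2] = 1.0      # initial pressure impulse at the centre
    p_prev = p.copy()            # zero initial velocity
    lam2 = courant ** 2
    for _ in range(steps):
        lap = np.zeros_like(p)
        # Five-point Laplacian on the interior nodes.
        lap[1:-1, 1:-1] = (p[2:, 1:-1] + p[:-2, 1:-1]
                           + p[1:-1, 2:] + p[1:-1, :-2]
                           - 4.0 * p[1:-1, 1:-1])
        p_next = 2.0 * p - p_prev + lam2 * lap
        p_next[0, :] = p_next[-1, :] = 0.0
        p_next[:, 0] = p_next[:, -1] = 0.0
        p_prev, p = p, p_next
    return p

field = fdtd_2d()
```

Each node needs only its four neighbours and two time levels, which is the regularity that makes this family of schemes attractive for gate- and memory-limited hardware.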
Towards Ontological Representations of Digital Audio Effects In this paper we discuss the development of ontological representations of digital audio effects and provide a framework for the description of digital audio effects and audio effect transformations. After a brief account of our current research in the field of high-level semantics for music production using Semantic Web technologies, we detail how an Audio Effects Ontology can be used within the context of intelligent music production tools, as well as for musicological purposes. Furthermore, we discuss problems in the design of such an ontology arising from discipline-specific classifications, such as the need for encoding different taxonomical systems based on, for instance, implementation techniques or perceptual attributes of audio effects. Finally, we show how information about audio effect transformations is represented using Semantic Web technologies such as the Resource Description Framework (RDF) and retrieved using the SPARQL query language.
Vivos Voco: A survey of recent research on voice transformations at IRCAM IRCAM has long experience in the analysis, synthesis and transformation of voice. Natural voice transformations are of great interest for many applications and can be combined with a text-to-speech system, leading to a powerful creation tool. We present research conducted at IRCAM on voice transformations over the last few years. Transformations can be achieved in a global way by modifying pitch, spectral envelope, durations, etc. While it sacrifices the possibility of attaining a specific target voice, this approach allows the production of new voices with a high degree of naturalness and with a different gender and age, modified vocal quality, or another speech style. These transformations can be applied in real time using ircamTools TRAX. Transformation can also be done in a more specific way in order to transform a voice towards the voice of a target speaker. Finally, we present some recent research on the transformation of expressivity.
A Sound Localization based Interface for Real-Time Control of Audio Processing This paper describes the implementation of an innovative musical interface based on the sound localization capability of a microphone array. Our proposal is to allow a musician to plan and conduct the expressivity of a performance by controlling an audio processing module in real time through the spatial movement of a sound source, e.g. voice, traditional musical instruments, or sounding mobile devices. The proposed interface is able to locate and track the sound accurately in a two-dimensional space, so that the x-y coordinates of the sound source can be used to control the processing parameters. In particular, the paper focuses on the localization and tracking of harmonic sound sources in real, moderately reverberant and noisy environments. To this end, we designed a system based on adaptive parameterized Generalized Cross-Correlation (GCC) and Phase Transform (PHAT) weighting with a Zero-Crossing Rate (ZCR) threshold, a Wiener filter to improve the Signal to Noise Ratio (SNR), and a Kalman filter to make the position estimation more robust and accurate. We developed a Max/MSP external object to test the system in a real scenario and to validate its usability.
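The core of such a localization front end is the time-delay estimate between two microphones. The sketch below shows only plain GCC-PHAT; the adaptive parameterization, ZCR threshold, Wiener filter, and Kalman tracker mentioned in the abstract are not reproduced, and the function name, sample rate, and test signal are assumptions made for the example.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref` in seconds using GCC-PHAT."""
    n = len(sig) + len(ref)                  # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs

# Synthetic check: white noise delayed by 5 samples at an assumed 16 kHz rate.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref[:-5]))
tau = gcc_phat(sig, ref, fs)
```

With delays from two or more microphone pairs and a known array geometry, the x-y position can then be triangulated; that geometric step depends on the array layout, which the abstract does not specify.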
Lyapunov Stability Analysis of the Moog Ladder Filter and Dissipativity Aspects in Numerical Solutions This paper investigates the passivity of the Moog Ladder Filter and its simulation. First, the linearized system is analyzed. Results based on the energy stored in the capacitors lead to a stability domain that remains valid for time-varying control parameters but is sub-optimal for time-invariant ones. A second storage function is proposed, from which the largest stability domain is recovered for a time-invariant Q-parameter. Sufficient conditions for stability are given. Second, the study is adapted to the nonlinear case by introducing a third storage function. Then, a simulation based on the standard bilinear transform is derived and the dissipativity of this numerical version is examined. Simulations show that passivity is not unconditionally guaranteed, but mostly fulfilled, and that typical behaviours of the Moog filter, including self-oscillations, are properly reproduced.
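As a point of comparison for the time-invariant case, the stability limit of the linearized ladder can be checked directly from its poles. The sketch assumes the usual textbook model of four identical one-pole stages with global feedback gain k, i.e. H(s) = 1 / ((1 + s/ωc)^4 + k); it is a classical pole computation, not the storage-function analysis of the paper, and the mapping between k and the paper's Q-parameter is not assumed.

```python
import cmath
import math

def ladder_poles(k, wc=1.0):
    """Poles of H(s) = 1 / ((1 + s/wc)**4 + k) for feedback gain k >= 0."""
    r = k ** 0.25
    # Solve (1 + s/wc)**4 = -k: the four complex fourth roots of -k.
    return [wc * (-1.0 + r * cmath.exp(1j * math.pi * (2 * m + 1) / 4.0))
            for m in range(4)]

def is_stable(k, wc=1.0):
    """True if every pole lies strictly in the left half-plane."""
    return all(p.real < 0.0 for p in ladder_poles(k, wc))

stable_below = is_stable(3.9)   # just below the self-oscillation limit
stable_above = is_stable(4.1)   # just above it
```

The right-most pole pair has real part wc * (k**0.25 / sqrt(2) - 1), which crosses zero at k = 4, the well-known self-oscillation threshold of this linear model.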
Efficient Polynomial Implementation of the EMS VCS3 Filter Model A previously existing nonlinear differential equation system modeling the EMS VCS3 voltage controlled filter is reformulated here in polynomial form, avoiding the expensive computation of transcendental functions imposed by the original model. The new system is discretized by means of an implicit numerical scheme, and solved using Newton-Raphson iterations. While maintaining instantaneous controllability, the algorithm is both significantly faster and more accurate than the previous filter-based solution. A real-time version of the model has been implemented under the PureData audio processing environment and as a VST plugin.
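The general pattern of an implicit discretization solved by Newton-Raphson can be sketched on a toy problem. The example below is not the VCS3 model: it assumes a single-state system dx/dt = ω(u − x − a·x³) with a cubic (polynomial) nonlinearity, and the trapezoidal rule as the implicit scheme, since the abstract names neither the equations nor the scheme. All parameter values are hypothetical.

```python
import math

def f(x, u, omega, a):
    """Right-hand side of the toy ODE dx/dt = omega * (u - x - a*x**3)."""
    return omega * (u - x - a * x ** 3)

def dfdx(x, omega, a):
    """Derivative of f with respect to the state x."""
    return -omega * (1.0 + 3.0 * a * x * x)

def trapezoidal_step(x0, u0, u1, h, omega, a, tol=1e-12, max_iter=20):
    """Advance one sample with the trapezoidal rule, solved by Newton-Raphson."""
    f0 = f(x0, u0, omega, a)
    x1 = x0                                   # initial guess: previous state
    for _ in range(max_iter):
        g = x1 - x0 - 0.5 * h * (f0 + f(x1, u1, omega, a))   # residual
        dg = 1.0 - 0.5 * h * dfdx(x1, omega, a)              # its derivative
        step = g / dg
        x1 -= step
        if abs(step) < tol:
            break
    return x1

# Drive the toy system with a constant input and let it settle.
fs = 48000.0
omega = 2.0 * math.pi * 1000.0
a = 1.0
x = 0.0
for _ in range(2000):
    x = trapezoidal_step(x, 2.0, 2.0, 1.0 / fs, omega, a)
```

Because the residual and its derivative are polynomials in the state, each Newton iteration costs only a few multiplications and one division, which is the kind of saving a polynomial reformulation aims for.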
Analysis and Trans-synthesis of Acoustic Bowed-String Instrument Recordings: a Case Study using Bach Cello Suites In this paper, analysis and trans-synthesis of acoustic bowed-string instrument recordings with a new non-negative matrix factorization (NMF) procedure are presented. This work shows that more than one template may be required to represent a note because of the time-varying behavior of timbre, especially for notes played by bowed-string instruments. The proposed method improves on the original NMF without requiring advance knowledge of the tone models or of the number of required templates. The resulting NMF information is then converted into the parameters of a sinusoidal synthesis. Bach cello suites recorded by Fournier and Starker are used in the experiments. Analysis and trans-synthesis examples of the recordings are also provided. Index Terms: trans-synthesis, non-negative matrix factorization, bowed string instrument
Identification of Time-frequency Maps for sounds timbre discrimination Gabor multipliers are signal operators that are diagonal in a time-frequency representation of signals and can be viewed as time-frequency transfer functions. If we estimate a Gabor mask between the same note played by two instruments, we obtain a time-frequency representation of the difference in timbre between the two notes. By averaging the energy contained in the Gabor mask, we obtain a measure of this difference. In this context, our goal is to automatically localize the time-frequency regions responsible for such a timbre dissimilarity. This problem is addressed as a feature selection problem over the time-frequency coefficients of a labelled data set of sounds.
Modeling of the Carbon Microphone Nonlinearity for a Vintage Telephone Sound Effect The telephone sound effect is widely used in music, television and the film industry. This paper presents a digital model of the carbon microphone nonlinearity which can be used to produce a vintage telephone sound effect. The model is constructed based on measurements taken from a real carbon microphone. The proposed model is a modified version of the sandwich model previously used for nonlinear telephone handset modeling. Each distortion component can be modeled individually based on the desired features. The computational efficiency can be increased by lumping the spectral processing of the individual distortion components together. The model incorporates a filtered noise source to model the self-induced noise generated by carbon microphones. The model also has an input-level-dependent noise generator for additional sound quality degradation. The proposed model can be used in various ways in the digital modeling of the vintage telephone sound.
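The overall signal flow described above (linear filter, memoryless nonlinearity, linear filter, plus two noise sources) can be sketched as follows. This is only a structural illustration: the one-pole filters, the polynomial coefficients `a2` and `a3`, the noise levels, and the function names are all hypothetical and are not the measured model from the paper.

```python
import random

def one_pole(x, a):
    """One-pole lowpass: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    y, state = [], 0.0
    for v in x:
        state = (1.0 - a) * v + a * state
        y.append(state)
    return y

def telephone_sketch(x, a2=0.3, a3=-0.2, noise=0.01, level_noise=0.05, seed=0):
    """Sandwich structure: pre-filter -> polynomial nonlinearity -> post-filter.

    A filtered noise floor and an input-level-dependent noise term are added
    at the output. All coefficients are placeholders, not measured values.
    """
    rng = random.Random(seed)
    pre = one_pole(x, 0.5)                                  # first linear block
    shaped = [v + a2 * v * v + a3 * v ** 3 for v in pre]    # memoryless distortion
    post = one_pole(shaped, 0.7)                            # second linear block
    hiss = one_pole([rng.uniform(-1.0, 1.0) for _ in x], 0.6)
    return [p + noise * h + level_noise * abs(v) * rng.uniform(-1.0, 1.0)
            for p, h, v in zip(post, hiss, pre)]

# A short decaying test tone as hypothetical input.
tone = [0.8 * (0.99 ** n) * (1.0 if (n // 8) % 2 == 0 else -1.0) for n in range(256)]
out = telephone_sketch(tone)
```

Keeping the even-order (`a2`) and odd-order (`a3`) terms as separate coefficients mirrors the idea of shaping each distortion component individually.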