Enhanced Beat Tracking with Context-Aware Neural Networks
We present two new beat tracking algorithms based on autocorrelation analysis, which showed state-of-the-art performance in the MIREX 2010 beat tracking contest. Unlike the traditional approach of processing a list of onsets, we propose to use a bidirectional Long Short-Term Memory recurrent neural network to perform frame-by-frame beat classification of the signal. As inputs to the network, the spectral features of the audio signal and their relative differences are used. The network transforms the signal directly into a beat activation function. An autocorrelation function is then used to determine the predominant tempo, which serves to eliminate erroneously detected beats and to complement missing ones. The first algorithm is tuned for music with constant tempo, whereas the second is further capable of following changes in tempo and time signature.
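The tempo-estimation step described in the abstract — autocorrelating a beat activation function to find the predominant tempo — can be sketched as follows. This is a generic autocorrelation sketch, not the paper's exact method; the frame rate and BPM bounds are illustrative assumptions.

```python
import numpy as np

def estimate_tempo(activation, fps=100, min_bpm=40, max_bpm=240):
    """Estimate the predominant tempo (BPM) of a beat activation
    function via autocorrelation; parameter values are illustrative."""
    a = activation - activation.mean()
    # autocorrelation for non-negative lags
    acf = np.correlate(a, a, mode="full")[len(a) - 1:]
    # convert the BPM search range to a lag range (in frames)
    min_lag = int(round(fps * 60.0 / max_bpm))
    max_lag = int(round(fps * 60.0 / min_bpm))
    lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return 60.0 * fps / lag

# synthetic activation with a pulse every 0.5 s at 100 fps (120 BPM)
act = np.zeros(1000)
act[::50] = 1.0
print(round(estimate_tempo(act)))  # → 120
```

The strongest autocorrelation peak inside the lag window corresponds to the dominant inter-beat interval, from which the BPM follows directly.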
Mapping blowing pressure and sound features in recorder playing
This paper presents a data-driven approach to the construction of mapping models relating sound features and blowing pressure in recorder playing. Blowing pressure and sound feature data are synchronously obtained from real performance: blowing pressure is measured by means of a piezoelectric transducer inserted into the mouthpiece of a modified recorder, while the produced sound is acquired using a close-field microphone. The acquired sound is analyzed frame by frame, and features are extracted so that the original sound can be reconstructed with sufficient fidelity. A multi-modal database of aligned blowing pressure and sound feature signals is constructed from real performance recordings designed to cover basic performance contexts. From the gathered data, two types of mapping models are constructed using artificial neural networks: (i) a model able to generate sound feature signals from blowing pressure signals, and therefore used to produce synthetic sound from recorded blowing pressure profiles via additive synthesis; and (ii) a model able to estimate the blowing pressure from extracted sound features.
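The final additive-synthesis stage — turning frame-wise control signals (f0 and harmonic amplitudes) back into sound — can be sketched as below. This is a minimal, generic additive synthesizer, not the paper's implementation; the frame rate, sample rate, and sample-and-hold interpolation are illustrative assumptions.

```python
import numpy as np

def additive_synth(f0, amps, sr=44100, frame_rate=100):
    """Resynthesize audio from frame-wise f0 (Hz) and per-harmonic
    amplitudes by summing sinusoidal partials (illustrative sketch)."""
    hop = sr // frame_rate
    n_frames, n_harm = amps.shape
    out = np.zeros(n_frames * hop)
    # sample-and-hold the frame-wise f0 to audio rate
    f0_s = np.repeat(f0, hop)
    for h in range(1, n_harm + 1):
        amp_s = np.repeat(amps[:, h - 1], hop)
        # phase is the running integral of the harmonic's frequency
        phase = 2 * np.pi * np.cumsum(h * f0_s) / sr
        out += amp_s * np.sin(phase)
    return out
```

In the paper's pipeline, the neural network would supply the feature signals driving such a synthesizer from a recorded blowing-pressure profile.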
Vivos Voco: A survey of recent research on voice transformations at IRCAM
IRCAM has long experience in the analysis, synthesis and transformation of voice. Natural voice transformations are of great interest for many applications and can be combined with a text-to-speech system, leading to a powerful creation tool. We present research conducted at IRCAM on voice transformations over the last few years. Transformations can be achieved in a global way by modifying pitch, spectral envelope, durations, etc. While this sacrifices the possibility of attaining a specific target voice, the approach allows the production of new voices of a high degree of naturalness with different gender and age, modified vocal quality, or another speech style. These transformations can be applied in real time using ircamTools TRAX. Transformation can also be done in a more specific way in order to transform a voice towards the voice of a target speaker. Finally, we present some recent research on the transformation of expressivity.
Analysis and Trans-synthesis of Acoustic Bowed-String Instrument Recordings: a Case Study using Bach Cello Suites
In this paper, analysis and trans-synthesis of acoustic bowed-string instrument recordings with a new non-negative matrix factorization (NMF) procedure are presented. This work shows that more than one template may be required to represent a note, owing to the time-varying behavior of timbre, especially for bowed-string instruments. The proposed method improves on the original NMF without requiring knowledge of tone models or of the number of required templates in advance. The resulting NMF information is then converted into the parameters of a sinusoidal synthesizer. Bach cello suites recorded by Fournier and Starker are used in the experiments. Analysis and trans-synthesis examples of the recordings are also provided. Index Terms—trans-synthesis, non-negative matrix factorization, bowed-string instrument
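For reference, the core NMF decomposition that this and the following abstract build on can be sketched with the standard Lee-Seung multiplicative updates for the Euclidean cost. This is the textbook baseline, not the paper's template-adaptive procedure; the rank and iteration count are illustrative assumptions.

```python
import numpy as np

def nmf(Y, rank, n_iter=200, eps=1e-9):
    """Factorize a non-negative matrix Y into spectral templates B and
    time-varying gains G so that Y ~ B @ G (textbook sketch)."""
    rng = np.random.default_rng(0)
    n_bins, n_frames = Y.shape
    B = rng.random((n_bins, rank)) + eps   # spectral templates
    G = rng.random((rank, n_frames)) + eps # per-template gains
    for _ in range(n_iter):
        # multiplicative updates keep B and G non-negative
        G *= (B.T @ Y) / (B.T @ B @ G + eps)
        B *= (Y @ G.T) / (B @ G @ G.T + eps)
    return B, G
```

In a trans-synthesis setting, the columns of B (note/timbre templates) and the rows of G (their activations over time) would be mapped to the partial amplitudes and envelopes of a sinusoidal model.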
Application of non-negative matrix factorization to signal-adaptive audio effects
This paper proposes novel audio effects based on manipulating an audio signal in a representation domain provided by non-negative matrix factorization (NMF). Critical-band magnitude spectrograms Y of sounds are first factorized into a product of two lower-rank matrices so that Y ≈ BG. The parameter matrices B and G are then processed in order to achieve the desired effect. Three classes of effects were investigated: 1) dynamic range compression (or expansion) of the component spectra or gains, 2) effects based on rank-ordering the components (columns of B and the corresponding rows of G) according to acoustic features extracted from them, and then weighting each component according to its rank, and 3) distortion effects based on controlling the number of components (and thus the reconstruction error) in the above linear approximation. The subjective quality of the effects was assessed in a listening test.
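The first effect class — dynamic range compression of the component gains — can be sketched in a few lines once a factorization Y ≈ BG is available. The exponent-based compression law and its parameter name are illustrative assumptions, not the paper's exact processing.

```python
import numpy as np

def compress_gains(B, G, alpha=0.5):
    """Compress the dynamic range of NMF component gains by raising G
    elementwise to a power alpha < 1 (alpha > 1 would expand), then
    rebuild the modified magnitude spectrogram (illustrative sketch)."""
    G_c = np.power(G, alpha)
    return B @ G_c
```

Because the compression acts per component rather than per time-frequency bin, quiet components are boosted relative to loud ones while each component's spectral shape (its column of B) is preserved.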
Realtime system for backing vocal harmonization
A system for the synthesis of backing vocals by pitch shifting of a lead vocal signal is presented. The harmonization of the backing vocals is based on chords retrieved from an accompanying instrument. The system operates completely autonomously, without the need to provide the key of the performed song, which simplifies the handling of the harmonization effect. The system is designed with realtime capability so that it can be used as a live sound effect.
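The harmonization decision — picking a chord tone for the backing voice and deriving the pitch-shift ratio from it — can be sketched as follows. This is a simplified illustration, assuming the chord's pitch classes and the lead note are already known; the paper retrieves the chords from the accompaniment signal.

```python
def harmony_shift(lead_midi, chord_pcs):
    """Pick the nearest chord tone below the lead note and return the
    pitch-shift ratio for synthesizing the backing voice (sketch)."""
    # candidate chord tones within an octave below the lead note
    cands = [m for m in range(lead_midi - 12, lead_midi)
             if m % 12 in chord_pcs]
    target = max(cands)
    # equal-tempered frequency ratio for the required shift
    return 2.0 ** ((target - lead_midi) / 12.0)

# lead sings E4 (MIDI 64) over a C major chord {C, E, G}
print(harmony_shift(64, {0, 4, 7}))  # a major third below, ratio ≈ 0.794
```

A real system would apply this ratio with a formant-preserving pitch shifter so the backing voice keeps a natural vocal timbre.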
Sparse Atomic Modeling of Audio: a Review
Research into sparse atomic models has recently intensified in the image and audio processing communities. While other reviews exist, we believe this paper provides a good starting point for the uninitiated reader as it concisely summarizes the state of the art, and presents most of the major topics in an accessible manner. We discuss several approaches to the sparse approximation problem including various greedy algorithms, iteratively re-weighted least squares, iterative shrinkage, and Bayesian methods. We provide pseudo-code for several of the algorithms, and have released software which includes fast dictionaries and reference implementations for many of the algorithms. We discuss the relevance of the different approaches for audio applications, and include numerical comparisons. We also illustrate several audio applications of sparse atomic modeling.
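The simplest of the greedy algorithms surveyed, matching pursuit, can be sketched as below: at each step the atom most correlated with the residual is selected and its projection subtracted. This is the textbook formulation, assuming a dictionary with unit-norm columns; it is not taken from the paper's released software.

```python
import numpy as np

def matching_pursuit(x, D, n_atoms=10):
    """Greedily approximate x as a sparse combination of the columns
    (atoms) of D, assumed unit-norm (textbook sketch)."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual                 # correlate atoms with residual
        k = int(np.argmax(np.abs(corr)))      # best-matching atom
        coeffs[k] += corr[k]                  # accumulate its coefficient
        residual -= corr[k] * D[:, k]         # remove its contribution
    return coeffs, residual
```

Orthogonal matching pursuit refines this by re-solving a least-squares problem over all selected atoms at each step, at higher cost per iteration.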
Generation of Non-repetitive Everyday Impact Sounds for Interactive Applications
The use of high quality sound effects is growing rapidly in multimedia, interactive and virtual reality applications. Impact sounds are the most common source of audio events in these applications. The sound effects in such environments can be pre-recorded or synthesized in real time as the result of a physical event. However, one of the biggest problems when using pre-recorded sound effects is the monotonous repetition of these sounds, which can be tedious for the listener. In this paper, we present a new algorithm which generates non-repetitive impact sound effects using parameters from the physical interaction. Our approach uses audio grains to create finely-controlled synthesized sounds based on recordings of impact sounds. The proposed algorithm can also be used in a wide range of audio analysis, representation, and compression applications. A subjective test was carried out to evaluate the perceptual quality of the synthesized sounds.
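The grain-based idea — assembling each impact from randomly selected, randomly weighted grains so that no two playbacks are identical — can be sketched as follows. This is a generic granular overlap-add sketch under the assumption that equal-length grains have already been extracted from a recorded impact; it is not the paper's exact algorithm.

```python
import numpy as np

def nonrepetitive_impact(grains, n_grains=8, seed=None):
    """Build one impact sound by overlap-adding randomly chosen,
    randomly weighted grains (illustrative granular sketch)."""
    rng = np.random.default_rng(seed)
    glen = len(grains[0])           # all grains assumed equal length
    hop = glen // 4                 # 75% overlap between grains
    out = np.zeros(glen + n_grains * hop)
    win = np.hanning(glen)          # smooth grain envelope
    pos = 0
    for _ in range(n_grains):
        g = grains[rng.integers(len(grains))]
        out[pos:pos + glen] += rng.uniform(0.5, 1.0) * win * g
        pos += hop
    return out
```

Varying the seed (or driving grain choice and weights from physical interaction parameters, as the paper proposes) yields a different but perceptually similar impact on every trigger.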
Combining classifications based on local and global features: application to singer identification
In this paper we investigate the problem of singer identification on a cappella recordings of isolated notes. Most studies on singer identification describe the content of singing-voice signals with features related to timbre (such as MFCC or LPC). These features aim to describe the behavior of frequencies at a given instant of time (local features). In this paper, we propose to describe the sung tone with the temporal variations of the fundamental frequency (and its harmonics) of the note. The periodic and continuous variations of the frequency trajectories are analyzed over the whole note, and the obtained features reflect expressive and intonative elements of singing such as vibrato, tremolo and portamento. The experiments, conducted on two distinct data sets (lyric and pop-rock singers), show that the new set of features captures part of the singer's identity. However, these features are less accurate than timbre-based features. We propose to increase the recognition rate of singer identification by combining the information conveyed by the local and global descriptions of notes. The proposed method, which shows good results, can be adapted to classification problems involving a large number of classes, or to combine classifiers with different levels of performance.
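One way to derive global, note-level features of the kind described — e.g. vibrato rate and extent from an f0 trajectory — is to pick the strongest periodicity of the detrended contour. This is an illustrative sketch of the idea, not the paper's feature set; the frame rate is an assumption.

```python
import numpy as np

def vibrato_params(f0, frame_rate=100):
    """Estimate vibrato rate (Hz) and peak-to-peak extent (Hz) from a
    frame-wise f0 trajectory over a whole note (illustrative sketch)."""
    d = f0 - f0.mean()                       # detrend the contour
    spec = np.abs(np.fft.rfft(d))
    freqs = np.fft.rfftfreq(len(d), 1.0 / frame_rate)
    k = int(np.argmax(spec[1:]) + 1)         # strongest non-DC periodicity
    rate = freqs[k]
    extent = d.max() - d.min()
    return rate, extent
```

Features like these summarize an entire note, in contrast to the frame-level (local) timbre features the paper combines them with.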
State of the Art in Sound Texture Synthesis
The synthesis of sound textures, such as rain, wind, or crowds, is an important application for cinema, multimedia creation, games and installations. However, despite the clearly defined requirements of naturalness and flexibility, no automatic method has yet found widespread use. After clarifying the definition, terminology, and usages of sound texture synthesis, we will give an overview of the many existing methods and approaches, and the few available software implementations, and classify them by the synthesis model they are based on, such as subtractive or additive synthesis, granular synthesis, corpus-based concatenative synthesis, wavelets, or physical modeling. Additionally, an overview is given of the analysis methods used for sound texture synthesis, such as segmentation, statistical modeling, timbral analysis, and modeling of transitions.