Separation of musical notes with highly overlapping partials using phase and temporal constrained complex matrix factorization
In note separation of polyphonic music, separating overlapping partials is an important and difficult problem. Fifths and octaves, the most challenging cases, nevertheless occur frequently. Non-negative matrix factorization (NMF) employs constraints on energy and harmonic ratio to tackle this problem. Recently, complex matrix factorization (CMF) has been proposed, which incorporates phase information into the source separation problem. However, when CMF is applied to fifths and octaves, temporal magnitude modulation remains severe. In this work, we investigate a temporal smoothness model based on the CMF approach: the temporal activation coefficient of a preceding note is constrained when succeeding notes appear. Compared to unconstrained CMF, magnitude modulation is greatly reduced in our computer simulations. Performance indices including source-to-interference ratio (SIR), source-to-artifacts ratio (SAR), source-to-distortion ratio (SDR), and modulation error ratio (MER) are given.
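The NMF baseline mentioned above factorizes a non-negative magnitude spectrogram V ≈ WH. A minimal sketch of the standard Lee–Seung multiplicative updates, without the energy and harmonic-ratio constraints the abstract refers to (plain Python, matrices as lists of rows):

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_err(V, W, H):
    """Squared Frobenius error ||V - WH||^2."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))

def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    """Plain NMF via multiplicative updates: V (m x n) ~ W (m x rank) H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H): keeps H non-negative by construction
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H
```

The multiplicative form guarantees non-negativity of W and H at every step, which is what lets the factors be interpreted as spectral templates and activations.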
Pinna Morphological Parameters Influencing HRTF Sets
Head-Related Transfer Functions (HRTFs) are one of the main aspects of binaural rendering. By definition, these functions express the deep linkage that exists between hearing and morphology, especially of the torso, head and ears. Although the perceptive effects of HRTFs are undeniable, the exact influence of human morphology is still unclear. Its reduction to a few anthropometric measurements has led to numerous studies aiming at establishing a ranking of these parameters. However, no consensus has yet been reached. In this paper, we study the influence of the anthropometric measurements of the ear, as defined by the CIPIC database, on the HRTFs. This is done through the computation of HRTFs by the Fast Multipole Boundary Element Method (FM-BEM) from a parametric model of torso, head and ears. Their variations are measured with 4 different spectral metrics over 4 frequency bands spanning from 0 to 16 kHz. Our contribution is the establishment of a ranking of the selected parameters and a comparison to what has already been obtained by the community. Additionally, a discussion of the relevance of each approach is conducted, especially when it relies on the CIPIC data, as well as a discussion of the CIPIC database limitations.
Measuring Sensory Consonance by Auditory Modeling
A current model of pitch perception is based on cochlear filtering followed by periodicity detection. Such a computational model is implemented and then extended to characterise the sensory consonance of pitch intervals. A simple scalar measure of sensory consonance is developed, and this perceptually motivated feature extraction is evaluated by computing the consonance of musical intervals. The relation of consonance and dissonance to the psychoacoustic notions of roughness and critical bandwidth is discussed.
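For reference, a widely used computational proxy for sensory dissonance (the Plomp–Levelt roughness curve in Sethares' parameterization, not the cochlear periodicity model described in this paper) sums pairwise partial roughness, which depends on the frequency separation relative to the critical bandwidth. A minimal sketch for two harmonic tones:

```python
import math

def pair_dissonance(f1, a1, f2, a2):
    """Plomp-Levelt roughness of one partial pair (Sethares parameterization)."""
    lo, hi = min(f1, f2), max(f1, f2)
    s = 0.24 / (0.021 * lo + 19.0)   # scales separation by the local critical bandwidth
    d = hi - lo
    return a1 * a2 * (math.exp(-3.5 * s * d) - math.exp(-5.75 * s * d))

def interval_dissonance(f0, ratio, n_partials=6):
    """Total pairwise dissonance of two harmonic complexes at a given frequency ratio."""
    partials = [(f0 * k, 0.88 ** k) for k in range(1, n_partials + 1)]
    partials += [(f0 * ratio * k, 0.88 ** k) for k in range(1, n_partials + 1)]
    total = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            total += pair_dissonance(*partials[i], *partials[j])
    return total
```

With f0 = 261.63 Hz, the perfect fifth (3:2, coinciding partials) scores lower than the tritone, reproducing the classic consonance ranking such models are built to capture.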
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
We present GRAFX, an open-source library designed for handling audio processing graphs in PyTorch. Along with various library functionalities, we describe technical details on the efficient parallel computation of input graphs, signals, and processor parameters on the GPU. Then, we show an example use under a music mixing scenario, where the parameters of every differentiable processor in a large graph are optimized via gradient descent. The code is available at https://github.com/sh-lee97/grafx.
Sound Morphing by Audio Descriptors and Parameter Interpolation
We present a strategy for static morphing that relies on the sophisticated interpolation of the parameters of the signal model and the independent control of high-level audio features. The source and target signals are decomposed into deterministic, quasi-deterministic and stochastic parts, and are processed separately according to sinusoidal modeling and spectral envelope estimation. We gain further intuitive control over the morphing process by altering the interpolated spectrum according to target values of audio descriptors through an optimization process. The proposed approach leads to convincing morphing results in the case of sustained or percussive, harmonic and inharmonic sounds of possibly different durations.
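The parameter-interpolation core can be sketched as follows (a hypothetical helper illustrating the general idea, not the paper's full system): matched partials of the sinusoidal models are morphed by interpolating frequencies geometrically (linear in log-frequency, i.e. perceptually uniform) and amplitudes linearly:

```python
def morph_partials(src, tgt, alpha):
    """Interpolate matched (frequency_hz, amplitude) partial pairs.

    src, tgt: lists of (freq, amp) tuples with matched partial indices.
    alpha: morph factor, 0.0 = pure source, 1.0 = pure target.
    """
    morphed = []
    for (fs, amp_s), (ft, amp_t) in zip(src, tgt):
        f = fs ** (1.0 - alpha) * ft ** alpha    # geometric: linear in log-frequency
        a = (1.0 - alpha) * amp_s + alpha * amp_t  # linear amplitude crossfade
        morphed.append((f, a))
    return morphed
```

Interpolating in log-frequency means a morph between a note and its octave passes through the interval midpoint in cents rather than in hertz, which matches pitch perception.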
Investigation of a Drum Controlled Cross-adaptive Audio Effect for Live Performance
Electronic music often uses dynamic and synchronized digital audio effects that cannot easily be recreated in live performances. Cross-adaptive effects provide a simple solution to such problems, since they can use multiple feature inputs to control dynamic variables in real time. We propose a generic scheme for cross-adaptive effects in which onset detection on a drum track dynamically triggers effects on other tracks. This allows a percussionist to orchestrate effects across multiple instruments during performance. We describe the general structure, which includes an onset detection and feature extraction algorithm, envelope and LFO synchronization, and an interface that lets the user choose which effects are triggered by each cue from the percussionist. Subjective evaluation is performed based on use in live performance. Implications for music composition and performance are also discussed. Keywords: cross-adaptive digital audio effects, live processing, real-time control, Csound.
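The onset-detection front end can be illustrated with a simple energy-rise detector (an illustrative baseline, not the paper's specific algorithm): an onset is reported when a frame's short-term energy jumps above both an absolute floor and a multiple of the previous frame's energy:

```python
def detect_onsets(signal, frame_size=256, ratio=4.0, floor=1e-3):
    """Return indices of frames where short-term energy rises sharply."""
    energies = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        energies.append(sum(x * x for x in frame) / frame_size)
    onsets = []
    for i in range(1, len(energies)):
        # require an absolute energy floor AND a sharp rise over the previous frame
        if energies[i] > floor and energies[i] > ratio * max(energies[i - 1], 1e-12):
            onsets.append(i)
    return onsets
```

In a live setup, each detected onset would be mapped to an envelope retrigger or LFO resync on the receiving track; the `ratio` and `floor` thresholds trade sensitivity against false triggers from bleed.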
Real-time Pitch Tracking in Audio Signals with the Extended Complex Kalman Filter
The Kalman filter is a well-known tool used extensively in robotics, navigation, speech enhancement and finance. In this paper, we propose a novel pitch follower based on the Extended Complex Kalman Filter (ECKF). An advantage of this pitch follower is that it operates on a sample-by-sample basis, unlike the block-based algorithms most commonly used in pitch estimation. Thus, it estimates sample-synchronous fundamental frequency (assumed to be the perceived pitch), which makes it ideal for real-time implementation. Simultaneously, the ECKF also tracks the amplitude envelope of the input audio signal. Finally, we test our ECKF pitch detector on a number of cello and double bass recordings played with various ornaments, such as vibrato, portamento and trill, and compare its results with those of the well-known YIN estimator to demonstrate the effectiveness of our algorithm.
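While the ECKF itself operates on a complex state, the sample-by-sample predict/update cycle it relies on can be illustrated with an ordinary scalar Kalman filter smoothing noisy per-sample frequency observations (an illustrative stand-in, not the paper's ECKF):

```python
def kalman_smooth(observations, q=1e-4, r=4.0, x0=0.0, p0=1e3):
    """Scalar Kalman filter: random-walk pitch state, noisy per-sample observations.

    q: process noise variance (how fast the pitch may drift).
    r: observation noise variance.
    Returns the per-sample filtered pitch estimates.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        p += q                 # predict: pitch modeled as a random walk
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # update state with the innovation
        p *= (1.0 - k)         # shrink the posterior variance
        estimates.append(x)
    return estimates
```

The ratio of `q` to `r` trades tracking speed against smoothness, analogous to tuning the noise covariances of the ECKF: small `q` averages over many samples, large `q` follows vibrato and portamento more closely.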
Statistical Sinusoidal Modeling for Expressive Sound Synthesis
Statistical sinusoidal modeling represents a method for transferring a sample library of instrument sounds into a database of sinusoidal parameters for use in real-time additive synthesis. Single sounds, capturing an instrument at combinations of pitch and intensity, are therefore segmented into attack, sustain and release. Partial amplitudes, frequencies and Bark band energies are calculated for all sounds and segments. For the sustain part, all partial and noise parameters are transformed into probabilistic distributions. Interpolated inverse transform sampling is introduced for generating parameter trajectories during synthesis in real time, allowing the creation of sounds located at pitches and intensities between the actual support points of the sample library. Evaluation is performed by qualitative analysis of the system response to sweeps of the control parameters pitch and intensity. Results for a set of violin samples demonstrate the ability of the approach to model dynamic timbre changes, which is crucial for the perceived quality of expressive sound synthesis.
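The core of interpolated inverse transform sampling can be sketched as follows (a minimal version of the general technique, not the paper's exact scheme): draw u ~ U(0, 1) and map it through a piecewise-linear inverse of the empirical CDF built from the observed parameter values:

```python
import random

def inverse_transform_sample(values, rng):
    """Draw one sample from the piecewise-linear empirical CDF of `values`."""
    xs = sorted(values)
    n = len(xs)
    u = rng.random()          # uniform quantile in [0, 1)
    pos = u * (n - 1)         # fractional index into the sorted values
    i = int(pos)
    frac = pos - i
    if i >= n - 1:
        return xs[-1]
    # interpolate between adjacent order statistics: the inverse-CDF step
    return xs[i] + frac * (xs[i + 1] - xs[i])
```

Because the inverse CDF is interpolated rather than stepped, generated parameter values fall between the observed support points, which is what allows synthesis at pitches and intensities not present in the sample library.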
A Differentiable Digital Moog Filter For Machine Learning Applications
In this project, a digital ladder filter has been investigated and expanded. This structure is a simplified digital model of the well-known analog Moog ladder filter. The goal of this paper is to derive the differentiation expressions of this filter with respect to its control parameters in order to integrate it into machine learning systems. The derivation of the backpropagation method is described in this work; it can be generalized to a Moog filter, or a similar filter, with any number of stages. Subsequently, the example of an adaptive Moog filter is provided. Finally, a machine learning application example is shown where the filter is integrated into a deep learning framework.
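The kind of derivative involved can be illustrated on a single one-pole stage y[n] = (1 − a)·x[n] + a·y[n−1] (one ladder stage, not the paper's full four-stage feedback model): the sensitivity dy[n]/da obeys its own recursion, from which the loss gradient follows. A sketch using forward-mode accumulation, verifiable against a finite difference:

```python
def one_pole(x, a):
    """One-pole lowpass: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    y, out = 0.0, []
    for xn in x:
        y = (1.0 - a) * xn + a * y
        out.append(y)
    return out

def loss_and_grad(x, target, a):
    """Squared-error loss and its analytic derivative w.r.t. the coefficient a.

    Uses the forward sensitivity recursion:
        dy[n]/da = -x[n] + y[n-1] + a * dy[n-1]/da
    obtained by differentiating the filter recursion term by term.
    """
    y, dy = 0.0, 0.0
    loss, grad = 0.0, 0.0
    for xn, tn in zip(x, target):
        y_prev, dy_prev = y, dy
        y = (1.0 - a) * xn + a * y_prev
        dy = -xn + y_prev + a * dy_prev
        err = y - tn
        loss += err * err
        grad += 2.0 * err * dy     # chain rule through the squared error
    return loss, grad
```

With this gradient, the coefficient `a` can be fitted by plain gradient descent to match a target response, which is the adaptive-filter scenario the abstract describes.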
Vivos Voco: A survey of recent research on voice transformations at IRCAM
IRCAM has long experience in the analysis, synthesis and transformation of voice. Natural voice transformations are of great interest for many applications and can be combined with text-to-speech systems, leading to a powerful creation tool. We present research conducted at IRCAM on voice transformations over the last few years. Transformations can be achieved in a global way by modifying pitch, spectral envelope, durations, etc. While this sacrifices the possibility of attaining a specific target voice, the approach allows the production of new voices of a high degree of naturalness with different gender and age, modified vocal quality, or another speech style. These transformations can be applied in real time using ircamTools TRAX. Transformation can also be done in a more specific way in order to transform a voice towards the voice of a target speaker. Finally, we present some recent research on the transformation of expressivity.