Local Key Estimation Based on Harmonic and Metric Structures
In this paper, we present a method for estimating the local keys of an audio signal. We address the problem of local key finding by investigating how several previously proposed global key estimation approaches can be combined and extended. The specificity of our approach is that the key estimate depends on both the harmonic and the metric structure. In this work, we focus on the relationship between the chord progression and the local key progression in a piece of music. One contribution of our work is that we address the problem of choosing a good analysis window length for local key estimation by introducing information related to the metric structure into our model. Key estimation is thus performed not on segments of empirically chosen length but on segments that are adapted to the analyzed piece and independent of the tempo. We evaluate and analyze our results on a new database of classical music pieces.
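As an illustration of the kind of global key estimator that can be applied segment by segment, the sketch below uses classic template matching: a mean chroma vector is correlated with the 24 rotated Krumhansl-Kessler major/minor profiles. This is a swapped-in, generic technique, not the authors' model; the segment boundaries (for example bar positions from a beat tracker) are a hypothetical input, and the harmonic (chord) modeling is not shown.

```python
import numpy as np

# Krumhansl-Kessler key profiles (index 0 = tonic), a common choice for
# template-based global key estimation.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])


def estimate_key(chroma):
    """Return (tonic pitch class 0-11, mode) maximizing profile correlation."""
    best, best_r = None, -np.inf
    for mode, profile in (("major", MAJOR), ("minor", MINOR)):
        for tonic in range(12):
            # Rotating the profile by `tonic` moves its tonic weight to that pitch class.
            r = np.corrcoef(chroma, np.roll(profile, tonic))[0, 1]
            if r > best_r:
                best, best_r = (tonic, mode), r
    return best


def local_keys(chroma_frames, boundaries):
    """Estimate one key per segment [boundaries[i], boundaries[i+1]).

    chroma_frames: array of shape (12, n_frames).
    boundaries: frame indices, e.g. bar positions (hypothetical input;
    the metric segmentation itself is not computed here).
    """
    return [
        estimate_key(chroma_frames[:, start:stop].mean(axis=1))
        for start, stop in zip(boundaries[:-1], boundaries[1:])
    ]
```

Tying the boundaries to bars rather than to a fixed number of frames is what makes the analysis window follow the metric structure instead of a fixed, tempo-dependent duration.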
The Restoration of Single Channel Audio Recordings Based on Non-Negative Matrix Factorization and Perceptual Suppression Rule
In this paper, we focus on improving the signal-to-noise ratio (SNR) of single-channel audio recordings. Many approaches have been reported in the literature. The most popular method, with many variants, is Short-Time Spectral Attenuation (STSA). Although this method reduces the noise and improves the SNR, it tends to introduce signal distortion and a perceptually annoying residual noise usually called musical noise. In this paper, we investigate the use of Non-negative Matrix Factorization (NMF) as an alternative to STSA for the digital curation of musical heritage. NMF is an emerging technique for the blind extraction of signals recorded in a variety of fields, and its application to the analysis of monaural recordings is relatively recent. We show that NMF is a suitable technique for extracting the clean audio signal from undesired non-stationary noise in a monaural recording of ethnic music. More specifically, we introduce a perceptual suppression rule to assess how competitive the perceptual domain is compared with the acoustic domain. Moreover, we carry out a listening test using the EBU MUSHRA test method to compare NMF with the state-of-the-art audio restoration framework. The encouraging results obtained with this methodology in the presented case study support its wider applicability in audio separation.
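A minimal sketch of the NMF step, assuming a nonnegative magnitude spectrogram V and the standard Euclidean Lee-Seung multiplicative updates. The perceptual suppression rule of the paper is not reproduced; a plain ratio (Wiener-like) mask is used as a stand-in, and which components count as "signal" rather than noise is a hypothetical choice supplied by the caller.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Euclidean NMF V ~ W @ H via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + eps   # spectral templates
    H = rng.random((rank, n_frames)) + eps  # activations over time
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative and do not
        # increase the Euclidean reconstruction error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def soft_mask(V, W, H, signal_components, eps=1e-12):
    """Keep the part of V explained by the chosen components (ratio mask)."""
    V_signal = W[:, signal_components] @ H[signal_components, :]
    return V * V_signal / (W @ H + eps)
```

The masked magnitude would then be combined with the noisy phase and inverted with an inverse STFT; that step is omitted here.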
Granular analysis/synthesis of percussive drilling sounds
This paper deals with the automatic and robust analysis, and the realistic and low-cost synthesis, of percussive drilling-like sounds. It makes two contributions: an unsupervised removal of quasi-stationary background noise based on Non-negative Matrix Factorization, and a granular method for the analysis/synthesis of these drilling sounds. Both contributions are suited to the acoustical properties of percussive drilling sounds and can be extended to other sounds with similar characteristics. The context of this work is the training of operators of working machines using simulators. An implementation is also described.
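The synthesis side of a granular approach can be sketched as overlap-add of windowed grains. The sketch below assumes a bank of grains already extracted from recordings (for example one grain per percussive impact) and places randomly chosen grains at jittered, roughly periodic onsets; the Hann window, the jitter rule, and the parameter names are illustrative assumptions, not the method of the paper.

```python
import numpy as np


def granular_synthesis(grains, onsets, length):
    """Overlap-add Hann-windowed grains at the given sample onsets."""
    out = np.zeros(length)
    for grain, onset in zip(grains, onsets):
        g = np.asarray(grain, dtype=float) * np.hanning(len(grain))
        end = min(onset + len(g), length)  # truncate grains past the end
        out[onset:end] += g[: end - onset]
    return out


def drilling_texture(grain_bank, mean_period, length, jitter=0.1, seed=0):
    """Hypothetical driver: random grains at jittered inter-onset intervals.

    mean_period is the average impact period in samples; jitter is the
    relative random deviation applied to each interval.
    """
    rng = np.random.default_rng(seed)
    onsets, grains, t = [], [], 0
    while t < length:
        onsets.append(t)
        grains.append(grain_bank[rng.integers(len(grain_bank))])
        t += max(1, int(mean_period * (1 + jitter * rng.uniform(-1, 1))))
    return granular_synthesis(grains, onsets, length)
```

Reusing a small grain bank with randomized timing keeps the cost low while avoiding an obviously looped texture.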
Spatialized audio in a vision rehabilitation game for training orientation and mobility skills
Serious games can be used to train the orientation and mobility skills of visually impaired children and youngsters. Here we present a serious game for training sound localization skills and concepts usually covered in orientation and mobility classes, such as front/back and left/right. In addition, the game helps players train simple body-rotation mobility skills. The game was designed for touch-screen mobile devices and has a virtual audio environment created with 3D spatialized audio obtained with head-related transfer functions. The results of a usability test with blind students show that the game can have a positive impact on the players’ skills, namely their motor coordination and localization skills, as well as on their self-confidence.
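Binaural rendering with head-related transfer functions amounts to convolving a mono source with a left/right head-related impulse response (HRIR) pair for the source direction. The sketch below assumes a hypothetical HRIR set stored as a dict from measured azimuth (degrees) to an equal-length (left, right) pair, and picks the nearest measured direction relative to the player's head rotation; real games typically use a measured HRIR database and interpolation, which is not shown.

```python
import numpy as np


def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono source with a left/right HRIR pair -> (2, n) array.

    Both HRIRs must have the same length so the two channels align.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])


def nearest_hrir(hrir_set, source_azimuth, head_yaw):
    """Pick the measured HRIR pair closest to the source direction relative to the head.

    hrir_set: dict azimuth_deg -> (hrir_left, hrir_right) (hypothetical layout).
    """
    relative = (source_azimuth - head_yaw) % 360  # body rotation changes the relative angle

    def angular_distance(az):
        d = abs(az - relative) % 360
        return min(d, 360 - d)

    return hrir_set[min(hrir_set, key=angular_distance)]
```

For example, turning the head toward a source at 90 degrees (head_yaw = 90) selects the frontal HRIR pair, which is how a rotation exercise can be turned into audible feedback.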
Universal Audio Synthesizer Control with Normalizing Flows
The ubiquity of sound synthesizers has reshaped music production and even defined entirely new music genres. However, the increasing complexity and number of parameters in modern synthesizers make them harder to master. Hence, methods that make it easy to create and explore sounds with synthesizers are crucially needed. Here, we introduce a radically novel formulation of audio synthesizer control: finding an organized continuous latent space of audio that represents the capabilities of a synthesizer, and mapping this space to the space of synthesis parameters. With this formulation, we show that we can simultaneously address automatic parameter inference, macro-control learning and audio-based preset exploration within a single model. To solve this new formulation, we rely on Variational Auto-Encoders (VAE) and Normalizing Flows (NF) to organize and map the respective auditory and parameter spaces. We introduce a new type of NF, named regression flows, that performs an invertible mapping between separate latent spaces while steering the organization of some of the latent dimensions. We evaluate our proposal against a large set of baseline models and show its superiority in both parameter inference and audio reconstruction. We also show that the model disentangles the major factors of audio variation into latent dimensions, which can be directly used as macro-parameters. Finally, we discuss the use of our model in several creative applications and introduce real-time implementations in Ableton Live.
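Normalizing flows are built from invertible transforms with a tractable Jacobian determinant. As a generic illustration (not the regression flows introduced in the paper), the sketch below implements a RealNVP-style affine coupling layer in NumPy; the scale and shift "networks" are single random linear maps, an assumption standing in for trained neural networks.

```python
import numpy as np


class AffineCoupling:
    """RealNVP-style affine coupling layer on vectors of even dimension.

    The first half of the input passes through unchanged and parameterizes
    an elementwise affine transform of the second half.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        half = dim // 2
        # Stand-ins for learned networks (hypothetical, untrained).
        self.Ws = 0.1 * rng.standard_normal((half, half))
        self.Wt = 0.1 * rng.standard_normal((half, half))

    def _scale_shift(self, x1):
        return np.tanh(x1 @ self.Ws), x1 @ self.Wt

    def forward(self, x):
        x1, x2 = np.split(x, 2, axis=-1)
        s, t = self._scale_shift(x1)
        y2 = x2 * np.exp(s) + t
        log_det = s.sum(axis=-1)  # Jacobian is triangular: log|det J| = sum(s)
        return np.concatenate([x1, y2], axis=-1), log_det

    def inverse(self, y):
        y1, y2 = np.split(y, 2, axis=-1)
        s, t = self._scale_shift(y1)  # y1 == x1, so s and t are recoverable
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=-1)
```

Stacking such layers (with the halves swapped between layers) gives an invertible map whose exact log-likelihood can be trained, which is the property a flow-based mapping between latent and parameter spaces relies on.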
NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a family of methods widely used for information retrieval across domains including text, images, and audio. Within music processing, NMF has been used for tasks such as transcription, source separation, and structure analysis. Prior work has shown that initialization and constrained update rules can drastically improve the chances of NMF converging to a musically meaningful solution. Along these lines, we present the NMF toolbox, containing MATLAB and Python implementations of conceptually distinct NMF variants; in particular, this paper gives an overview of two algorithms. The first variant, called nonnegative matrix factor deconvolution (NMFD), extends the original NMF algorithm to the convolutive case, enforcing the temporal order of spectral templates. The second variant, called diagonal NMF, supports the development of sparse diagonal structures in the activation matrix. Our toolbox contains several demo applications and code examples to illustrate its potential and functionality. By providing MATLAB and Python code on a documentation website under a GNU-GPL license, as well as illustrative examples, we aim to foster research and education in the field of music processing.
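The convolutive model behind NMFD can be written as a sum of time-shifted products, Lambda = sum over tau of W_tau times shift(H, tau), so each activation triggers a template that spans several frames in a fixed temporal order. The sketch below computes only this model (not the update rules) and does not use the toolbox's own functions; the array layout (T, K, R) for the templates is an assumption made for illustration.

```python
import numpy as np


def shift_right(H, tau):
    """Shift the columns of H right by tau frames, zero-filling on the left."""
    if tau == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, tau:] = H[:, :-tau]
    return out


def nmfd_model(W, H):
    """NMFD approximation: Lambda = sum_tau W[tau] @ shift_right(H, tau).

    W: (T, K, R) templates, i.e. R spectral patterns of K bins spanning T frames.
    H: (R, N) activations over N frames.
    """
    return sum(W[tau] @ shift_right(H, tau) for tau in range(W.shape[0]))
```

With T = 1 the model reduces to the ordinary NMF product W @ H; with T > 1 a single activation impulse reproduces the whole multi-frame template, which is what enforces the temporal order of the spectral frames.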
Real-Time Modal Synthesis of Crash Cymbals with Nonlinear Approximations, Using a GPU
We apply modal synthesis to create a virtual collection of crash cymbals. Synthesizing a single cymbal may require enough modes to stress a modern CPU, so a full drum set would certainly not be tractable in real time. To work around this, we create a GPU-accelerated modal filterbank, with each individual set piece allocated over two thousand modes. This takes only a fraction of the available GPU floating-point throughput. With CPU resources freed up, we explore methods to model the instrument's different responses in the linear/harmonic and nonlinear/inharmonic regimes that occur as more energy is present in a cymbal: a simple approach that preserves the parallelism of the problem uses multisampling, and a more physically based approach approximates modal coupling.
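A linear modal filterbank can be sketched as one complex one-pole resonator per mode, all driven by the same excitation and summed. The NumPy sketch below vectorizes the per-sample update across modes, which is the same mode-level parallelism a GPU implementation can exploit; it is a CPU reference under assumed mode data (frequencies, decay rates, amplitudes), and the multisampling and modal-coupling extensions are not shown.

```python
import numpy as np


def modal_filterbank(x, freqs, decays, amps, fs):
    """Drive a bank of complex one-pole resonators (one per mode) with x.

    Mode k rings at freqs[k] Hz and decays as exp(-decays[k] * t);
    its impulse response is amps[k] * exp(-decays[k] t) * sin(2 pi freqs[k] t).
    """
    poles = np.exp((-np.asarray(decays) + 2j * np.pi * np.asarray(freqs)) / fs)
    amps = np.asarray(amps)
    state = np.zeros(len(poles), dtype=complex)
    y = np.empty(len(x))
    for n, xn in enumerate(x):
        state = poles * state + xn        # every mode updated in parallel
        y[n] = np.sum(amps * state.imag)  # sum of damped sinusoids
    return y
```

Because the modes are independent in the linear case, the inner update is embarrassingly parallel; only the final sum requires a reduction across modes.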
Damped Chirp Mixture Estimation via Nonlinear Bayesian Regression
Estimating mixtures of damped chirp sinusoids in noise is a problem that arises in audio analysis, coding, and synthesis applications. Phase-based non-stationary parameter estimators assume that sinusoids can be resolved in the Fourier transform domain, whereas high-resolution methods estimate superimposed components with accuracy close to the theoretical limits, but only for sinusoids with constant frequencies. We present a new method for estimating the parameters of superimposed damped chirps that has an accuracy competitive with existing non-stationary estimators and a high resolution comparable to subspace techniques. After providing the analytical expression for the Fourier transform of a Gaussian-windowed damped chirp signal, we propose an efficient variational EM algorithm for nonlinear Bayesian regression that jointly estimates the amplitudes, phases, frequencies, chirp rates, and decay rates of multiple non-stationary components that may be hidden under the same local maximum in the frequency spectrum. Quantitative results show that the new method not only has an estimation accuracy close to the Cramér-Rao bound but also a high resolution that outperforms the state of the art.
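Why a closed form exists can be seen from the Gaussian integral: writing the complex damped chirp as exp(lam0 + lam1 t + lam2 t^2), with lam0 = ln(a) + i phi, lam1 = -gamma + i omega and lam2 = i c / 2, and the window as exp(-t^2 / (2 sigma^2)), the windowed Fourier transform at angular frequency Omega is exp(lam0) sqrt(pi / p) exp(q^2 / (4 p)) with p = 1/(2 sigma^2) - lam2 and q = lam1 - i Omega, valid because Re(p) > 0. The sketch below evaluates this standard result; the parameterization is an assumption and may differ from the paper's notation.

```python
import numpy as np


def gaussian_windowed_chirp_ft(omega, amp, phase, freq, chirp_rate, decay, sigma):
    """Closed-form FT of a Gaussian-windowed complex damped chirp.

    Signal: amp * exp(i*phase + (-decay + i*freq) t + i*chirp_rate/2 t^2),
    window: exp(-t^2 / (2 sigma^2)); freq in rad/s, chirp_rate in rad/s^2.
    Uses  integral exp(-p t^2 + q t) dt = sqrt(pi/p) exp(q^2 / (4p)), Re(p) > 0.
    """
    lam0 = np.log(amp) + 1j * phase
    lam1 = -decay + 1j * freq
    lam2 = 0.5j * chirp_rate
    p = 1.0 / (2.0 * sigma**2) - lam2  # Re(p) > 0 ensures convergence
    q = lam1 - 1j * np.asarray(omega)
    return np.exp(lam0) * np.sqrt(np.pi / p) * np.exp(q**2 / (4.0 * p))
```

Having the spectrum in closed form is what lets a regression model fit several overlapping components directly in the frequency domain instead of requiring each one to have its own spectral peak.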
Differentiable Feedback Delay Network for Colorless Reverberation
Artificial reverberation algorithms often suffer from spectral coloration, usually in the form of metallic ringing, which impairs the perceived quality of sound. This paper proposes a method to reduce the coloration in the feedback delay network (FDN), a popular artificial reverberation algorithm. We employ an optimization framework built around a differentiable FDN to learn a set of parameters that reduce coloration. The optimization objective is to minimize a spectral loss to obtain a flat magnitude response, with an additional temporal loss term to control the sparseness of the impulse response. Objective evaluation of the method shows a favorably narrower distribution of modal excitation while retaining the impulse response density. Subjective evaluation demonstrates that the proposed method lowers the perceived coloration of late reverberation and that the suggested optimization improves sound quality for small FDN sizes. The method proposed in this work thus constitutes an improvement in the design of accurate, high-quality artificial reverberation while also offering computational savings.
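For an FDN with feedback matrix A, input gains b, output gains c, direct gain d and delay lengths m_i, the transfer function is H(z) = c^T (D(z)^{-1} - A)^{-1} b + d with D(z) = diag(z^{-m_i}). The NumPy sketch below evaluates this response on the unit circle together with a simple flatness measure (variance of the dB magnitude). The loss is an illustrative stand-in for the paper's spectral loss, the temporal sparsity term is omitted, and learning the parameters would require evaluating the same expressions in an automatic-differentiation framework.

```python
import numpy as np


def fdn_response(A, b, c, d, delays, n_freqs=512):
    """H(e^{jw}) = c^T (D(z)^{-1} - A)^{-1} b + d, with D(z) = diag(z^{-m_i})."""
    omegas = np.linspace(0, np.pi, n_freqs)
    delays = np.asarray(delays)
    H = np.empty(n_freqs, dtype=complex)
    for i, w in enumerate(omegas):
        D_inv = np.diag(np.exp(1j * w * delays))  # diag(z^{m_i}) at z = e^{jw}
        H[i] = c @ np.linalg.solve(D_inv - A, b) + d
    return omegas, H


def coloration_loss(H):
    """Mean squared deviation of the dB magnitude response from its mean."""
    mag_db = 20 * np.log10(np.abs(H) + 1e-12)
    return np.mean((mag_db - mag_db.mean()) ** 2)
```

A lossless feedback matrix puts the poles on the unit circle, where D(z)^{-1} - A becomes singular, so in practice the response is evaluated with attenuation (gains below one) or slightly inside the unit circle.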
Wave Digital Model of the MXR Phase 90 Based on a Time-Varying Resistor Approximation of JFET Elements
Virtual Analog (VA) modeling is the practice of digitally emulating analog audio gear. Over the past few years, with the purpose of recreating the alleged distinctive sound of audio equipment and musicians, many different guitar pedals have been emulated by means of the VA paradigm, but little attention has been given to phasers. Phasers process the spectrum of the input signal with time-varying notches by means of phase-shifting stages typically realized with a network of transistors, whose nonlinear equations are, in general, demanding to solve. In this paper, we take the famous MXR Phase 90 guitar pedal as a reference and propose an efficient time-varying model of its Junction Field-Effect Transistors (JFETs) based on a channel resistance approximation. We then employ this model in the Wave Digital domain to emulate the guitar pedal in real time, obtaining an implementation characterized by low computational cost and good accuracy.
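The idea of replacing a JFET by a time-varying resistor can be sketched with the textbook ohmic-region approximation r_ds = r_ds_on / (1 - v_gs / v_p), which feeds the resistance of a first-order RC allpass stage. This is a simplified illustration under assumptions: the component values are illustrative rather than measured MXR Phase 90 data, the approximation may differ from the one derived in the paper, and the stage is discretized with the bilinear transform in direct form rather than in the Wave Digital domain.

```python
import numpy as np


def jfet_channel_resistance(v_gs, r_ds_on=100.0, v_p=-2.0):
    """Ohmic-region approximation r_ds = r_ds_on / (1 - v_gs / v_p).

    Valid for small drain-source voltage and v_p < v_gs <= 0 (n-channel).
    r_ds_on and v_p are illustrative values, not measured pedal data.
    """
    return r_ds_on / (1.0 - v_gs / v_p)


def allpass_stage(x, r, c_farads, fs):
    """First-order allpass H(s) = (1 - sRC) / (1 + sRC), bilinear-transformed,
    with a per-sample resistance r[n] that moves the phase-shift frequency."""
    y = np.zeros(len(x))
    x1 = y1 = 0.0
    for n in range(len(x)):
        k = 2.0 * fs * r[n] * c_farads  # 2RC/T
        a = (1.0 - k) / (1.0 + k)       # H(z) = (a + z^-1) / (1 + a z^-1)
        y[n] = a * x[n] + x1 - a * y1
        x1, y1 = x[n], y[n]
    return y
```

In a phaser, several such stages are cascaded and mixed with the dry signal to create notches; driving v_gs with a low-frequency oscillator, and hence r[n] through jfet_channel_resistance, sweeps those notches over time.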