Download Damped Chirp Mixture Estimation via Nonlinear Bayesian Regression
Estimating mixtures of damped chirp sinusoids in noise is a problem that affects audio analysis, coding, and synthesis applications. Phase-based non-stationary parameter estimators assume that sinusoids can be resolved in the Fourier transform domain, whereas high-resolution methods estimate superimposed components with accuracy close to the theoretical limits, but only for sinusoids with constant frequencies. We present a new method for estimating the parameters of superimposed damped chirps that has an accuracy competitive with existing non-stationary estimators but also has a high-resolution like subspace techniques. After providing the analytical expression for a Gaussian-windowed damped chirp signal’s Fourier transform, we propose an efficient variational EM algorithm for nonlinear Bayesian regression that jointly estimates the amplitudes, phases, frequencies, chirp rates, and decay rates of multiple non-stationary components that may be obfuscated under the same local maximum in the frequency spectrum. Quantitative results show that the new method not only has an estimation accuracy that is close to the Cramér-Rao bound, but also a high resolution that outperforms the state-of-the-art.
Download An Audio-Visual Fusion Piano Transcription Approach Based on Strategy
Piano transcription is a fundamental problem in the field of music information retrieval. At present, a large number of transcriptional studies are mainly based on audio or video, yet there is a small number of discussion based on audio-visual fusion. In this paper, a piano transcription model based on strategy fusion is proposed, in which the transcription results of the video model are used to assist audio transcription. Due to the lack of datasets currently used for audio-visual fusion, the OMAPS data set is proposed in this paper. Meanwhile, our strategy fusion model achieves a 92.07% F1 score on OMAPS dataset. The transcription model based on feature fusion is also compared with the one based on strategy fusion. The experiment results show that the transcription model based on strategy fusion achieves better results than the one based on feature fusion.
Download Spherical Decomposition of Arbitrary Scattering Geometries for Virtual Acoustic Environments
A method is proposed to encode the acoustic scattering of objects for virtual acoustic applications through a multiple-input and multiple-output framework. The scattering is encoded as a matrix in the spherical harmonic domain, and can be re-used and manipulated (rotated, scaled and translated) to synthesize various sound scenes. The proposed method is applied and validated using Boundary Element Method simulations which shows accurate results between references and synthesis. The method is compatible with existing frameworks such as Ambisonics and image source methods.
Download Interacting With Digital Audio Effects Through a Haptic Knob With Programmable Resistance
Live music performances and music production often involve the manipulation of several parameters during sound generation, processing, and mixing. In hardware layouts, those parameters are usually controlled using knobs, sliders and buttons. When these layouts are virtualized, the use of physical (e.g. MIDI) controllers can make interaction easier and reduce the cognitive load associated to sound manipulation. The addition of haptic feedback can further improve such interaction by facilitating the detection of the nature (continuous / discrete) and value of a parameter. To this end, we have realized an endless-knob controller prototype with programmable resistance to rotation, able to render various haptic effects. Ten subjects assessed the effectiveness of the provided haptic feedback in a target-matching task where either visual-only or visual-haptic feedback was provided; the experiment reported significantly lower errors in presence of haptic feedback. Finally, the knob was configured as a multi-parametric controller for a real-time audio effect software written in Python, simulating the voltage-controlled filter aboard the EMS VCS3. The integration of the sound algorithm and the haptic knob is discussed, together with various haptic feedback effects in response to control actions.
Download Quality Diversity for Synthesizer Sound Matching
It is difficult to adjust the parameters of a complex synthesizer to create the desired sound. As such, sound matching, the estimation of synthesis parameters that can replicate a certain sound, is a task that has often been researched, utilizing optimization methods such as genetic algorithm (GA). In this paper, we introduce a novelty-based objective for GA-based sound matching. Our contribution is two-fold. First, we show that the novelty objective is able to improve the quality of sound matching by maintaining phenotypic diversity in the population. Second, we introduce a quality diversity approach to the problem of sound matching, aiming to find a diverse set of matching sounds. We show that the novelty objective is effective in producing high-performing solutions that are diverse in terms of specified audio features. This approach allows for a new way of discovering sounds and exploring the capabilities of a synthesizer.
Download Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time consuming task, while existing automated approaches often maximize only a single performance metric. These automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm should account for more than one performance metric in a multiobjective manner. This paper presents a generalized procedure for automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm to replace manual parameter tuning, followed by the computation of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods of the Aubio library, using a dataset of monophonic acoustic guitar recordings. Results show that the proposed solution is effective in reducing the human effort required in the optimization process: it replaced more than two days of manual parameter tuning with 13 hours and 34 minutes of automated computation. Moreover, the resulting performance was comparable to that obtained by manual optimization.
Download Adaptive Pitch-Shifting With Applications to Intonation Adjustment in a Cappella Recordings
A central challenge for a cappella singers is to adjust their intonation and to stay in tune relative to their fellow singers. During editing of a cappella recordings, one may want to adjust local intonation of individual singers or account for global intonation drifts over time. This requires applying a time-varying pitch-shift to the audio recording, which we refer to as adaptive pitch-shifting. In this context, existing (semi-)automatic approaches are either laborintensive or face technical and musical limitations. In this work, we present automatic methods and tools for adaptive pitch-shifting with applications to intonation adjustment in a cappella recordings. To this end, we show how to incorporate time-varying information into existing pitch-shifting algorithms that are based on resampling and time-scale modification (TSM). Furthermore, we release an open-source Python toolbox, which includes a variety of TSM algorithms and an implementation of our method. Finally, we show the potential of our tools by two case studies on global and local intonation adjustment in a cappella recordings using a publicly available multitrack dataset of amateur choral singing.
Download Combining Zeroth and First-Order Analysis With Lagrange Polynomials to Reduce Artefacts in Live Concatenative Granulation
This paper presents a technique addressing signal discontinuity and concatenation artefacts in real-time granular processing with rectangular windowing. By combining zero-crossing synchronicity, first-order derivative analysis, and Lagrange polynomials, we can generate streams of uncorrelated and non-overlapping sonic fragments with minimal low-order derivatives discontinuities. The resulting open-source algorithm, implemented in the Faust language, provides a versatile real-time software for dynamical looping, wavetable oscillation, and granulation with reduced artefacts due to rectangular windowing and no artefacts from overlap-add-to-one techniques commonly deployed in granular processing.
Download The Role of Modal Excitation in Colorless Reverberation
A perceptual study revealing a novel connection between modal properties of feedback delay networks (FDNs) and colorless reverberation is presented. The coloration of the reverberation tail is quantified by the modal excitation distribution derived from the modal decomposition of the FDN. A homogeneously decaying allpass FDN is designed to be colorless such that the corresponding narrow modal excitation distribution leads to a high perceived modal density. Synthetic modal excitation distributions are generated to match modal excitations of FDNs. Three listening tests were conducted to demonstrate the correlation between the modal excitation distribution and the perceived degree of coloration. A fourth test shows a significant reduction of coloration by the colorless FDN compared to other FDN designs. The novel connection of modal excitation, allpass FDNs, and perceived coloration presents a beneficial design criterion for colorless artificial reverberation.
Download Object-Based Synthesis of Scraping and Rolling Sounds Based on Non-Linear Physical Constraints
Sustained contact interactions like scraping and rolling produce a wide variety of sounds. Previous studies have explored ways to synthesize these sounds efficiently and intuitively but could not fully mimic the rich structure of real instances of these sounds. We present a novel source-filter model for realistic synthesis of scraping and rolling sounds with physically and perceptually relevant controllable parameters constrained by principles of mechanics. Key features of our model include non-linearities to constrain the contact force, naturalistic normal force variation for different motions, and a method for morphing impulse responses within a material to achieve location-dependence. Perceptual experiments show that the presented model is able to synthesize realistic scraping and rolling sounds while conveying physical information similar to that in recorded sounds.