Download Audio Morphing Using Matrix Decomposition and Optimal Transport
This paper presents a system for morphing between audio recordings in a continuous parameter space. The proposed approach combines matrix decompositions used for audio source separation with displacement interpolation enabled by 1D optimal transport. By interpolating the spectral components obtained using nonnegative matrix factorization of the source and target signals, the system allows varying the timbre of a sound in real time, while maintaining its temporal structure. Using harmonic / percussive source separation as a pre-processing step, the system affords more detailed control of the interpolation in perceptually meaningful dimensions.
Download Sitrano: A Matlab App for Sines-Transients-Noise Decomposition of Audio Signals
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely-used tool in audio processing. Multiple solutions have been proposed in recent years, using time–frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of the sound. In this paper, we present SiTraNo: an easy-to-use MATLAB application with a graphic user interface for audio decomposition that enables visualization and access to the sinusoidal, transient, and noise classes, individually. This application allows the user to choose between different well-known separation methods to analyze an input sound file, to instantaneously control and remix its spectral components, and to visually check the quality of the separation, before producing the desired output file. The visualization of common artifacts, such as birdies and dropouts, is demonstrated. This application promotes experimenting with the sound decomposition process by observing the effect of variations for each spectral component on the original sound and by comparing different methods against each other, evaluating the separation quality both audibly and visually. SiTraNo and its source code are available on a companion website and repository.
Download One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hihat, a snare, and a tomtom. The proposed method uses a short pseudo-random velvet-noise filter and a low-shelf filter to produce timbral variations targeted at appropriate spectral regions, yielding potentially an endless number of new realistic versions of a single percussive sampled sound. The realism of the resulting processed sounds is studied in a listening test. The results show that the sound quality obtained with the proposed algorithm is at least as good as that of a previous method while using 77% fewer computational operations. The algorithm is widely applicable to computer-generated music and game audio.
Download Alloy Sounds: Non-Repeating Sound Textures With Probabilistic Cellular Automata
Contemporary musicians commonly face the challenge of finding new, characteristic sounds that can make their compositions more distinct. They often resort to computers and algorithms, which can significantly aid in creative processes by generating unexpected material in controlled probabilistic processes. In particular, algorithms that present emergent behaviors, like genetic algorithms and cellular automata, have fostered a broad diversity of musical explorations. This article proposes an original technique for the computer-assisted creation and manipulation of sound textures. The technique uses Probabilistic Cellular Automata, which are yet seldom explored in the music domain, to blend two audio tracks into a third, different one. The proposed blending process works by dividing the source tracks into frequency bands and then associating each of the automaton’s cell to a frequency band. Only one source, chosen by the cell’s state, is active within each band. The resulting track has a non-repeating textural pattern that follows the changes in the Cellular Automata. This blending process allows the musician to choose the original material and the blend granularity, significantly changing the resulting blends. We demonstrate how to use the proposed blending process in sound design and its application in experimental and popular music.
Download A Virtual Analog Model of the Edp Wasp VCF
In this paper we present a virtual analog model of the voltagecontrolled filter used in the EDP Wasp synthesizer. This circuit is an interesting case study for virtual analog modeling due to its characteristic nonlinear and highly dynamic behavior which can be attributed to its unusual design. The Wasp filter consists of a state variable filter topology implemented using operational transconductance amplifiers (OTAs) as the cutoff-control elements and CMOS inverters in lieu of operational amplifiers, all powered by a unipolar power supply. In order to accurately model the behavior of the circuit we propose extended models for its nonlinear components, focusing particularly on the OTAs. The proposed component models are used inside a white-box circuit modeling framework to create a digital simulation of the filter which retains the interesting characteristics of the original device.
Download A Quadric Surface Model of Vacuum Tubes for Virtual Analog Applications
Despite the prevalence of modern audio technology, vacuum tube amplifiers continue to play a vital role in the music industry. For this reason, over the years, many different digital techniques have been introduced for accomplishing their emulation. In this paper, we propose a novel quadric surface model for tube simulations able to overcome the Cardarilli model in terms of efficiency whilst retaining comparable accuracy when grid current is negligible. After showing the model capability to well outline tubes starting from measurement data, we perform an efficiency comparison by implementing the considered tube models as nonlinear 3-port elements in the Wave Digital domain. We do this by taking into account the typical common-cathode gain stage employed in vacuum tube guitar amplifiers. The proposed model turns out to be characterized by a speedup of 4.6× with respect to the Cardarilli model, proving thus to be promising for real-time Virtual Analog applications.
Download Informed Source Separation for Stereo Unmixing — An Open Source Implementation
Active listening consists in interacting with the music playing and has numerous potential applications from pedagogy to gaming, through creation. In the context of music industry, using existing musical recordings (e.g. studio stems), it could be possible for the listener to generate new versions of a given musical piece (i.e. artistic mix). But imagine one could do this from the original mix itself. In a previous research project, we proposed a coder / decoder scheme for what we called informed source separation: The coder determines the information necessary to recover the tracks and embeds it inaudibly (using watermarking) in the mix. The decoder enhances the source separation with this information. We proposed and patented several methods, using various types of embedded information and separation techniques, hoping that the music industry was ready to give the listener this freedom of active listening. Fortunately, there are numerous other applications possible, such as the manipulation of musical archives, for example in the context of ethnomusicology. But the patents remain for many years, which is problematic. In this article, we present an open-source implementation of a patent-free algorithm to address the mixing and unmixing audio problem for any type of music.
Download Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates
Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically informed audio synthesis. However, traditional implementations, particularly for non-linear models like the von Kármán plate, are computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast, differentiable, GPU-accelerated modal framework built with the JAX library, providing efficient simulations and enabling gradientbased inverse modelling. Benchmarks show that our approach significantly outperforms CPU and GPU-based implementations, particularly for simulations with many modes. Inverse modelling experiments demonstrate that our approach can recover physical parameters, including tension, stiffness, and geometry, from both synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation compared to methods that fit abstract spectral parameters, it provides greater interpretability and more compact parameterisation. The code is released as open source to support future research and applications in differentiable physical modelling and sound synthesis.
Download Improving Lyrics-to-Audio Alignment Using Frame-wise Phoneme Labels with Masked Cross Entropy Loss
This paper addresses the task of lyrics-to-audio alignment, which involves synchronizing textual lyrics with corresponding music audio. Most publicly available datasets for this task provide annotations only at the line or word level. This poses a challenge for training lyrics-to-audio models due to the lack of frame-wise phoneme labels. However, we find that phoneme labels can be partially derived from word-level annotations: for single-phoneme words, all frames corresponding to the word can be labeled with the same phoneme; for multi-phoneme words, phoneme labels can be assigned at the first and last frames of the word. To leverage this partial information, we construct a mask for those frames and propose a masked frame-wise cross-entropy (CE) loss that considers only frames with known phoneme labels. As a baseline model, we adopt an autoencoder trained with a Connectionist Temporal Classification (CTC) loss and a reconstruction loss. We then enhance the training process by incorporating the proposed framewise masked CE loss. Experimental results show that incorporating the frame-wise masked CE loss improves alignment performance. In comparison to other state-of-the art models, our model provides a comparable Mean Absolute Error (MAE) of 0.216 seconds and a top Median Absolute Error (MedAE) of 0.041 seconds on the testing Jamendo dataset.
Download Re-Thinking Sound Separation: Prior Information and Additivity Constraint in Separation Algorithms
In this paper, we study the effect of prior information on the quality of informed source separation algorithms. We present results with our system for solo and accompaniment separation and contrast our findings with two other state-of-the art approaches. Results suggest current separation techniques limit performance when compared to extraction process of prior information. Furthermore, we present an alternative view of the separation process where the additivity constraint of the algorithm is removed in the attempt to maximize obtained quality. Plausible future directions in sound separation research are discussed.