A Source-Filter Model for Quasi-Harmonic Instruments
In this paper we propose a new method for a generalized model representing the time-varying spectral characteristics of quasi-harmonic instruments. This approach comprises a linear source-filter model, a parameter estimation method, and a model evaluation based on the prototype’s variance. The source-filter model is composed of an excitation source generating sinusoidal parameter trajectories and a modeling resonance filter, with basis splines (B-splines) used to model continuous trajectories. To estimate the model parameters, we apply a gradient descent method to a training database, and the prototype’s variance is estimated on a test database. Such a model could later be used as a priori knowledge for polyphonic instrument recognition, polyphonic transcription, and source separation algorithms, as well as for resynthesis.
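The idea of fitting a source-filter decomposition by gradient descent can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the log-amplitude of each partial is the sum of a per-partial source gain and a resonance filter evaluated at the partial's frequency, with the filter expressed as a weighted sum of quadratic B-splines. The knot layout, learning rate, function names, and synthetic training data are all hypothetical choices made for illustration.

```python
def bspline_basis(i, order, t, knots):
    """Cox-de Boor recursion: i-th B-spline of the given order (degree = order - 1) at t."""
    if order == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + order - 1] - knots[i]
    if d1 > 0:  # skip zero-width spans created by repeated knots
        left = (t - knots[i]) / d1 * bspline_basis(i, order - 1, t, knots)
    d2 = knots[i + order] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + order] - t) / d2 * bspline_basis(i + 1, order - 1, t, knots)
    return left + right


def design_row(freq, knots, order):
    """Values of all B-spline basis functions at one frequency (Hz)."""
    n_basis = len(knots) - order
    return [bspline_basis(j, order, freq, knots) for j in range(n_basis)]


def make_synthetic_data():
    """Hypothetical toy data: tuples of (f0 in Hz, partial index, log-amplitude)."""
    data = []
    for f0 in (200.0, 300.0, 400.0):
        for h in range(1, 9):
            source = -0.5 * h                          # assumed excitation roll-off
            filt = -((h * f0 - 1500.0) / 1000.0) ** 2  # assumed resonance near 1.5 kHz
            data.append((f0, h, source + filt))
    return data


def fit(data, n_partials, knots, order=3, lr=0.2, steps=300):
    """Plain gradient descent on the mean squared error of the log-amplitudes."""
    n_basis = len(knots) - order
    source = [0.0] * n_partials   # one gain per partial index
    weights = [0.0] * n_basis     # B-spline weights of the filter envelope
    rows = [(h, design_row(h * f0, knots, order), y) for f0, h, y in data]
    n = len(rows)
    history = []
    for _ in range(steps):
        grad_s = [0.0] * n_partials
        grad_w = [0.0] * n_basis
        loss = 0.0
        for h, basis, y in rows:
            pred = source[h - 1] + sum(w * b for w, b in zip(weights, basis))
            err = pred - y
            loss += err * err
            grad_s[h - 1] += 2.0 * err
            for j, b in enumerate(basis):
                grad_w[j] += 2.0 * err * b
        history.append(loss / n)  # loss before this step's update
        source = [s - lr * g / n for s, g in zip(source, grad_s)]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return source, weights, history


# Clamped uniform knots over 0-5 kHz (an assumption; frequencies must stay below 5 kHz).
KNOTS = [0.0, 0.0, 0.0, 1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 5000.0, 5000.0]

if __name__ == "__main__":
    source, weights, history = fit(make_synthetic_data(), n_partials=8, knots=KNOTS)
    print("initial loss:", history[0])
    print("final loss:", history[-1])
```

In this toy decomposition the split between source and filter is only determined up to a constant offset, so the fitted values should be read as one of many equivalent solutions rather than as unique parameters.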
Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space
This paper presents a novel approach to neural instrument sound synthesis using a two-stage semi-supervised learning framework capable of generating pitch-accurate, high-quality music samples from an expressive timbre latent space. Existing approaches that achieve sufficient quality for music production often rely on high-dimensional latent representations that are difficult to navigate and provide unintuitive user experiences. We address this limitation through a two-stage training paradigm: first, we train a pitch-timbre disentangled 2D representation of audio samples using a Variational Autoencoder; second, we use this representation as conditioning input for a Transformer-based generative model. The learned 2D latent space serves as an intuitive interface for navigating and exploring the sound landscape. We demonstrate that the proposed method effectively learns a disentangled timbre space, enabling expressive and controllable audio generation with reliable pitch conditioning. Experimental results show the model’s ability to capture subtle variations in timbre while maintaining a high degree of pitch accuracy. The usability of our method is demonstrated in an interactive web application, highlighting its potential as a step towards future music production environments that are both intuitive and creatively empowering: https://pgesam.faresschulz.com/.