Antialiased Black-Box Modeling of Audio Distortion Circuits Using Real Linear Recurrent Units
In this paper, we propose the use of real-valued Linear Recurrent Units (LRUs) for black-box modeling of audio circuits. A network architecture composed of real LRU blocks interleaved with nonlinear processing stages is proposed.
Two case studies are
presented: a second-order diode clipper and an overdrive distortion pedal. Furthermore, we show how to integrate the antiderivative antialiasing technique into the proposed method, effectively
lowering oversampling requirements. Our experiments show that
the proposed method generates models that accurately capture the
nonlinear dynamics of the examined devices and are highly efficient, which makes them suitable for real-time operation inside
Digital Audio Workstations.
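To make the block structure concrete, below is a minimal sketch of one real-valued LRU block followed by a static nonlinearity. The diagonal recurrence, dimensions, and tanh stage are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, in_dim = 8, 1

# Diagonal real recurrence with poles inside the unit circle for stability.
a = rng.uniform(0.5, 0.99, state_dim)               # real eigenvalues
B = rng.standard_normal((state_dim, in_dim)) * 0.1  # input matrix
C = rng.standard_normal((1, state_dim)) * 0.1       # output matrix
D = np.ones((1, in_dim))                            # direct path

def lru_block(x):
    """Run the linear recurrence over a mono signal, then a nonlinear stage."""
    h = np.zeros(state_dim)
    y = np.empty_like(x)
    for n, xn in enumerate(x):
        h = a * h + B @ np.atleast_1d(xn)           # elementwise diagonal update
        y[n] = (C @ h + D @ np.atleast_1d(xn)).item()
    return np.tanh(y)                               # nonlinear processing stage

x = 0.5 * np.sin(2 * np.pi * 220 * np.arange(4410) / 44100)
y = lru_block(x)
```

In a full model, several such blocks would be stacked, with the linear recurrences evaluated in parallel form for efficiency.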
Learning Nonlinear Dynamics in Physical Modelling Synthesis Using Neural Ordinary Differential Equations
Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are
possible in order to handle geometric nonlinearities. One such
case is the high-amplitude vibration of a string, where geometric nonlinear effects lead to perceptually important phenomena, including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent work in applied machine learning (in particular, neural ordinary differential equations) has been used to model lumped dynamic systems such as electronic circuits automatically from data. In this work,
we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution
for the linear vibration of the system’s modes and employs a neural network to account for nonlinear dynamic behaviour. Physical parameters of the system remain easily accessible after training without
the need for a parameter encoder in the network architecture. As
an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to
reproduce the nonlinear dynamics of the system. Sound examples
are presented.
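As a rough illustration of the hybrid structure, the sketch below pairs the known linear modal dynamics with a small network that supplies a learned nonlinear coupling force. The mode count, network size, and forward-Euler integrator are simplifying assumptions; the paper uses the analytical solution for the linear part rather than naive integration.

```python
import torch
import torch.nn as nn

n_modes = 4
omega = 2 * torch.pi * torch.linspace(100.0, 400.0, n_modes)  # modal freqs (rad/s)
sigma = torch.full((n_modes,), 1.0)                           # modal damping

coupling = nn.Sequential(                       # learned nonlinear force
    nn.Linear(2 * n_modes, 32), nn.Tanh(), nn.Linear(32, n_modes))

def rhs(state):
    """First-order form, state = [q, q_dot]; the linear part is known."""
    q, qd = state[:n_modes], state[n_modes:]
    qdd = -(omega ** 2) * q - 2 * sigma * qd    # analytical linear dynamics
    qdd = qdd + coupling(state)                 # learned nonlinear correction
    return torch.cat([qd, qdd])

# Unrolled rollout; gradients flow through the loop during training.
state = torch.zeros(2 * n_modes)
state[n_modes] = 1.0                            # strike: initial modal velocity
dt, out = 1.0 / 44100, []
for _ in range(1000):
    state = state + dt * rhs(state)
    out.append(state[:n_modes].sum())           # sum of modal displacements
```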
Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space
This paper presents a novel approach to neural instrument sound
synthesis using a two-stage semi-supervised learning framework
capable of generating pitch-accurate, high-quality music samples
from an expressive timbre latent space. Existing approaches that
achieve sufficient quality for music production often rely on high-dimensional latent representations that are difficult to navigate and
provide unintuitive user experiences. We address this limitation
through a two-stage training paradigm: first, we train a pitch-timbre disentangled 2D representation of audio samples using a
Variational Autoencoder; second, we use this representation as
conditioning input for a Transformer-based generative model. The
learned 2D latent space serves as an intuitive interface for navigating and exploring the sound landscape. We demonstrate that the
proposed method effectively learns a disentangled timbre space,
enabling expressive and controllable audio generation with reliable
pitch conditioning. Experimental results show the model’s ability to capture subtle variations in timbre while maintaining a high
degree of pitch accuracy. The usability of our method is demonstrated in an interactive web application, highlighting its potential
as a step towards future music production environments that are
both intuitive and creatively empowering:
https://pgesam.faresschulz.com/.
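The conditioning mechanism might look roughly like the sketch below, in which a pitch embedding and the 2D timbre coordinate are prepended as prefix tokens to a Transformer over discrete audio tokens. All dimensions, the prefix scheme, and the omitted causal mask are simplifying assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

d_model, n_pitches, vocab = 64, 128, 256

pitch_emb = nn.Embedding(n_pitches, d_model)      # pitch conditioning
timbre_proj = nn.Linear(2, d_model)               # lift the 2D latent to model dim
token_emb = nn.Embedding(vocab, d_model)
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(d_model, vocab)

def next_token_logits(audio_tokens, pitch, timbre_xy):
    """Prefix the sequence with [pitch, timbre] tokens, then decode."""
    prefix = torch.stack([pitch_emb(pitch), timbre_proj(timbre_xy)], dim=1)
    seq = torch.cat([prefix, token_emb(audio_tokens)], dim=1)
    return head(decoder(seq))

logits = next_token_logits(
    torch.randint(0, vocab, (1, 16)),             # past audio tokens
    torch.tensor([60]),                           # MIDI-style pitch index
    torch.tensor([[0.3, -0.8]]))                  # point in the 2D timbre space
```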
Audio Processor Parameters: Estimating Distributions Instead of Deterministic Values
Audio effects and sound synthesizers are widely used processors
in popular music.
Their parameters control the quality of the
output sound. Multiple combinations of parameters can lead to
the same sound.
While recent approaches have been proposed
to estimate these parameters given only the output sound, those
are deterministic, i.e. they only estimate a single solution among
the many possible parameter configurations.
In this work, we
propose to model the parameters as probability distributions instead
of deterministic values. To learn the distributions, we optimize
two objectives: (1) we minimize the reconstruction error between
the ground truth output sound and the one generated using the
estimated parameters, as is usually done, but also (2) we maximize
the parameter diversity, using entropy. We evaluate our approach
through two numerical audio experiments to show its effectiveness.
These results show how our approach effectively outputs multiple
combinations of parameters to match one sound.
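A toy version of the two-term objective is sketched below: a categorical distribution over quantized values of a single parameter is trained to minimize the expected reconstruction error, while an entropy bonus keeps several parameter solutions alive. The sine "processor", bin count, and 0.01 weighting are illustrative assumptions.

```python
import torch

n_bins = 16
levels = torch.linspace(0.0, 1.0, n_bins)        # quantized parameter values

def synth(gain, freq=220.0, sr=16000, n=512):
    """Toy differentiable processor: a gain applied to a sine."""
    t = torch.arange(n) / sr
    return gain * torch.sin(2 * torch.pi * freq * t)

logits = torch.zeros(n_bins, requires_grad=True) # learnable distribution
target = synth(torch.tensor(0.7))
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(200):
    probs = torch.softmax(logits, dim=0)
    # (1) expected reconstruction error under the distribution
    recon = sum(p * torch.mean((synth(v) - target) ** 2)
                for p, v in zip(probs, levels))
    # (2) entropy bonus: keep the distribution over parameters diverse
    entropy = -(probs * torch.log(probs + 1e-9)).sum()
    loss = recon - 0.01 * entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
```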
A Statistics-Driven Differentiable Approach for Sound Texture Synthesis and Analysis
In this work, we introduce TexStat, a novel loss function specifically designed for the analysis and synthesis of texture sounds
characterized by stochastic structure and perceptual stationarity.
Drawing inspiration from the statistical and perceptual framework
of McDermott and Simoncelli, TexStat identifies similarities
between signals belonging to the same texture category without
relying on temporal structure. We also propose using TexStat
as a validation metric alongside the Fréchet Audio Distance (FAD) to
evaluate texture sound synthesis models. In addition to TexStat,
we present TexEnv, an efficient, lightweight and differentiable
texture sound synthesizer that generates audio by imposing amplitude envelopes on filtered noise. We further integrate these components into TexDSP, a DDSP-inspired generative model tailored
for texture sounds. Through extensive experiments across various
texture sound types, we demonstrate that TexStat is perceptually meaningful, time-invariant, and robust to noise, features that
make it effective both as a loss function for generative tasks and as
a validation metric. All tools and code are provided as open-source
contributions, and our PyTorch implementations are efficient, differentiable, and highly configurable, enabling their use both in generative tasks and as a perceptually grounded evaluation metric.
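In the spirit of the McDermott and Simoncelli framework, such a loss compares time-averaged summary statistics of subband envelopes rather than aligned waveforms. The crude STFT filterbank and the particular statistics below are stand-ins for the ones TexStat actually uses.

```python
import torch

def envelopes(x, n_bands=8, win=256):
    """Crude subband envelopes: STFT magnitudes pooled into coarse bands."""
    spec = torch.stft(x, n_fft=win, hop_length=win // 2,
                      window=torch.hann_window(win), return_complex=True).abs()
    spec = spec[: (spec.shape[0] // n_bands) * n_bands]   # drop leftover bins
    return spec.reshape(n_bands, -1, spec.shape[-1]).mean(dim=1)

def texture_stats(x):
    env = envelopes(x)
    return torch.cat([env.mean(dim=1),                    # band means
                      env.var(dim=1),                     # band variances
                      torch.corrcoef(env).flatten()])     # cross-band correlations

def stat_loss(x, y):
    """Time-invariant comparison: distance between statistic vectors."""
    return torch.mean((texture_stats(x) - texture_stats(y)) ** 2)

print(stat_loss(torch.randn(16000), torch.randn(16000)))
```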
Training Neural Models of Nonlinear Multi-Port Elements Within Wave Digital Structures Through Discrete-Time Simulation
Neural networks have been applied within the Wave Digital Filter
(WDF) framework as data-driven models for nonlinear multi-port
circuit elements. Conventionally, these models are trained on wave
variables obtained by sampling the current-voltage characteristic
of the considered nonlinear element before being incorporated into
the circuit WDF implementation. However, isolating multi-port
elements for this process can be challenging, as their nonlinear
behavior often depends on dynamic effects that emerge from interactions with the surrounding circuit. In this paper, we propose a
novel approach for training neural models of nonlinear multi-port
elements directly within a circuit’s Wave Digital (WD) discrete-time implementation, relying solely on circuit input-output voltage
measurements. Exploiting the differentiability of WD simulations,
we embed the neural network into the simulation process and optimize its parameters using gradient-based methods by minimizing
a loss function defined over the circuit output voltage. Experimental results demonstrate the effectiveness of the proposed approach
in accurately capturing the nonlinear circuit behavior, while preserving the interpretability and modularity of WDFs.
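The core idea can be sketched as follows: a small network plays the nonlinear one-port, mapping incident waves to reflected waves inside an unrolled discrete-time loop, and its weights are updated from a loss on the simulated output voltage only. The toy excitation and tanh target below stand in for a real circuit and its measurements.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

vin = torch.sin(torch.linspace(0, 20 * torch.pi, 512))   # input voltage
target = torch.tanh(vin)                                 # "measured" output voltage

for step in range(100):
    vout = []
    for v in vin:
        a = v.reshape(1, 1)                  # incident wave arriving at the element
        b = net(a)                           # reflected wave from the neural one-port
        vout.append((a + b).flatten() / 2)   # voltage waves: v = (a + b) / 2
    loss = torch.mean((torch.cat(vout) - target) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

A real WD structure would scatter the reflected wave back through adaptors with memory, so the unrolled graph spans the whole simulation; the principle of backpropagating through it is the same.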
Automatic Classification of Chains of Guitar Effects Through Evolutionary Neural Architecture Search
Recent studies on classifying electric guitar effects have achieved
high accuracy, particularly with deep learning techniques. However, these studies often rely on simplified datasets consisting
mainly of single notes rather than realistic guitar recordings.
Moreover, in the specific field of effect chain estimation, the literature tends to rely on large models, making them impractical for
real-time or resource-constrained applications. In this work, we
recorded realistic guitar performances using four different guitars
and created three datasets by applying a chain of five effects with
increasing complexity: (1) fixed order and parameters, (2) fixed order with randomly sampled parameters, and (3) random order and
parameters. We also propose a novel Neural Architecture Search
method aimed at discovering accurate yet compact convolutional
neural network models to reduce power and memory consumption.
We compared its performance to a basic random search strategy,
showing that our custom Neural Architecture Search outperformed
random search in identifying models that balance accuracy and
complexity. We found that the number of convolutional and pooling layers becomes increasingly important as dataset complexity
grows, while dense layers have less impact. Additionally, among
the effects, tremolo was identified as the most challenging to classify.
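A bare-bones version of such a search loop is sketched below: architectures are dictionaries of layer counts, mutation perturbs one entry, and fitness trades a synthetic accuracy surrogate against model complexity. In practice, the surrogate would be the validation accuracy of the trained CNN; the population size and mutation rule are arbitrary choices.

```python
import random

random.seed(0)

def random_arch():
    return {"conv_layers": random.randint(1, 6),
            "pool_layers": random.randint(0, 3),
            "dense_layers": random.randint(1, 3)}

def mutate(arch):
    child = dict(arch)
    key = random.choice(list(child))
    child[key] = max(0, child[key] + random.choice([-1, 1]))
    return child

def fitness(arch):
    # In practice: validation accuracy of the trained model. Here a toy
    # surrogate rewards moderate depth and penalizes layer count.
    accuracy = 1.0 - 0.1 * abs(arch["conv_layers"] - 4)
    complexity = 0.02 * (arch["conv_layers"] + arch["dense_layers"])
    return accuracy - complexity

population = [random_arch() for _ in range(8)]
for generation in range(20):
    survivors = sorted(population, key=fitness, reverse=True)[:4]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(4)]
print(max(population, key=fitness))
```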
Unsupervised Text-to-Sound Mapping via Embedding Space Alignment
This work focuses on developing an artistic tool that performs an
unsupervised mapping between text and sound, converting an input text string into a series of sounds from a given sound corpus.
With the use of a pre-trained sound embedding model and a separate, pre-trained text embedding model, the goal is to find a mapping between the two feature spaces. Our approach is unsupervised, which allows any sound corpus to be used with the system.
The tool performs the task of text-to-sound retrieval, creating a
soundfile in which each word in the text input is mapped to a single sound in the corpus, and the resulting sounds are concatenated
to play sequentially. We experiment with three different mapping
methods, and perform quantitative and qualitative evaluations on
the outputs. Our results demonstrate the potential of unsupervised
methods for creative applications in text-to-sound mapping.
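The retrieval step might look like the sketch below: embed the words and the corpus sounds with separate models, align the spaces with a simple unsupervised normalization, and match each word to its nearest sound by cosine similarity. The random vectors stand in for real pre-trained text and audio embeddings, and standardization is only one of several possible alignment methods.

```python
import numpy as np

rng = np.random.default_rng(0)
text_emb = rng.standard_normal((5, 32))      # one vector per input word
sound_emb = rng.standard_normal((100, 32))   # one vector per corpus sound

def standardize(e):
    """Crude unsupervised alignment: zero mean, unit variance per dimension."""
    return (e - e.mean(axis=0)) / (e.std(axis=0) + 1e-9)

t = standardize(text_emb)
s = standardize(sound_emb)
t /= np.linalg.norm(t, axis=1, keepdims=True)
s /= np.linalg.norm(s, axis=1, keepdims=True)

matches = np.argmax(t @ s.T, axis=1)         # nearest sound for each word
print(matches)                               # concatenate these sounds in order
```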
Anti-Aliasing of Neural Distortion Effects via Model Fine Tuning
Neural networks have become ubiquitous in guitar distortion effect modelling in recent years. Despite their ability to yield
perceptually convincing models, they are susceptible to frequency
aliasing when driven by high frequency and high gain inputs.
Nonlinear activation functions create both the desired harmonic
distortion and unwanted aliasing distortion as the bandwidth of
the signal is expanded beyond the Nyquist frequency. Here, we
present a method for reducing aliasing in neural models via a
teacher-student fine-tuning approach, where the teacher is a pre-trained model with its weights frozen, and the student is a copy of
this with learnable parameters. The student is fine-tuned against
an aliasing-free dataset generated by passing sinusoids through
the original model and removing non-harmonic components from
the output spectra.
Our results show that this method significantly suppresses aliasing for both long short-term memory (LSTM) networks and temporal convolutional networks (TCN). In the
majority of our case studies, the reduction in aliasing was greater
than that achieved by two times oversampling. One side-effect
of the proposed method is that harmonic distortion components
are also affected.
This adverse effect was found to be model-dependent, with the LSTM models giving the best balance between
anti-aliasing and preserving the perceived similarity to an analog
reference device.
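The dataset-construction step can be illustrated as follows: drive the frozen teacher with a sinusoid placed on an exact FFT bin, then zero every non-harmonic bin of the output spectrum to form an aliasing-free target for the student. The tanh stand-in teacher, FFT size, and bin selection are illustrative assumptions.

```python
import numpy as np

sr, n = 44100, 1 << 15
f0 = sr * 64 / n                             # f0 sits exactly on FFT bin 64

def teacher(x):
    # Stand-in for the frozen pre-trained neural distortion model.
    return np.tanh(4.0 * x)

t = np.arange(n) / sr
y = teacher(np.sin(2 * np.pi * f0 * t))

Y = np.fft.rfft(y)
f0_bin = round(f0 * n / sr)
harmonic = np.arange(len(Y)) % f0_bin == 0   # DC and harmonics of f0
Y[~harmonic] = 0.0                           # remove non-harmonic (alias) bins
y_clean = np.fft.irfft(Y, n)                 # aliasing-free student target
```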
Antiderivative Antialiasing for Recurrent Neural Networks
Neural networks have become invaluable for general audio processing tasks, such as virtual analog modeling of nonlinear audio equipment.
For sequence modeling tasks in particular, recurrent neural networks (RNNs) have gained widespread adoption in recent years. Their general applicability and effectiveness
stems partly from their inherent nonlinearity, which makes them
prone to aliasing. Recent work has explored mitigating aliasing
by oversampling the network—an approach whose effectiveness is
directly linked with the incurred computational costs. This work
explores an alternative route by extending the antiderivative antialiasing technique to explicit, computable RNNs. Detailed applications to the Gated Recurrent Unit and Long Short-Term Memory cell are shown as case studies. The proposed technique is evaluated
on multiple pre-trained guitar amplifier models, assessing its impact on the amount of aliasing and model tonality. The method is
shown to reduce the models’ tendency to alias considerably across
all considered sample rates while only affecting their tonality moderately, without requiring high oversampling factors. The results
of this study can be used to improve sound quality in neural audio
processing tasks that employ a suitable class of RNNs. Additional
materials are provided on the accompanying webpage.
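At the heart of the technique is the first-order antiderivative form of a nonlinearity. The sketch below applies it to tanh, the kind of activation found inside the GRU and LSTM cells the paper treats; the fallback threshold is an assumption.

```python
import numpy as np

def F(x):
    return np.log(np.cosh(x))                # antiderivative of tanh

def tanh_adaa(x, eps=1e-6):
    """First-order ADAA: y[n] = (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1])."""
    x1 = np.concatenate(([0.0], x[:-1]))     # one-sample delayed input
    dx = x - x1
    safe = np.abs(dx) > eps
    denom = np.where(safe, dx, 1.0)          # avoid division by ~0
    mid = np.tanh(0.5 * (x + x1))            # fallback for near-equal samples
    return np.where(safe, (F(x) - F(x1)) / denom, mid)

# A hard-driven high-frequency tone, where aliasing would otherwise appear.
x = np.sin(2 * np.pi * 3000 * np.arange(4410) / 44100)
y = tanh_adaa(5.0 * x)
```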