Pywdf: An Open Source Library for Prototyping and Simulating Wave Digital Filter Circuits in Python
This paper introduces a new open-source Python library for the modeling and simulation of wave digital filter (WDF) circuits. The library, called pywdf, allows users to create and analyze WDF circuit models in a high-level, object-oriented manner. It includes a variety of built-in components, such as voltage sources, capacitors, and diodes, as well as the ability to create custom components and circuits. Additionally, pywdf includes analysis tools, such as frequency response and transient analysis, to aid in the design and optimization of WDF circuits. We demonstrate the library’s efficacy by replicating the nonlinear behavior of an analog diode clipper circuit and by creating an allpass filter that cannot be realized in the analog world. The library is well documented and includes several examples to help users get started. Overall, pywdf is a powerful tool for anyone working with WDF circuits, and we hope it will be of use to researchers and engineers in the field.
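To make the WDF idea concrete, here is a minimal sketch in plain Python of one of the simplest WDF circuits, a series RC lowpass. It does not use pywdf and is not its API. The function name `wdf_rc_lowpass`, the component values, and the structure (a resistive voltage source as the non-adapted root, connected directly to a capacitor discretized with the bilinear transform, whose port resistance is T/(2C)) are assumptions chosen for illustration.

```python
def wdf_rc_lowpass(vs, fs, r=1e3, c=1e-6):
    """Capacitor voltage of a series RC circuit driven by source samples `vs`."""
    t = 1.0 / fs
    r_c = t / (2.0 * c)              # capacitor port resistance (bilinear transform)
    # Root: resistive voltage source (Vs in series with R) seen through port r_c.
    k_refl = (r - r_c) / (r + r_c)   # reflection of the incident wave
    k_src = 2.0 * r_c / (r + r_c)    # contribution of the source voltage
    state = 0.0                      # capacitor remembers its last incident wave
    out = []
    for v in vs:
        b_c = state                          # capacitor reflects a[n-1]
        a_root = b_c                         # direct connection: waves swap ports
        b_root = k_refl * a_root + k_src * v
        a_c = b_root
        state = a_c
        out.append(0.5 * (a_c + b_c))        # port voltage = (a + b) / 2
    return out
```

With R = 1 kΩ and C = 1 µF the time constant is 1 ms (48 samples at 48 kHz), so a unit step should reach roughly 63% after 48 samples and settle close to 1.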
Alloy Sounds: Non-Repeating Sound Textures With Probabilistic Cellular Automata
Contemporary musicians commonly face the challenge of finding new, characteristic sounds that make their compositions more distinct. They often turn to computers and algorithms, which can significantly aid creative processes by generating unexpected material through controlled probabilistic processes. In particular, algorithms that exhibit emergent behavior, such as genetic algorithms and cellular automata, have fostered a broad diversity of musical explorations. This article proposes an original technique for the computer-assisted creation and manipulation of sound textures. The technique uses Probabilistic Cellular Automata, which are still seldom explored in the music domain, to blend two audio tracks into a third, different one. The proposed blending process divides the source tracks into frequency bands and associates each of the automaton’s cells with a frequency band. Only one source, chosen by the cell’s state, is active within each band. The resulting track has a non-repeating textural pattern that follows the changes in the Cellular Automata. This blending process lets the musician choose the original material and the blend granularity, both of which significantly change the resulting blends. We demonstrate how to use the proposed blending process in sound design and its application in experimental and popular music.
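The band-wise blending described above can be sketched on two precomputed STFTs (frames × bins) with NumPy. This is not the authors' implementation: the equal-width bands, the one-dimensional neighbourhood, one automaton update per frame, and the update rule in the hypothetical `step_pca` (majority vote of a cell and its two neighbours, then a random flip with probability `p_noise`) are all assumptions. The source only states that each cell is tied to one band and that its state picks which source is active there.

```python
import numpy as np

def step_pca(cells, rng, p_noise=0.1):
    """One update of a hypothetical 1D probabilistic cellular automaton."""
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    # Majority vote over the cell and its two neighbours (periodic boundary) ...
    majority = (left + cells + right) >= 2
    # ... then each cell flips with probability p_noise (the stochastic part).
    flips = rng.random(cells.shape) < p_noise
    return np.logical_xor(majority, flips).astype(np.int8)

def blend_spectra(spec_a, spec_b, n_bands, rng, p_noise=0.1):
    """Blend two STFTs (frames x bins) band by band; one CA cell per band."""
    if spec_a.shape != spec_b.shape:
        raise ValueError("spectrograms must have the same shape")
    n_frames, n_bins = spec_a.shape
    edges = np.linspace(0, n_bins, n_bands + 1).astype(int)  # equal-width bands
    cells = rng.integers(0, 2, n_bands).astype(np.int8)
    out = np.empty_like(spec_a)
    states = np.empty((n_frames, n_bands), dtype=np.int8)
    for t in range(n_frames):
        states[t] = cells
        for k in range(n_bands):
            lo, hi = edges[k], edges[k + 1]
            # State 0 keeps source A in this band, state 1 switches to source B.
            out[t, lo:hi] = spec_b[t, lo:hi] if cells[k] else spec_a[t, lo:hi]
        cells = step_pca(cells, rng, p_noise)
    return out, states
```

Resynthesizing `out` with an inverse STFT would give the blended track; a larger `n_bands` corresponds to a finer blend granularity.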
Production Effect: Audio Features for Recording Techniques Description and Decade Prediction
In this paper, we address the problem of describing music production techniques from the audio signal. Over the past decades, sound engineering techniques have changed drastically. New recording technologies, extensive use of compressors and limiters, and new stereo techniques have deeply modified the sound of records. We propose three features to describe these changes in music production, based on the dynamic range of the signal, the energy difference between channels, and the phase spread between channels. We measure the relevance of these features on the task of automatically classifying Pop/Rock songs by decade. In the context of Music Information Retrieval, this kind of description could help describe the content of a song or assess the similarity between songs.
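As a rough illustration of the kind of descriptors involved, the sketch below computes three stereo statistics with NumPy. They are stand-ins, not the paper's definitions: the crest factor serves as a proxy for dynamic range, the left/right energy ratio for the inter-channel energy difference, and a side-to-mid energy ratio replaces the phase-spread feature. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def production_features(left, right, eps=1e-12):
    """Three simple stereo descriptors (illustrative stand-ins)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Crest factor (peak-to-RMS ratio, dB): heavy compression/limiting lowers it.
    peak = max(np.max(np.abs(left)), np.max(np.abs(right)))
    rms = np.sqrt(0.5 * (np.mean(left ** 2) + np.mean(right ** 2)))
    crest_db = 20.0 * np.log10((peak + eps) / (rms + eps))
    # Inter-channel energy difference (dB): 0 dB for balanced channels.
    e_left, e_right = np.sum(left ** 2), np.sum(right ** 2)
    level_diff_db = 10.0 * np.log10((e_left + eps) / (e_right + eps))
    # Side-to-mid energy ratio (dB): very low for mono, high for wide or out-of-phase content.
    mid, side = 0.5 * (left + right), 0.5 * (left - right)
    side_to_mid_db = 10.0 * np.log10((np.sum(side ** 2) + eps) / (np.sum(mid ** 2) + eps))
    return {"crest_db": float(crest_db), "level_diff_db": float(level_diff_db),
            "side_to_mid_db": float(side_to_mid_db)}
```

For a full-scale sine in both channels the crest factor is about 3 dB, the level difference is 0 dB, and the side-to-mid ratio is very negative.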
Speech/music discrimination based on a new warped LPC-based feature and linear discriminant analysis
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a low-complexity but effective approach that exploits only one simple feature, called the Warped LPC-based Spectral Centroid (WLPC-SC). WLPC-SC is compared with the classical features proposed in [9] to assess the discriminatory power of the proposed feature. The vector describing this psychoacoustically motivated feature is reduced to a few statistical values (mean, variance, and skewness), which are then transformed into a new feature space by applying LDA to increase classification accuracy. Classification is performed by applying an SVM to the features in the transformed space. The classification results for different types of music and speech show the good discriminating power of the proposed approach.
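The WLPC-SC feature can be sketched as follows, assuming a common warped-LPC recipe: a warped autocorrelation computed through a chain of first-order allpass sections with warping coefficient `lam`, the Levinson-Durbin recursion, and the centroid of the resulting power envelope measured on the warped frequency axis. The order, `lam = 0.5`, the small regularisation, and all function names are assumptions rather than the paper's settings; the LDA and SVM stages are omitted.

```python
import numpy as np

def warped_autocorr(frame, order, lam):
    """Warped autocorrelation: correlate x with repeatedly allpass-filtered x."""
    x = np.asarray(frame, dtype=float)
    r = np.zeros(order + 1)
    r[0] = np.dot(x, x)
    u = x
    for k in range(1, order + 1):
        y = np.zeros_like(x)
        for i in range(len(x)):
            y_prev = y[i - 1] if i > 0 else 0.0
            u_prev = u[i - 1] if i > 0 else 0.0
            # First-order allpass D(z) = (z^-1 - lam) / (1 - lam z^-1)
            y[i] = lam * (y_prev - u[i]) + u_prev
        r[k] = np.dot(x, y)
        u = y
    return r

def levinson(r, order):
    """Levinson-Durbin recursion; returns A(z) coefficients with a[0] = 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_new = a.copy()
        a_new[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a_new[i] = k
        a = a_new
        err *= 1.0 - k * k
    return a

def wlpc_spectral_centroid(frame, order=12, lam=0.5, n_freq=256):
    """Centroid (rad/sample, warped axis) of the warped-LPC power envelope."""
    r = warped_autocorr(frame, order, lam)
    r[0] *= 1.0 + 1e-9                      # tiny regularisation (assumption)
    a = levinson(r, order)
    w = np.linspace(0.0, np.pi, n_freq)
    A = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a
    env = 1.0 / np.abs(A) ** 2
    return float(np.sum(w * env) / np.sum(env))

def summarize(values):
    """Mean, variance and skewness of a per-frame feature trajectory."""
    v = np.asarray(values, dtype=float)
    mu, var = v.mean(), v.var()
    skew = np.mean((v - mu) ** 3) / var ** 1.5 if var > 0 else 0.0
    return np.array([mu, var, skew])
```

In use, `wlpc_spectral_centroid` would be applied to each analysis frame and `summarize` to the resulting trajectory, yielding the three-value vector that is then passed to LDA.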
Optimization techniques for a physical model of human vocalisation
We present an unsupervised approach to optimizing and evaluating the synthesis of non-speech audio effects from a speech production model. We use the Pink Trombone synthesizer as a case study of a simplified production model of the vocal tract, targeting non-speech human audio signals, namely yawns. We selected and optimized the synthesizer's control parameters to minimize the difference between real and generated audio. We validated the most common optimization techniques reported in the literature as well as a specifically designed neural network. We evaluated several popular quality metrics as error functions, including both objective quality metrics and subjective-equivalent metrics. We compared the results in terms of total error and computational demand. Results show that genetic and swarm optimizers outperform least-squares algorithms at the cost of slower execution, and that specific combinations of optimizers and audio representations yield significantly different results. The proposed methodology could be used to benchmark other physical models and audio types.
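The parameter-fitting loop can be illustrated with a small particle swarm optimizer, one of the optimizer families mentioned above. To keep the example self-contained, a toy two-parameter resonator (centre frequency and bandwidth) stands in for the Pink Trombone, and the mean squared log-magnitude difference stands in for the paper's error functions; all names and constants are assumptions.

```python
import numpy as np

FS = 16000

def resonator_db(f0, bw, fs=FS, n_freq=256):
    """Log-magnitude response (dB) of a two-pole resonator: the toy 'synth'."""
    w = np.linspace(0.0, np.pi, n_freq)
    r = np.exp(-np.pi * bw / fs)           # pole radius from bandwidth
    theta = 2.0 * np.pi * f0 / fs          # pole angle from centre frequency
    z1 = np.exp(-1j * w)
    h = 1.0 / ((1 - r * np.exp(1j * theta) * z1) * (1 - r * np.exp(-1j * theta) * z1))
    return 20.0 * np.log10(np.abs(h))

def pso(cost, lo, hi, n_particles=20, n_iter=60, seed=0, w=0.7, c1=1.5, c2=1.5):
    """Global-best particle swarm optimisation within box bounds [lo, hi]."""
    rng = np.random.default_rng(seed)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))
    v = np.zeros_like(x)
    pbest = x.copy()
    pcost = np.array([cost(p) for p in x])
    i_best = int(np.argmin(pcost))
    g, gcost = pbest[i_best].copy(), pcost[i_best]
    for _ in range(n_iter):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        # Velocity update: inertia + pull towards personal and global bests
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        c = np.array([cost(p) for p in x])
        improved = c < pcost
        pbest[improved] = x[improved]
        pcost[improved] = c[improved]
        i_best = int(np.argmin(pcost))
        if pcost[i_best] < gcost:
            g, gcost = pbest[i_best].copy(), pcost[i_best]
    return g, float(gcost)
```

For example, with `target = resonator_db(1000.0, 150.0)` and a cost equal to the mean squared difference between `resonator_db(*p)` and `target`, the swarm searches the box of 100 to 4000 Hz by 20 to 500 Hz for parameters that reproduce the target response.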
Automatic Classification of Chains of Guitar Effects Through Evolutionary Neural Architecture Search
Recent studies on classifying electric guitar effects have achieved high accuracy, particularly with deep learning techniques. However, these studies often rely on simplified datasets consisting mainly of single notes rather than realistic guitar recordings. Moreover, in the specific field of effect chain estimation, the literature tends to rely on large models, which are impractical for real-time or resource-constrained applications. In this work, we recorded realistic guitar performances using four different guitars and created three datasets of increasing complexity by applying a chain of five effects: (1) fixed order and parameters, (2) fixed order with randomly sampled parameters, and (3) random order and parameters. We also propose a novel Neural Architecture Search method aimed at discovering accurate yet compact convolutional neural network models to reduce power and memory consumption. We compared its performance to a basic random search strategy, showing that our custom Neural Architecture Search outperformed random search in identifying models that balance accuracy and complexity. We found that the number of convolutional and pooling layers becomes increasingly important as dataset complexity grows, while dense layers have less impact. Additionally, among the effects, tremolo was identified as the most challenging to classify.
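A minimal sketch of an evolutionary architecture search loop, assuming a simple genome (a list of convolutional layers, each followed by pooling, then optional dense layers) and a hypothetical score that trades accuracy against the log of the parameter count. The `evaluate` callback, which would train and test a candidate network, is left to the caller. The layer choices, mutation operators, penalty weight `alpha`, the global average pooling between the two parts, and `n_outputs=5` (one output per effect) are assumptions, not the paper's search space.

```python
import math
import random
from dataclasses import dataclass

FILTERS = [8, 16, 32, 64]
KERNELS = [3, 5, 7]
UNITS = [16, 32, 64]

@dataclass
class Genome:
    conv: list   # [(filters, kernel_size), ...], each conv followed by pooling
    dense: list  # [units, ...] hidden dense layers before the output layer

def param_count(g, in_channels=1, n_outputs=5):
    """Trainable parameters of the 1D CNN described by genome g.

    Assumes pooling layers have no parameters and that a global average
    pooling layer sits between the convolutional and dense parts.
    """
    total, c_in = 0, in_channels
    for filters, kernel in g.conv:
        total += kernel * c_in * filters + filters    # weights + biases
        c_in = filters
    n_in = c_in
    for units in g.dense + [n_outputs]:
        total += n_in * units + units
        n_in = units
    return total

def random_genome(rng):
    conv = [(rng.choice(FILTERS), rng.choice(KERNELS)) for _ in range(rng.randint(1, 4))]
    dense = [rng.choice(UNITS) for _ in range(rng.randint(0, 2))]
    return Genome(conv, dense)

def mutate(g, rng):
    conv, dense = list(g.conv), list(g.dense)
    op = rng.choice(["add_conv", "drop_conv", "resize_conv", "toggle_dense"])
    if op == "add_conv" and len(conv) < 6:
        conv.append((rng.choice(FILTERS), rng.choice(KERNELS)))
    elif op == "drop_conv" and len(conv) > 1:
        conv.pop(rng.randrange(len(conv)))
    elif op == "resize_conv":
        i = rng.randrange(len(conv))
        conv[i] = (rng.choice(FILTERS), conv[i][1])
    else:  # toggle_dense, or fallback when add/drop is not allowed
        dense = [] if dense else [rng.choice(UNITS)]
    return Genome(conv, dense)

def evolve(evaluate, generations=10, pop_size=8, alpha=0.05, seed=0):
    """Truncation-selection evolution; evaluate(g) should return accuracy in [0, 1]."""
    rng = random.Random(seed)

    def score(g):
        return evaluate(g) - alpha * math.log10(param_count(g))

    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=score)
```

Because `score` calls `evaluate` every time it is needed, a real search would cache results per genome, since each evaluation means training a network.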
Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space
This paper presents a novel approach to neural instrument sound synthesis using a two-stage semi-supervised learning framework capable of generating pitch-accurate, high-quality music samples from an expressive timbre latent space. Existing approaches that achieve sufficient quality for music production often rely on high-dimensional latent representations that are difficult to navigate and provide unintuitive user experiences. We address this limitation through a two-stage training paradigm: first, we train a pitch-timbre disentangled 2D representation of audio samples using a Variational Autoencoder; second, we use this representation as conditioning input for a Transformer-based generative model. The learned 2D latent space serves as an intuitive interface for navigating and exploring the sound landscape. We demonstrate that the proposed method effectively learns a disentangled timbre space, enabling expressive and controllable audio generation with reliable pitch conditioning. Experimental results show the model’s ability to capture subtle variations in timbre while maintaining a high degree of pitch accuracy. The usability of our method is demonstrated in an interactive web application, highlighting its potential as a step towards future music production environments that are both intuitive and creatively empowering:
https://pgesam.faresschulz.com/.
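The first training stage relies on standard VAE machinery. The NumPy sketch below shows the reparameterization trick and the KL term for a 2D latent, plus one hypothetical way of forming a conditioning vector for the second stage (latent coordinates concatenated with a one-hot MIDI pitch). The conditioning format and function names are assumptions; the source does not specify them.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, I); in an autodiff framework this
    form lets gradients flow to mu and log_var."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

def conditioning_vector(z, midi_pitch, n_pitches=128):
    """Hypothetical stage-2 conditioning: 2D timbre coordinates + one-hot pitch."""
    one_hot = np.zeros(n_pitches)
    one_hot[midi_pitch] = 1.0
    return np.concatenate([np.asarray(z, dtype=float), one_hot])
```

In an interactive interface of the kind described above, a point picked on the 2D map would play the role of `z`.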
Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing
In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
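The delay-based idea can be sketched on a scalar toy RNN: when running at `ratio` times the training sample rate, the recurrence reads the state from `ratio` samples ago instead of one, and for a non-integer ratio that past state is obtained by cubic Lagrange interpolation of the stored state history. This is an illustration under those assumptions, not the paper's code; the tanh cell and all function names are hypothetical.

```python
import math

def lagrange3_weights(d):
    """Cubic Lagrange weights for taps 0..3 at fractional position d."""
    weights = []
    for k in range(4):
        w = 1.0
        for m in range(4):
            if m != k:
                w *= (d - m) / (k - m)
        weights.append(w)
    return weights

def read_delayed(history, delay):
    """Value `delay` samples ago; history[-j] holds the value j samples ago (j >= 1)."""
    if delay < 1.0:
        raise ValueError("delay must be at least one sample")
    base = max(1, math.floor(delay) - 1)   # first tap; keeps d near the centre
    d = delay - base                       # position relative to the first tap
    w = lagrange3_weights(d)
    return sum(w[k] * history[-(base + k)] for k in range(4))

def run_toy_rnn(x, w_x, w_h, ratio):
    """Scalar tanh RNN whose recurrence reads h[n - ratio] instead of h[n - 1]."""
    history = [0.0] * 8   # zero initial states; enough taps for ratio < 6
    out = []
    for xn in x:
        h = math.tanh(w_x * xn + w_h * read_delayed(history, ratio))
        history.append(h)
        out.append(h)
    return out
```

With `ratio = 1.0` the interpolation weights reduce to `[1, 0, 0, 0]`, so the toy cell behaves exactly like the original RNN at its training rate; integer ratios likewise read a single stored state without interpolation error.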
Damped Chirp Mixture Estimation via Nonlinear Bayesian Regression
Estimating mixtures of damped chirp sinusoids in noise is a problem that affects audio analysis, coding, and synthesis applications. Phase-based non-stationary parameter estimators assume that sinusoids can be resolved in the Fourier transform domain, whereas high-resolution methods estimate superimposed components with accuracy close to the theoretical limits, but only for sinusoids with constant frequencies. We present a new method for estimating the parameters of superimposed damped chirps that has an accuracy competitive with existing non-stationary estimators and a resolution comparable to that of high-resolution subspace techniques. After providing the analytical expression for the Fourier transform of a Gaussian-windowed damped chirp signal, we propose an efficient variational EM algorithm for nonlinear Bayesian regression that jointly estimates the amplitudes, phases, frequencies, chirp rates, and decay rates of multiple non-stationary components that may be obscured under the same local maximum in the frequency spectrum. Quantitative results show that the new method not only has an estimation accuracy close to the Cramér-Rao bound but also a high resolution that outperforms the state of the art.
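One building block of such estimators can be shown compactly: once the frequencies, chirp rates, and decay rates are fixed, the complex amplitudes (and hence amplitudes and phases) enter the model linearly and can be found by least squares. The sketch assumes a complex (analytic) signal model x[n] = sum_i c_i exp((g_i + j w_i) n + j psi_i n^2 / 2), where g_i is the decay rate, w_i the frequency, and psi_i the chirp rate. It is not the paper's variational EM algorithm, which also estimates the nonlinear parameters; the function names are hypothetical.

```python
import numpy as np

def chirp_basis(n, freqs, chirp_rates, decays):
    """Columns exp((g + j*w) n + j*psi n^2 / 2); g < 0 means a decaying component."""
    n = np.asarray(n, dtype=float)[:, None]
    w = np.asarray(freqs, dtype=float)[None, :]
    psi = np.asarray(chirp_rates, dtype=float)[None, :]
    g = np.asarray(decays, dtype=float)[None, :]
    return np.exp((g + 1j * w) * n + 0.5j * psi * n ** 2)

def fit_amplitudes(x, freqs, chirp_rates, decays):
    """Least-squares complex amplitudes given the nonlinear parameters."""
    E = chirp_basis(np.arange(len(x)), freqs, chirp_rates, decays)
    c, *_ = np.linalg.lstsq(E, x, rcond=None)
    return np.abs(c), np.angle(c)
```

In a full estimator, this linear step would sit inside a search or update over the nonlinear parameters; with noiseless data and the correct nonlinear parameters it recovers the amplitudes and phases to numerical precision, even for two closely spaced components.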
A general-purpose deep learning approach to model time-varying audio effects
Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects. Most existing methods for modeling this type of effect unit are optimized for a very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning architecture for generic black-box modeling of audio processors with long-term memory. We explore the capabilities of deep neural networks to learn such long temporal dependencies, and we show the network modeling various linear and nonlinear, time-varying and time-invariant audio effects. To measure the performance of the model, we propose an objective metric based on the psychoacoustics of modulation frequency perception. We also analyze what the model is actually learning and how the given task is accomplished.
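As a simplified illustration of comparing time-varying effects by their modulation content, the sketch below computes a modulation spectrum (the spectrum of a frame-RMS amplitude envelope) and a mean absolute difference between two such spectra. It omits the auditory filterbank and perceptual weighting that a psychoacoustic metric would include, so it is not the paper's metric; the hop size and function names are assumptions.

```python
import numpy as np

def modulation_spectrum(x, fs, hop=160):
    """Magnitude spectrum of a frame-RMS amplitude envelope."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // hop
    frames = x[: n_frames * hop].reshape(n_frames, hop)
    env = np.sqrt(np.mean(frames ** 2, axis=1))   # one envelope value per hop
    env = env - env.mean()                        # drop the DC component
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(n_frames, d=hop / fs)  # envelope rate is fs / hop
    return freqs, spec

def modulation_distance(x, y, fs, hop=160):
    """Mean absolute difference between two modulation spectra."""
    _, sx = modulation_spectrum(x, fs, hop)
    _, sy = modulation_spectrum(y, fs, hop)
    m = min(len(sx), len(sy))
    return float(np.mean(np.abs(sx[:m] - sy[:m])))
```

For a 1 kHz tone with 4 Hz tremolo, the modulation spectrum peaks at 4 Hz, so comparing a model's output with the reference effect in this domain emphasizes whether the modulation rate and depth are reproduced.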