The Mix Evaluation Dataset
Research on perception of music production practices is mainly concerned with the emulation of sound engineering tasks through lab-based experiments and custom software, sometimes with unskilled subjects. This can improve the level of control, but the validity, transferability, and relevance of the results may suffer from this artificial context. This paper presents a dataset consisting of mixes gathered in a real-life, ecologically valid setting, and perceptual evaluation thereof, which can be used to expand knowledge on the mixing process. With 180 mixes including parameter settings, close to 5000 preference ratings and free-form descriptions, and a diverse range of contributors from five different countries, the data offers many opportunities for music production analysis, some of which are explored here. In particular, more experienced subjects were found to be more negative and more specific in their assessments of mixes, and to increasingly agree with each other.
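As a loose illustration of how the reported agreement effect could be quantified from such a dataset, here is a minimal sketch that groups preference ratings by subject experience and computes mean pairwise correlation within each group. The column names, the toy data, and the choice of pairwise Pearson correlation as the agreement measure are assumptions for illustration only, not the analysis actually performed on the dataset.

```python
# Sketch: quantifying within-group rater agreement as mean pairwise Pearson
# correlation of preference ratings. Column names and the agreement metric
# are illustrative assumptions, not the paper's actual analysis pipeline.
import itertools
import numpy as np
import pandas as pd

def mean_pairwise_agreement(ratings: pd.DataFrame) -> float:
    """ratings: rows = subjects, columns = mixes, values = preference scores."""
    corrs = []
    for a, b in itertools.combinations(ratings.index, 2):
        pair = ratings.loc[[a, b]].dropna(axis=1)   # mixes rated by both subjects
        if pair.shape[1] >= 2:
            corrs.append(np.corrcoef(pair.loc[a], pair.loc[b])[0, 1])
    return float(np.nanmean(corrs)) if corrs else float("nan")

# Hypothetical long-format table: subject, experience group, mix, rating.
df = pd.DataFrame({
    "subject":    ["s1", "s1", "s2", "s2", "s3", "s3"],
    "experience": ["pro", "pro", "pro", "pro", "novice", "novice"],
    "mix":        ["m1", "m2", "m1", "m2", "m1", "m2"],
    "rating":     [0.8, 0.2, 0.7, 0.3, 0.4, 0.6],
})
for group, sub in df.groupby("experience"):
    table = sub.pivot(index="subject", columns="mix", values="rating")
    print(group, mean_pairwise_agreement(table))
```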
Analytical Features for the Classification of Percussive Sounds: The Case of the Pandeiro
There is an increasing need to automatically classify sounds for MIR and interactive music applications. In the context of supervised classification, we describe an approach that improves the performance of the general bag-of-frames scheme without losing its generality. The method is based on the construction and exploitation of specific audio features, called analytical features, used as input to classifiers. These features are better, in a sense we define precisely, than standard general features, or even than ad hoc features designed by hand for specific problems. To construct these features, our method explores a very large space of functions by composing basic operators in syntactically correct ways. These operators are taken from the mathematical and audio processing domains. Our method allows us to build a large number of these features and to evaluate and select them automatically for arbitrary audio classification problems. We present a specific study concerning the analysis of Pandeiro (Brazilian tambourine) sounds. Two problems are considered: the classification of entire sounds, for MIR applications, and the classification of the attack portion of the sound only, for interactive music applications. We precisely evaluate the gain obtained with analytical features on these two problems, in comparison with standard approaches.
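To make the idea of composed analytical features concrete, the following sketch randomly chains a few basic operators into candidate scalar features and keeps the ones with the best cross-validated accuracy. The operator set, the random composition, and the selection criterion are simplified assumptions; the actual method searches a far larger space of syntactically correct compositions.

```python
# Sketch: composing basic signal operators into candidate "analytical" features
# and keeping the ones that classify best. The operators, random composition,
# and selection by cross-validated accuracy are illustrative assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Small libraries of basic operators (array -> array) and reducers (array -> scalar).
OPERATORS = {
    "abs":      np.abs,
    "diff":     lambda x: np.diff(x, append=x[-1]),
    "spectrum": lambda x: np.abs(np.fft.rfft(x)),
    "log1p":    lambda x: np.log1p(np.abs(x)),
}
REDUCERS = {"mean": np.mean, "std": np.std, "max": np.max}

def random_feature(rng):
    """Compose a short chain of operators ending in a scalar reducer."""
    chain = list(rng.choice(list(OPERATORS), size=2))
    reducer = str(rng.choice(list(REDUCERS)))
    def feature(x):
        x = np.asarray(x, dtype=float)
        for name in chain:
            x = OPERATORS[name](x)
        return REDUCERS[reducer](x)
    feature.description = " -> ".join(chain) + " -> " + reducer
    return feature

def select_features(signals, labels, n_candidates=50, n_keep=5, seed=0):
    """Score each candidate feature by cross-validated accuracy and keep the best."""
    rng = np.random.default_rng(seed)
    scored = []
    for _ in range(n_candidates):
        f = random_feature(rng)
        X = np.array([[f(s)] for s in signals])          # one scalar feature per signal
        acc = cross_val_score(KNeighborsClassifier(3), X, labels, cv=3).mean()
        scored.append((acc, f.description))
    return sorted(scored, reverse=True)[:n_keep]
```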
A Generative Model for Raw Audio Using Transformer Architectures
This paper proposes a novel way of performing audio synthesis at the waveform level using Transformer architectures. We propose a deep neural network for generating waveforms, similar to WaveNet. The model is fully probabilistic, auto-regressive, and causal, i.e. each generated sample depends only on the previously observed samples. Our approach outperforms a widely used WaveNet architecture by up to 9% on next-step prediction for a similar dataset. Using the attention mechanism, we enable the architecture to learn which audio samples are important for the prediction of the future sample. We show how causal Transformer generative models can be used for raw waveform synthesis. We also show that this performance can be improved by a further 2% by conditioning samples on a wider context. The flexibility of the current model to synthesize audio from latent representations suggests a large number of potential applications. The novel approach of using generative Transformer architectures for raw audio synthesis is, however, like WaveNet, still far from generating meaningful music without latent codes or metadata to aid the generation process.
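The following is a minimal sketch of the kind of causal, auto-regressive Transformer described above, predicting the next 8-bit mu-law sample from the preceding ones. The model sizes, the learned positional embedding, and the quantization are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: a minimal causal Transformer that predicts the next mu-law-quantized
# audio sample from the previous ones. Sizes and quantization are illustrative.
import torch
import torch.nn as nn

class CausalSampleTransformer(nn.Module):
    def __init__(self, n_classes=256, d_model=128, n_heads=4, n_layers=4, context=1024):
        super().__init__()
        self.embed = nn.Embedding(n_classes, d_model)
        self.pos = nn.Embedding(context, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):                          # tokens: (batch, time) int64
        t = tokens.shape[1]
        x = self.embed(tokens) + self.pos(torch.arange(t, device=tokens.device))
        # additive causal mask: -inf strictly above the diagonal (no peeking ahead)
        mask = torch.full((t, t), float("-inf"), device=tokens.device).triu(1)
        h = self.encoder(x, mask=mask)
        return self.head(h)                             # logits for the *next* sample

model = CausalSampleTransformer()
tokens = torch.randint(0, 256, (2, 512))                # two sequences of mu-law samples
logits = model(tokens)
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 256), tokens[:, 1:].reshape(-1))
```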
A Statistics-Driven Differentiable Approach for Sound Texture Synthesis and Analysis
In this work, we introduce TexStat, a novel loss function specifically designed for the analysis and synthesis of texture sounds characterized by stochastic structure and perceptual stationarity. Drawing inspiration from the statistical and perceptual framework of McDermott and Simoncelli, TexStat identifies similarities between signals belonging to the same texture category without relying on temporal structure. We also propose using TexStat as a validation metric alongside the Fréchet Audio Distance (FAD) to evaluate texture sound synthesis models. In addition to TexStat, we present TexEnv, an efficient, lightweight, and differentiable texture sound synthesizer that generates audio by imposing amplitude envelopes on filtered noise. We further integrate these components into TexDSP, a DDSP-inspired generative model tailored for texture sounds. Through extensive experiments across various texture sound types, we demonstrate that TexStat is perceptually meaningful, time-invariant, and robust to noise, features that make it effective both as a loss function for generative tasks and as a validation metric. All tools and code are provided as open-source contributions, and our PyTorch implementations are efficient, differentiable, and highly configurable, enabling their use both in generative tasks and as a perceptually grounded evaluation metric.
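A rough sketch of a statistics-driven, time-invariant loss in the spirit described above: it compares the mean, variance, and cross-band correlations of subband envelopes between two signals. The mel filterbank and the reduced statistic set are simplifying assumptions and do not reproduce the actual TexStat implementation.

```python
# Sketch: a time-invariant, differentiable texture loss comparing summary
# statistics of subband envelopes (mean, variance, pairwise correlations).
# Mel filterbank and reduced statistic set are simplifying assumptions.
import torch
import torchaudio

class SubbandStatsLoss(torch.nn.Module):
    def __init__(self, sr=16000, n_mels=16, n_fft=1024, hop=256):
        super().__init__()
        self.spec = torchaudio.transforms.MelSpectrogram(
            sample_rate=sr, n_fft=n_fft, hop_length=hop, n_mels=n_mels, power=1.0)

    def stats(self, audio):
        env = self.spec(audio)                          # (batch, bands, frames) envelopes
        mean = env.mean(dim=-1)
        var = env.var(dim=-1)
        z = (env - mean.unsqueeze(-1)) / (var.sqrt().unsqueeze(-1) + 1e-8)
        corr = torch.einsum("bif,bjf->bij", z, z) / env.shape[-1]   # band correlations
        return mean, var, corr

    def forward(self, x, y):
        loss = 0.0
        for sx, sy in zip(self.stats(x), self.stats(y)):
            loss = loss + torch.nn.functional.mse_loss(sx, sy)
        return loss

loss_fn = SubbandStatsLoss()
x, y = torch.randn(1, 16000), torch.randn(1, 16000)     # two one-second noise textures
print(loss_fn(x, y))                                     # small: both are white noise
```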
Modeling Time-Varying Reactances Using Wave Digital Filters
Wave Digital Filters (WDFs) were developed to discretize linear time-invariant lumped systems, particularly electronic circuits. The time-invariant assumption is baked into the underlying theory and becomes problematic when simulating audio circuits that are by nature time-varying. We present extensions to WDF theory that incorporate proper numerical schemes, allowing for the accurate simulation of time-varying systems. We present generalized continuous-time models of reactive components that encapsulate the time-varying lossless models presented by Fettweis, the circuit-theoretic time-varying models, and traditional LTI models as special cases. Models of time-varying reactive components are valuable tools to have when modeling circuits containing variable capacitors or inductors, or electrical devices such as condenser microphones. A power metric is derived, and the model is discretized using the alpha-transform numerical scheme and a parametric wave definition. Case studies of circuits containing time-varying resistance and capacitance are presented and help to validate the proposed generalized continuous-time model and discretization.
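For context, the sketch below simulates a series source-resistor-capacitor loop with a standard wave digital structure and naively recomputes the capacitor's port resistance whenever the capacitance changes. This is essentially the LTI treatment whose shortcomings motivate the paper; the alpha-transform, parametric waves, and lossless time-varying component models are not reproduced here.

```python
# Sketch: a series Vs-R-C loop as a wave digital structure (ideal source at the
# root, series adaptor, adapted resistor and capacitor). The capacitance is
# recomputed per sample in a naive way to illustrate the time-varying setting;
# the paper's alpha-transform and lossless time-varying models are NOT shown.
import numpy as np

def simulate_rc(vs, cap_at, R=1e3, fs=48000):
    """vs[n]: source voltage per sample; cap_at(n): capacitance at sample n."""
    a_c = 0.0                                # wave state stored in the capacitor
    v_out = np.zeros_like(vs, dtype=float)
    for n, v_in in enumerate(vs):
        C = cap_at(n)
        Rc = 1.0 / (2.0 * fs * C)            # bilinear-transform port resistance T/(2C)
        R3 = R + Rc                          # adapted up-facing port of the series adaptor
        # scan up: adapted resistor reflects 0, capacitor reflects its stored wave
        b_r, b_c = 0.0, a_c
        a_root = -(b_r + b_c)                # series adaptor's up-going wave
        a3 = 2.0 * v_in - a_root             # ideal voltage source at the root: b = 2*Vs - a
        # scatter back down through the series adaptor: b_k = a_k - (2*R_k/sum R)*sum(a)
        total = b_r + b_c + a3
        a_c = b_c - (Rc / R3) * total        # new incident wave becomes the capacitor state
        v_out[n] = -0.5 * (a_c + b_c)        # capacitor voltage (sign follows loop orientation)
    return v_out

n = np.arange(48000)
vs = np.sign(np.sin(2 * np.pi * 5 * n / 48000))                    # 5 Hz square wave
out = simulate_rc(vs, lambda k: 100e-9 * (1.5 + np.sin(2 * np.pi * k / 4800)))
```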
Real-Time 3D Finite-Difference Time-Domain Simulation of Low- and Mid-Frequency Room Acoustics
Modern graphics processing units (GPUs) are massively parallel computing environments. They make it possible to run certain tasks orders of magnitude faster than is possible with a central processing unit (CPU). One such case is the simulation of room acoustics with wave-based modeling techniques. In this paper we show that it is possible to run room acoustic simulations with a finite-difference time-domain model in real time for a modest-size geometry at sampling rates up to 7 kHz. For a 10% maximum dispersion error limit, this means that our system can be used for real-time auralization up to 1.5 kHz. In addition, the system is able to handle several simultaneous sound sources and a moving listener at no additional cost. The results of this study include a performance comparison of different schemes, showing that the interpolated wideband scheme is able to handle in real time 1.4 times the bandwidth of the standard rectilinear scheme with the same maximum dispersion error.
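For reference, here is one update step of the standard rectilinear (7-point) FDTD scheme on the CPU with NumPy. The grid size, the pressure-release boundaries, and the hard source are illustrative simplifications; the real-time GPU implementation and the interpolated wideband scheme are not reproduced.

```python
# Sketch: the standard rectilinear (7-point) FDTD update for the 3D wave
# equation at the Courant stability limit. CPU/NumPy only; room size, source,
# and boundaries are illustrative simplifications.
import numpy as np

c, fs = 343.0, 7000.0                     # speed of sound, sampling rate
T = 1.0 / fs
lam = 1.0 / np.sqrt(3.0)                  # Courant number at the 3D stability limit
X = c * T / lam                           # grid spacing implied by fs and lambda
shape = (64, 50, 40)                      # modest-size room (in grid points)

p0 = np.zeros(shape)                      # p^{n-1}
p1 = np.zeros(shape)                      # p^n
p1[32, 25, 20] = 1.0                      # impulsive source sample

def step(p_prev, p_curr):
    """Return p^{n+1} on the interior; the outer layer stays at zero
    (pressure-release boundary; rigid/impedance walls are omitted)."""
    p_next = np.zeros_like(p_curr)
    neighbors = (p_curr[:-2, 1:-1, 1:-1] + p_curr[2:, 1:-1, 1:-1] +
                 p_curr[1:-1, :-2, 1:-1] + p_curr[1:-1, 2:, 1:-1] +
                 p_curr[1:-1, 1:-1, :-2] + p_curr[1:-1, 1:-1, 2:])
    p_next[1:-1, 1:-1, 1:-1] = (lam**2 * neighbors
                                + 2.0 * (1.0 - 3.0 * lam**2) * p_curr[1:-1, 1:-1, 1:-1]
                                - p_prev[1:-1, 1:-1, 1:-1])
    return p_next

receiver = []
for _ in range(200):                      # a few milliseconds of simulated pressure
    p0, p1 = p1, step(p0, p1)
    receiver.append(p1[10, 10, 10])
```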
Sinusoid Modeling in a Harmonic Context
This article discusses harmonic sinusoid modeling. Unlike standard sinusoid analyzers, the harmonic sinusoid analyzer keeps close watch on partial harmony from an early stage of modeling and therefore guarantees the harmonic relationship among the sinusoids. The key element in harmonic sinusoid modeling is the harmonic sinusoid particle, which is found by grouping short-time sinusoids. Instead of tracking short-time sinusoids, the harmonic tracker operates on harmonic particles directly. To express harmonic partial frequencies in a compact and robust form, we have developed an inequality-based representation with adjustable tolerance on frequency errors and inharmonicity, which is used in both the grouping and tracking stages. Frequency and amplitude continuity criteria are considered for tracking purposes. Numerical simulations are performed on simple synthesized signals.
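A minimal sketch of the grouping idea: candidate peaks are admitted to a harmonic particle only if they satisfy an inequality with an adjustable relative tolerance around integer multiples of the fundamental. The tolerance model below is an illustrative stand-in for the paper's inequality-based representation and ignores inharmonicity.

```python
# Sketch: grouping short-time sinusoid peaks into a "harmonic particle" by
# testing each candidate against a relative-tolerance inequality around k*f0.
import numpy as np

def group_harmonics(peak_freqs, f0, tol=0.03, max_partial=20):
    """Return {partial_index: frequency} for peaks within tol of k*f0."""
    particle = {}
    for f in sorted(peak_freqs):
        k = int(round(f / f0))
        if 1 <= k <= max_partial and abs(f - k * f0) <= tol * k * f0:
            # keep the closest peak if several fall inside the tolerance band
            if k not in particle or abs(f - k * f0) < abs(particle[k] - k * f0):
                particle[k] = f
    return particle

peaks = np.array([221.0, 439.0, 663.0, 700.0, 885.0, 1250.0])   # detected peaks (Hz)
print(group_harmonics(peaks, f0=220.0))
# -> {1: 221.0, 2: 439.0, 3: 663.0, 4: 885.0}; 700 Hz and 1250 Hz are rejected
```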
Reservoir Computing: A Powerful Framework for Nonlinear Audio Processing
This paper proposes reservoir computing as a general framework for nonlinear audio processing. Reservoir computing is a novel approach to recurrent neural network training with the advantage of a very simple and linear learning algorithm. It can in theory approximate arbitrary nonlinear dynamical systems with arbitrary precision, has an inherent temporal processing capability, and is therefore well suited to many nonlinear audio processing problems. Whenever nonlinear relationships are present in the data and timing information is crucial, reservoir computing can be applied. Examples from three application areas are presented: nonlinear system identification of a tube amplifier emulation algorithm; nonlinear audio prediction, as needed in wireless audio transmission where dropouts may occur; and automatic melody transcription from a polyphonic audio stream, as one example from the broad field of music information retrieval. Reservoir computing was able to outperform state-of-the-art alternative models in all studied tasks.
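The sketch below shows the common echo state network form of reservoir computing: a fixed random recurrent reservoir driven by the input, with only a linear readout trained by ridge regression. The reservoir size, scaling, and the toy identification task are illustrative choices, not the configurations used in the paper's experiments.

```python
# Sketch: an echo state network. Only the linear readout is trained (ridge
# regression); the random reservoir stays fixed. Sizes and scalings are
# illustrative assumptions.
import numpy as np

class EchoStateNetwork:
    def __init__(self, n_in=1, n_res=300, spectral_radius=0.9, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))   # echo state property
        self.W, self.ridge = W, ridge
        self.W_out = None

    def _states(self, u):
        x = np.zeros(self.W.shape[0])
        states = []
        for u_t in u:                                   # tanh reservoir update
            x = np.tanh(self.W_in @ np.atleast_1d(u_t) + self.W @ x)
            states.append(x.copy())
        return np.array(states)

    def fit(self, u, y, washout=100):
        X = self._states(u)[washout:]
        Y = np.asarray(y)[washout:]
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W_out = np.linalg.solve(A, X.T @ Y)        # ridge-regression readout
        return self

    def predict(self, u):
        return self._states(u) @ self.W_out

# Toy nonlinear system identification: learn y[n] = tanh(3 * u[n-1]).
u = np.random.default_rng(1).uniform(-1, 1, 2000)
y = np.tanh(3 * np.roll(u, 1))
esn = EchoStateNetwork().fit(u, y)
```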
Conformal Maps for the Discretization of Analog Filters Near the Nyquist Limit
We propose a new analog filter discretization method that is useful for discretizing systems with features near or above the Nyquist limit. A conformal mapping approach is taken, and we introduce the peaking conformal map and shelving conformal map. The proposed method provides a close match to the original analog frequency response below half the sampling rate and is parameterizable, order preserving, and agnostic to the original filter’s order or type. The proposed method should have applications to discretizing filters that have time-varying parameters or need to be implemented across many different sampling rates.
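For orientation, the sketch below applies the baseline that such methods improve upon: the bilinear transform (itself a conformal map), with and without prewarping, to an analog resonance close to the Nyquist limit. The filter and frequencies are illustrative; the peaking and shelving conformal maps proposed in the paper are not reproduced here.

```python
# Sketch: discretizing an analog resonance near Nyquist with the bilinear
# transform, the usual conformal map s -> (2/T)(1 - z^-1)/(1 + z^-1).
# Prewarping matches the analog response exactly at one chosen frequency.
import numpy as np
from scipy import signal

fs = 48000.0
f0 = 20000.0                                    # resonance near the 24 kHz Nyquist limit
w0, Q = 2 * np.pi * f0, 4.0

# Analog 2nd-order resonant lowpass: H(s) = w0^2 / (s^2 + (w0/Q) s + w0^2)
b_s, a_s = [w0**2], [1.0, w0 / Q, w0**2]

# Plain bilinear transform: frequency warping moves the resonance well below f0.
b_z, a_z = signal.bilinear(b_s, a_s, fs=fs)

# Prewarped design: substitute w0 -> 2*fs*tan(pi*f0/fs) so the analog and
# digital responses coincide exactly at f0 (but still differ elsewhere).
w0p = 2 * fs * np.tan(np.pi * f0 / fs)
b_zp, a_zp = signal.bilinear([w0p**2], [1.0, w0p / Q, w0p**2], fs=fs)

w, h = signal.freqz(b_z, a_z, worN=1024, fs=fs)
_, hp = signal.freqz(b_zp, a_zp, worN=1024, fs=fs)
```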
On the Equivalence of Integrator- and Differentiator-Based Continuous- and Discrete-Time Systems
The article performs a generic comparison of integrator- and differentiator-based continuous-time systems, as well as their discrete-time models, aiming to answer the recurring question in the music DSP community of whether there are any benefits to using differentiators instead of the conventionally employed integrators. It is found that both kinds of models are practically equivalent, although there are certain reservations about differentiator-based models.
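As a point of reference for the comparison, here is the conventional integrator-based (trapezoidal, zero-delay-feedback) formulation of a one-pole lowpass. An algebraically equivalent differentiator-based formulation can be written for the same filter, which is the equivalence the article examines; only the integrator side is sketched here, with illustrative parameter choices.

```python
# Sketch: the conventional integrator-based formulation of a one-pole lowpass,
# built around a trapezoidal (bilinear) integrator with zero-delay feedback.
import numpy as np

def onepole_lowpass(x, cutoff_hz, fs):
    g = np.tan(np.pi * cutoff_hz / fs)     # prewarped integrator gain
    G = g / (1.0 + g)                      # zero-delay feedback resolved analytically
    s = 0.0                                # integrator state
    y = np.zeros_like(x, dtype=float)
    for n, xn in enumerate(x):
        v = (xn - s) * G                   # input to the integrator
        y[n] = v + s                       # lowpass output
        s = y[n] + v                       # trapezoidal integrator state update
    return y

fs = 48000
x = np.random.default_rng(0).standard_normal(fs)
y = onepole_lowpass(x, cutoff_hz=1000.0, fs=fs)
```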