The Mix Evaluation Dataset

Research on perception of music production practices is mainly concerned with the emulation of sound engineering tasks through lab-based experiments and custom software, sometimes with unskilled subjects. This can improve the level of control, but the validity, transferability, and relevance of the results may suffer from this artificial context. This paper presents a dataset consisting of mixes gathered in a real-life, ecologically valid setting, and perceptual evaluation thereof, which can be used to expand knowledge on the mixing process. With 180 mixes including parameter settings, close to 5000 preference ratings and free-form descriptions, and a diverse range of contributors from five different countries, the data offers many opportunities for music production analysis, some of which are explored here. In particular, more experienced subjects were found to be more negative and more specific in their assessments of mixes, and to increasingly agree with each other.
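Since the agreement claim lends itself to a concrete computation, here is a minimal sketch, assuming a hypothetical ratings table with columns "rater", "mix", "rating", and "experienced"; it is not the paper's analysis code, only an illustration of how inter-rater agreement could be compared across experience groups using mean pairwise Spearman correlation.

```python
# Illustrative sketch only (hypothetical column names, not the paper's code):
# estimate how much raters agree on mix preference, split by experience level.
import numpy as np
import pandas as pd

def mean_pairwise_correlation(ratings: pd.DataFrame) -> float:
    """Mean pairwise Spearman correlation between raters over shared mixes."""
    table = ratings.pivot_table(index="mix", columns="rater", values="rating")
    corr = table.corr(method="spearman", min_periods=3)
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return float(upper.stack().mean())

def agreement_by_experience(ratings: pd.DataFrame) -> pd.Series:
    """Compare within-group agreement for experienced vs. inexperienced raters."""
    return ratings.groupby("experienced").apply(mean_pairwise_correlation)
```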
Analytical Features for the Classification of Percussive Sounds: The Case of the Pandeiro

There is an increasing need for automatically classifying sounds for MIR and interactive music applications. In the context of supervised classification, we describe an approach that improves the performance of the general bag-of-frames scheme without losing its generality. This method is based on the construction and exploitation of specific audio features, called analytical features, used as input to classifiers. These features are better, in a sense we define precisely, than standard, general features, or even than ad hoc features designed by hand for specific problems. To construct these features, our method explores a very large space of functions by composing basic operators in syntactically correct ways. These operators are taken from the mathematical and audio processing domains. Our method allows us to build a large number of these features and to evaluate and select them automatically for arbitrary audio classification problems. We present here a specific study concerning the analysis of Pandeiro (Brazilian tambourine) sounds. Two problems are considered: the classification of entire sounds, for MIR applications, and the classification of the attack portion of the sound only, for interactive music applications. We evaluate precisely the gain obtained with analytical features on these two problems, in comparison with standard approaches.
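To make the idea concrete, the following is a toy sketch of such a feature-construction loop, not the authors' system: basic operators are composed into candidate features and ranked by a simple class-separation score on synthetic two-class data standing in for two stroke types.

```python
# Toy sketch (not the authors' system): compose basic operators into candidate
# "analytical" features and keep the compositions that best separate two classes.
import itertools
import numpy as np

rng = np.random.default_rng(0)
SR = 22050

def synth_hit(bright: bool) -> np.ndarray:
    """Synthetic stand-in for two stroke classes: bright vs. dull attack."""
    n = SR // 10
    t = np.arange(n) / SR
    f0 = 2500.0 if bright else 600.0
    return np.sin(2 * np.pi * f0 * t) * np.exp(-t * 40) + 0.05 * rng.standard_normal(n)

# Basic operators drawn from signal-processing and math vocabularies.
OPERATORS = {
    "abs": np.abs,
    "diff": lambda x: np.diff(x) if x.size > 1 else x,
    "spectrum": lambda x: np.abs(np.fft.rfft(x)),
    "log1p": np.log1p,
}
REDUCERS = {"mean": np.mean, "std": np.std, "max": np.max}

def separation(feature, a_class, b_class) -> float:
    """Fisher-style separation score of a scalar feature on two classes."""
    a = np.array([feature(x) for x in a_class])
    b = np.array([feature(x) for x in b_class])
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)

bright = [synth_hit(True) for _ in range(20)]
dull = [synth_hit(False) for _ in range(20)]

scored = []
with np.errstate(invalid="ignore", divide="ignore"):
    for chain in itertools.product(OPERATORS, repeat=2):     # e.g. spectrum -> diff
        for red in REDUCERS:                                  # e.g. ... -> std
            def feature(x, chain=chain, red=red):
                for name in chain:
                    x = OPERATORS[name](x)
                return float(REDUCERS[red](x))
            score = separation(feature, bright, dull)
            if np.isfinite(score):
                scored.append((score, " -> ".join(chain + (red,))))

for score, recipe in sorted(scored, reverse=True)[:5]:
    print(f"{score:8.2f}  {recipe}")
```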
A Generative Model for Raw Audio Using Transformer Architectures

This paper proposes a novel way of performing audio synthesis at the waveform level using Transformer architectures. We propose a deep neural network for generating waveforms, similar to WaveNet. The model is fully probabilistic, auto-regressive, and causal, i.e., each generated sample depends only on previously observed samples. Our approach outperforms a widely used WaveNet architecture by up to 9% on a similar dataset for next-step prediction. Using the attention mechanism, we enable the architecture to learn which audio samples are important for the prediction of the future sample. We show how causal Transformer generative models can be used for raw waveform synthesis. We also show that this performance can be improved by a further 2% by conditioning samples on a wider context. The flexibility of the current model to synthesize audio from latent representations suggests a large number of potential applications. Like WaveNet, however, the novel approach of using generative Transformer architectures for raw audio synthesis is still far from generating meaningful music without latent codes or metadata to aid the generation process.
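As an illustration of the causal, auto-regressive setup described above, here is a minimal PyTorch sketch of next-sample prediction with a masked Transformer encoder; the hyperparameters, 8-bit quantization, and learned positional embedding are assumptions, not the paper's configuration.

```python
# Minimal sketch of a causal Transformer next-sample predictor (assumed setup).
import torch
import torch.nn as nn

class CausalSampleTransformer(nn.Module):
    def __init__(self, n_classes=256, d_model=128, n_heads=4, n_layers=4, max_len=1024):
        super().__init__()
        self.embed = nn.Embedding(n_classes, d_model)     # quantized samples -> vectors
        self.pos = nn.Embedding(max_len, d_model)         # learned positional embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                                  # x: (batch, time) integer samples
        t = x.shape[1]
        # Causal mask: position n may only attend to positions <= n.
        mask = torch.triu(torch.full((t, t), float("-inf"), device=x.device), diagonal=1)
        h = self.embed(x) + self.pos(torch.arange(t, device=x.device))
        h = self.encoder(h, mask=mask)
        return self.head(h)                                # logits for the next sample

# Next-step prediction: shift targets by one so position n predicts sample n + 1.
model = CausalSampleTransformer()
x = torch.randint(0, 256, (2, 512))
logits = model(x[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 256), x[:, 1:].reshape(-1))
```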
A Statistics-Driven Differentiable Approach for Sound Texture Synthesis and Analysis

In this work, we introduce TexStat, a novel loss function specifically designed for the analysis and synthesis of texture sounds characterized by stochastic structure and perceptual stationarity. Drawing inspiration from the statistical and perceptual framework of McDermott and Simoncelli, TexStat identifies similarities between signals belonging to the same texture category without relying on temporal structure. We also propose using TexStat as a validation metric alongside the Fréchet Audio Distance (FAD) to evaluate texture sound synthesis models. In addition to TexStat, we present TexEnv, an efficient, lightweight, and differentiable texture sound synthesizer that generates audio by imposing amplitude envelopes on filtered noise. We further integrate these components into TexDSP, a DDSP-inspired generative model tailored to texture sounds. Through extensive experiments across various texture sound types, we demonstrate that TexStat is perceptually meaningful, time-invariant, and robust to noise, features that make it effective both as a loss function for generative tasks and as a validation metric. All tools and code are provided as open-source contributions, and our PyTorch implementations are efficient, differentiable, and highly configurable, enabling their use both in generative tasks and as perceptually grounded evaluation metrics.
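The following is a rough sketch of the filtered-noise-plus-envelope idea that TexEnv is built around, as described above; the band split, envelope extraction, and normalization choices are illustrative assumptions, not the authors' implementation.

```python
# Rough sketch of envelope-on-filtered-noise texture synthesis (assumed details).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_edges(n_bands=16, fmin=50.0, fmax=8000.0):
    return np.geomspace(fmin, fmax, n_bands + 1)

def synthesize_texture(reference: np.ndarray, sr: int, n_bands=16) -> np.ndarray:
    """Impose per-band amplitude envelopes measured from `reference` onto
    bandpass-filtered noise, then sum the bands."""
    edges = band_edges(n_bands, fmax=min(8000.0, 0.45 * sr))
    rng = np.random.default_rng(0)
    out = np.zeros(reference.shape[0])
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfiltfilt(sos, reference)
        envelope = np.abs(hilbert(band))                 # per-band amplitude envelope
        noise = sosfiltfilt(sos, rng.standard_normal(reference.shape[0]))
        out += envelope * noise / (np.abs(hilbert(noise)).mean() + 1e-12)
    return out
```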
Modeling Time-Varying Reactances Using Wave Digital Filters

Wave Digital Filters (WDFs) were developed to discretize linear time-invariant lumped systems, particularly electronic circuits. The time-invariant assumption is baked into the underlying theory and becomes problematic when simulating audio circuits that are by nature time-varying. We present extensions to WDF theory that incorporate proper numerical schemes, allowing for the accurate simulation of time-varying systems. We present generalized continuous-time models of reactive components that encapsulate the time-varying lossless models presented by Fettweis, the circuit-theoretic time-varying models, and traditional LTI models as special cases. Models of time-varying reactive components are valuable tools to have when modeling circuits containing variable capacitors or inductors, or electrical devices such as condenser microphones. A power metric is derived, and the model is discretized using the alpha-transform numerical scheme and a parametric wave definition. Case studies of circuits containing time-varying resistance and capacitance are presented and help to validate the proposed generalized continuous-time model and its discretization.
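For concreteness, here is a small reference simulation of the circuit-theoretic time-varying capacitor model mentioned in the abstract, i = d(C(t)v)/dt, for a series RC circuit discretized with the trapezoidal rule; this is not the paper's WDF or alpha-transform scheme, only a sketch of the behaviour such models must capture.

```python
# Reference sketch (NOT the paper's WDF scheme): circuit-theoretic time-varying
# capacitor, i = d(C(t) v)/dt, in a series RC circuit, trapezoidal integration.
import numpy as np

def simulate_rc(vin, C, R, fs):
    """State is the capacitor charge q = C(t) * v_C.
    KVL gives dq/dt = (vin - q / C(t)) / R; the trapezoidal update is implicit
    but remains linear in q[n], so it can be solved in closed form."""
    T = 1.0 / fs
    q = np.zeros_like(vin)
    v_c = np.zeros_like(vin)
    f_prev = (vin[0] - 0.0) / R                       # dq/dt at n = 0 with q = 0
    for n in range(1, len(vin)):
        denom = 1.0 + T / (2.0 * R * C[n])
        q[n] = (q[n - 1] + 0.5 * T * (vin[n] / R + f_prev)) / denom
        f_prev = (vin[n] - q[n] / C[n]) / R
        v_c[n] = q[n] / C[n]
    return v_c

fs = 48000
t = np.arange(fs) / fs
vin = np.sin(2 * np.pi * 220 * t)
C = 100e-9 * (1.0 + 0.5 * np.sin(2 * np.pi * 2 * t))  # capacitance swept at 2 Hz
out = simulate_rc(vin, C, R=10e3, fs=fs)
```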
Real-Time 3D Finite-Difference Time-Domain Simulation of Low- and Mid-Frequency Room Acoustics

Modern graphics processing units (GPUs) are massively parallel computing environments. They make it possible to run certain tasks orders of magnitude faster than is possible with a central processing unit (CPU). One such case is the simulation of room acoustics with wave-based modeling techniques. In this paper we show that it is possible to run room acoustic simulations with a finite-difference time-domain model in real time for a modest-size geometry at sampling rates up to 7 kHz. For a 10% maximum dispersion error limit, this means that our system can be used for real-time auralization up to 1.5 kHz. In addition, the system is able to handle several simultaneous sound sources and a moving listener at no additional cost. The results of this study include a performance comparison of different schemes, showing that the interpolated wideband scheme can handle, in real time, 1.4 times the bandwidth of the standard rectilinear scheme with the same maximum dispersion error.
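For orientation, a minimal NumPy sketch of the standard rectilinear (SRL) 3-D update at the Courant limit is given below; it is a single-threaded illustration only, whereas the paper's real-time results depend on GPU kernels and the interpolated wideband scheme.

```python
# Standard rectilinear (SRL) 3-D FDTD update at the Courant limit (lambda^2 = 1/3):
# p^{n+1} = (1/3) * (sum of 6 neighbours) - p^{n-1} on the interior nodes.
import numpy as np

def srl_step(p, p_prev):
    """One leapfrog step on the interior of a box with fixed (pressure-release) boundaries."""
    p_next = np.zeros_like(p)
    neighbours = (p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1] +
                  p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1] +
                  p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2])
    p_next[1:-1, 1:-1, 1:-1] = neighbours / 3.0 - p_prev[1:-1, 1:-1, 1:-1]
    return p_next

# Toy run: impulse source in the middle of a small grid.
shape = (64, 64, 64)
p_prev = np.zeros(shape)
p = np.zeros(shape)
p[32, 32, 32] = 1.0
for _ in range(100):
    p, p_prev = srl_step(p, p_prev), p
```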
Sinusoid Modeling in a Harmonic Context

This article discusses harmonic sinusoid modeling. Unlike standard sinusoid analyzers, the harmonic sinusoid analyzer keeps close watch on partial harmony from an early stage of modeling and therefore guarantees the harmonic relationship among the sinusoids. The key element in harmonic sinusoid modeling is the harmonic sinusoid particle, which is found by grouping short-time sinusoids. Instead of tracking short-time sinusoids, the harmonic tracker operates on harmonic particles directly. To express harmonic partial frequencies in a compact and robust form, we have developed an inequality-based representation with adjustable tolerance on frequency errors and inharmonicity, which is used in both the grouping and tracking stages. Frequency and amplitude continuity criteria are considered for tracking purposes. Numerical simulations are performed on simple synthesized signals.
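A much-simplified sketch of the grouping step is given below: spectral peaks are assigned to harmonic numbers of a candidate fundamental, subject to a relative frequency tolerance. The paper's inequality-based representation and inharmonicity handling are richer than this illustration.

```python
# Simplified harmonic grouping sketch (not the paper's representation).
def group_harmonics(peak_freqs, f0, rel_tol=0.03, max_harmonic=40):
    """Return {harmonic_number: peak_frequency} for peaks near k * f0."""
    grouped = {}
    for f in sorted(peak_freqs):
        k = round(f / f0)
        if 1 <= k <= max_harmonic and abs(f - k * f0) <= rel_tol * k * f0:
            # Keep the peak closest to the ideal harmonic if several compete.
            if k not in grouped or abs(f - k * f0) < abs(grouped[k] - k * f0):
                grouped[k] = f
    return grouped

print(group_harmonics([221.0, 443.5, 668.0, 1150.0], f0=220.0))
# -> {1: 221.0, 2: 443.5, 3: 668.0}  (1150 Hz rejected: too far from 5 * 220 Hz)
```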
Reservoir Computing: A Powerful Framework for Nonlinear Audio Processing

This paper proposes reservoir computing as a general framework for nonlinear audio processing. Reservoir computing is a novel approach to recurrent neural network training with the advantage of a very simple and linear learning algorithm. It can in theory approximate arbitrary nonlinear dynamical systems with arbitrary precision, has an inherent temporal processing capability, and is therefore well suited to many nonlinear audio processing problems. Whenever nonlinear relationships are present in the data and temporal information is crucial, reservoir computing can be applied. Examples from three application areas are presented: nonlinear system identification of a tube amplifier emulator algorithm; nonlinear audio prediction, as required in wireless audio transmission where dropouts may occur; and automatic melody transcription from a polyphonic audio stream, as one example from the broad field of music information retrieval. Reservoir computing was able to outperform state-of-the-art alternative models in all studied tasks.
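As a pointer to what "simple and linear learning" means in practice, here is a minimal echo state network sketch in NumPy with a ridge-regression readout, applied to one-step audio prediction; reservoir size, scaling, and regularization are illustrative assumptions.

```python
# Minimal echo state network: fixed random reservoir, trained linear readout.
import numpy as np

rng = np.random.default_rng(0)
N = 500                                            # reservoir size
W_in = rng.uniform(-0.5, 0.5, N)                   # fixed input weights
W = rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # spectral radius < 1

def run_reservoir(u):
    """Drive the fixed random reservoir with input u and collect its states."""
    x = np.zeros(N)
    states = np.empty((len(u), N))
    for n, sample in enumerate(u):
        x = np.tanh(W @ x + W_in * sample)
        states[n] = x
    return states

# One-step audio prediction: learn a readout mapping state[n] -> u[n + 1].
u = np.sin(2 * np.pi * 440 * np.arange(4000) / 16000)
X, y = run_reservoir(u[:-1]), u[1:]
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ y)   # linear learning step
prediction = X @ W_out
```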
Conformal Maps for the Discretization of Analog Filters Near the Nyquist Limit

We propose a new analog filter discretization method that is useful for discretizing systems with features near or above the Nyquist limit. A conformal mapping approach is taken, and we introduce the peaking conformal map and the shelving conformal map. The proposed method provides a close match to the original analog frequency response below half the sampling rate and is parameterizable, order preserving, and agnostic to the original filter’s order or type. The proposed method should have applications to discretizing filters that have time-varying parameters or need to be implemented across many different sampling rates.
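For context, the sketch below shows the classical conformal map used for discretization, the bilinear transform with frequency prewarping, matching an analog resonance near the Nyquist limit exactly at one frequency; the paper's peaking and shelving maps generalize this idea, and their exact form is not reproduced here.

```python
# Classical baseline: bilinear transform with prewarping at one match frequency.
import numpy as np
from scipy.signal import bilinear, freqs, freqz

fs = 48000.0
f_match = 18000.0                                   # feature near the Nyquist limit
w_match = 2 * np.pi * f_match

# Analog resonator (bandpass) peaking at w_match.
Q = 5.0
b_analog = [w_match / Q, 0.0]
a_analog = [1.0, w_match / Q, w_match ** 2]

# Prewarp so the mapped response is exact at f_match; elsewhere it is warped.
w_pre = 2 * fs * np.tan(w_match / (2 * fs))
b_pw = [w_pre / Q, 0.0]
a_pw = [1.0, w_pre / Q, w_pre ** 2]
b_z, a_z = bilinear(b_pw, a_pw, fs=fs)

w_d, h_d = freqz(b_z, a_z, worN=1024, fs=fs)        # digital response (Hz axis)
_, h_a = freqs(b_analog, a_analog, worN=2 * np.pi * w_d)   # analog response
k = np.argmin(np.abs(w_d - f_match))
print(abs(h_d[k]), abs(h_a[k]))                     # both ~1 near the matched frequency
```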
On the Equivalence of Integrator- and Differentiator-Based Continuous- and Discrete-Time Systems

The article performs a generic comparison of integrator- and differentiator-based continuous-time systems as well as their discrete-time models, aiming to answer the recurring question in the music DSP community of whether there are any benefits in using differentiators instead of the conventionally employed integrators. It is found that both kinds of models are practically equivalent, but there are certain reservations about differentiator-based models.
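A quick numerical sketch of this practical equivalence, and of one such reservation, is given below: the trapezoidal (bilinear) integrator and its inverse differentiator cancel exactly, but the differentiator's recursion has a pole at z = -1, so any initialization or Nyquist-frequency error oscillates without decaying. The example assumes the unit-gain trapezoidal rule, not the article's specific systems.

```python
# Trapezoidal integrator vs. its inverse differentiator (illustrative sketch).
import numpy as np

def trapezoidal_integrator(x, T):
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = y[n - 1] + 0.5 * T * (x[n] + x[n - 1])
    return y

def trapezoidal_differentiator(x, T):
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = (2.0 / T) * (x[n] - x[n - 1]) - y[n - 1]   # recursion has a pole at z = -1
    return y

T = 1.0 / 48000
x = np.sin(2 * np.pi * 1000 * np.arange(512) * T)          # x[0] = 0, so no start-up error
roundtrip = trapezoidal_differentiator(trapezoidal_integrator(x, T), T)
print(np.max(np.abs(roundtrip - x)))                        # ~0 up to floating-point error
# With x[0] != 0, the mismatch (-1)^n * x[0] would alternate forever: the kind of
# reservation attached to differentiator-based models.
```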