Data Augmentation for Instrument Classification Robust to Audio Effects
Reusing recorded sounds (sampling) is a key component of Electronic Music Production (EMP); it has been present since the genre's early days and is at the core of styles such as hip-hop and jungle. Commercial and non-commercial services allow users to obtain collections of sounds (sample packs) to reuse in their compositions. Automatic classification of one-shot instrumental sounds makes it possible to categorise the sounds contained in these collections, enabling easier navigation and better characterisation. Automatic instrument classification has mostly targeted unprocessed, isolated instrumental sounds or the detection of predominant instruments in mixed music tracks. For such classification to be useful in EMP audio databases, it has to be robust to the audio effects applied to unprocessed sounds. In this paper we evaluate how a state-of-the-art model trained on a large dataset of one-shot instrumental sounds performs when classifying instruments processed with audio effects. To evaluate the robustness of the model, we use data augmentation with audio effects and assess how each effect influences classification accuracy.
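As a sketch of the kind of effect-based augmentation described above, the snippet below renders several processed variants of a one-shot sample. It assumes Spotify's pedalboard library; the effect chains and parameter values are illustrative rather than the ones used in the paper.

```python
# Minimal sketch of effect-based augmentation for one-shot samples.
# Effect choices and parameter ranges are illustrative assumptions.
import numpy as np
import soundfile as sf
from pedalboard import Pedalboard, Reverb, Distortion, Chorus, LowpassFilter

def augment_with_effects(path: str) -> list[np.ndarray]:
    """Return several effected variants of a one-shot sample."""
    audio, sr = sf.read(path, dtype="float32")
    chains = [
        Pedalboard([Reverb(room_size=0.6, wet_level=0.3)]),
        Pedalboard([Distortion(drive_db=12.0)]),
        Pedalboard([Chorus(rate_hz=1.5, depth=0.4)]),
        Pedalboard([LowpassFilter(cutoff_frequency_hz=2000.0)]),
    ]
    return [board(audio, sr) for board in chains]

# Each variant keeps the original instrument label, so the classifier
# sees the same class under different processing conditions.
```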
A Study of Control Methods for Percussive Sound Synthesis Based on GANs
The process of creating drum sounds has evolved significantly in the past decades. The development of analogue drum synthesizers, such as the TR-808, and of modern sound design tools in Digital Audio Workstations led to a variety of drum timbres that defined entire musical genres. Recently, drum synthesis research has been revived with a new focus on training generative neural networks to create drum sounds. Different interfaces have previously been proposed to control the generative process, from low-level latent space navigation to high-level semantic feature parameterisation, but no comprehensive analysis has evaluated how each approach relates to the creative process. We evaluate how different interfaces support creative control over drum generation by conducting a user study based on the Creative Support Index. We experiment with both a supervised method that decodes semantic latent space directions and an unsupervised Closed-Form Factorization approach from the computer vision literature to parameterise the generation process, and show that the latter is the preferred means of controlling a drum synthesizer based on the StyleGAN2 network architecture.
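The closed-form factorization idea mentioned above can be sketched in a few lines: the latent directions that most change the generator's first style projection are the top right singular vectors of its weight matrix. The snippet assumes access to that weight as a plain tensor; the names are hypothetical, not taken from the paper's code.

```python
# Hedged sketch of closed-form factorization (SeFa-style) for finding
# latent directions in a StyleGAN2-like generator.
import torch

def closed_form_directions(weight: torch.Tensor, k: int = 5) -> torch.Tensor:
    """weight: (out_dim, latent_dim) matrix of the first style projection.
    Returns the k unit-norm latent directions with the largest effect."""
    # Directions n maximising ||W n|| are the top right singular vectors of W.
    _, _, vh = torch.linalg.svd(weight)   # vh: (latent_dim, latent_dim)
    return vh[:k]                         # rows are candidate semantic directions

# Editing: move a latent code along a discovered direction and resynthesise,
# e.g. w_edit = w + alpha * directions[i], where alpha scales the change.
```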
A Statistics-Driven Differentiable Approach for Sound Texture Synthesis and Analysis
In this work, we introduce TexStat, a novel loss function specifically designed for the analysis and synthesis of texture sounds characterized by stochastic structure and perceptual stationarity. Drawing inspiration from the statistical and perceptual framework of McDermott and Simoncelli, TexStat identifies similarities between signals belonging to the same texture category without relying on temporal structure. We also propose using TexStat as a validation metric alongside the Fréchet Audio Distance (FAD) to evaluate texture sound synthesis models. In addition to TexStat, we present TexEnv, an efficient, lightweight and differentiable texture sound synthesizer that generates audio by imposing amplitude envelopes on filtered noise. We further integrate these components into TexDSP, a DDSP-inspired generative model tailored for texture sounds. Through extensive experiments across various texture sound types, we demonstrate that TexStat is perceptually meaningful, time-invariant, and robust to noise, features that make it effective both as a loss function for generative tasks and as a validation metric. All tools and code are provided as open-source contributions; our PyTorch implementations are efficient, differentiable, and highly configurable, enabling their use both in generative tasks and as a perceptually grounded evaluation metric.
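A rough, non-differentiable sketch of the synthesis principle behind TexEnv (amplitude envelopes imposed on subbands of filtered noise) might look as follows; band edges, filter order, and normalisation are illustrative assumptions, not the paper's implementation.

```python
# Sketch: impose amplitude envelopes on subbands of filtered noise.
import numpy as np
from scipy.signal import butter, sosfilt

def synth_texture(envelopes: np.ndarray, band_edges_hz, sr: int = 16000) -> np.ndarray:
    """envelopes: (n_bands, n_samples) target amplitude envelopes per subband."""
    n_bands, n_samples = envelopes.shape
    out = np.zeros(n_samples)
    for b in range(n_bands):
        lo, hi = band_edges_hz[b], band_edges_hz[b + 1]
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band_noise = sosfilt(sos, np.random.randn(n_samples))
        out += envelopes[b] * band_noise            # impose the envelope
    return out / max(np.max(np.abs(out)), 1e-9)     # simple peak normalisation
```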
Analysis of Musical Dynamics in Vocal Performances Using Loudness Measures
In addition to tone, pitch and rhythm, dynamics is one of the expressive dimensions of a musical performance, yet it has received limited attention. While the use of dynamics may vary from artist to artist and from performance to performance, a systematic methodology for automatically describing the dynamics of a performance in musically meaningful terms such as forte or piano could offer valuable feedback in music education, and in particular in singing. To this end, we have manually annotated the dynamic markings of commercial recordings of popular rock and pop songs from the Smule Vocal Balanced (SVB) dataset, to be used as reference data. As a first step towards this goal, we propose a method to derive and compare singing voice loudness curves in polyphonic mixtures. To measure the similarity and variation of dynamics, we compare the dynamics curves of the SVB renditions with those derived from the original songs. We perform the same comparison using professionally produced renditions from a karaoke website. We relate the high Spearman correlation coefficients found in some selected student renditions and in the professional renditions to accurate dynamics.
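A minimal sketch of comparing dynamics curves could look like the following; it uses frame-wise RMS in dB as a stand-in for the loudness measure used in the paper, and SciPy's Spearman correlation, with crude truncation in place of proper curve alignment.

```python
# Sketch: loudness curve extraction and Spearman comparison of dynamics.
import numpy as np
from scipy.stats import spearmanr

def loudness_curve(audio: np.ndarray, sr: int, frame_s: float = 0.1) -> np.ndarray:
    """Frame-wise RMS level in dB as a simple proxy for a loudness curve."""
    hop = int(sr * frame_s)
    frames = [audio[i:i + hop] for i in range(0, len(audio) - hop, hop)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return 20 * np.log10(rms + 1e-9)

def dynamics_similarity(curve_a: np.ndarray, curve_b: np.ndarray) -> float:
    """Spearman correlation between two dynamics curves (truncated to match)."""
    n = min(len(curve_a), len(curve_b))
    rho, _ = spearmanr(curve_a[:n], curve_b[:n])
    return rho
```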
Improved Automatic Instrumentation Role Classification and Loop Activation Transcription
Many electronic music (EM) genres are composed through the activation of short audio recordings of instruments designed for seamless repetition, or loops. In this work, loops from key structural groups such as bass, percussive or melodic elements are labelled by the role they occupy in a piece of music through the task of automatic instrumentation role classification (AIRC). Such labels assist EM producers in identifying compatible loops in large unstructured audio databases. While human annotation is laborious, automatic classification allows fast and scalable generation of these labels. We experiment with several deep-learning architectures and propose a data augmentation method for improving multi-label representation to balance classes within the Freesound Loop Dataset. To improve the classification accuracy of the architectures, we also evaluate different pooling operations. Results indicate that, in combination with the data augmentation and pooling strategies, the proposed system achieves state-of-the-art performance for AIRC. Additionally, we demonstrate how the proposed AIRC method is useful for analysing the structure of EM compositions through loop activation transcription.
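The multi-label formulation and the role of the pooling operation can be illustrated with a small classifier head; the embedding dimension, label set, and pooling choices below are assumptions for illustration, not the paper's exact architecture.

```python
# Illustrative multi-label AIRC head with configurable temporal pooling.
import torch
import torch.nn as nn

ROLES = ["bass", "chords", "fx", "melody", "percussion", "vocal"]  # example label set

class RoleClassifierHead(nn.Module):
    def __init__(self, embed_dim: int = 128, pooling: str = "mean"):
        super().__init__()
        self.pooling = pooling
        self.fc = nn.Linear(embed_dim, len(ROLES))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, embed_dim) frame-level embeddings from a CNN front end
        if self.pooling == "max":
            pooled = x.max(dim=1).values
        else:
            pooled = x.mean(dim=1)
        return torch.sigmoid(self.fc(pooled))   # independent per-role probabilities

# Multi-label targets let a loop activate several roles at once,
# e.g. a melodic bass line labelled as both "bass" and "melody".
```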
SMSPD, LIBSMS and a Real-Time SMS Instrument
We present a real-time implementation of SMS synthesis in Pure Data. This instrument focuses on interaction, with the ability to continuously synthesize any frame position within an SMS sound representation, in any order, thereby decoupling time from other parameters such as frequency or spectral shape. The instrument can be controlled expressively with a Wacom tablet, which offers both coupled and absolute controls with good precision. A prototype graphical interface in Python is presented that helps interact with the SMS data through visualization. In this system, any sound sample with interesting spectral features becomes a playable instrument. The processing functionality originates in the SMS C code written almost 20 years ago, now refactored into the open-source library libsms and wrapped into a Python module. A set of externals for Pure Data, called smspd, was built on this library to facilitate on-the-fly analysis, flexible modifications, and interactive synthesis. We discuss new transformations introduced based on the possibilities of this system, as well as ideas for higher-level, feature-based transformations that benefit from its interactivity.
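The frame-position synthesis idea can be sketched outside of Pure Data as follows; the snippet renders the deterministic partials of a single analysis frame as a short grain, with the data layout and parameters assumed for illustration rather than taken from libsms.

```python
# Conceptual sketch of frame-position ("scrubbing") synthesis: render any
# analysis frame of a sinusoidal, SMS-style representation on demand.
import numpy as np

def render_frame(freqs_hz, amps, sr: int = 44100, dur_s: float = 0.05, phases=None):
    """Synthesise one frame's deterministic partials as a short grain."""
    t = np.arange(int(sr * dur_s)) / sr
    phases = np.zeros(len(freqs_hz)) if phases is None else phases
    grain = sum(a * np.sin(2 * np.pi * f * t + p)
                for f, a, p in zip(freqs_hz, amps, phases))
    return grain

# A controller (e.g. tablet x-position) maps to a frame index, so frames
# can be traversed in any order and at any rate, independently of time.
```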