Metrix: A Musical Data Definition Language and Data Structure for a Spectral Modeling Based Synthesizer
Since the publication of the MIDI 1.0 specification [1], well over 15 years ago, there have been many attempts to address the limitations that soon became apparent. None of these has had a happy ending, mainly due to commercial interests, and as a result, when trying to find an appropriate synthesis control user interface, we had little choice but to use MIDI. That is why the idea of defining a new user interface arose. In this article, the main components of this interface are discussed, paying special attention to the advantages and new features it offers the end user.
Sound Transformations Based on the SMS High Level Attributes
The basic Spectral Modeling Synthesis (SMS) technique models sounds as the sum of sinusoids plus a residual. Although this analysis/synthesis system has proved successful in transforming sounds, more powerful and intuitive musical transformations can be achieved by moving to the plane of SMS high-level attributes. In this paper we describe how to extract high-level sound attributes from the basic representation, modify them, and add them back before the synthesis stage. In this process, new problems arise, for which we propose some initial solutions.
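As a minimal illustration of the underlying model (a sketch, not the paper's implementation), the following Python/NumPy fragment synthesizes the sinusoidal component from per-frame partial tracks; in full SMS, the residual would be added back as noise shaped by its spectral envelope. The array layout and hop size are assumptions made for the example.

```python
import numpy as np

def synth_sinusoids(freqs, amps, hop, sr):
    """Additive synthesis of partial tracks.

    freqs, amps: (n_frames, n_partials) arrays of per-frame frequency (Hz)
    and linear amplitude of each partial; hop: samples between frames.
    """
    n_frames, n_partials = freqs.shape
    n = n_frames * hop
    t_frames = np.arange(n_frames) * hop
    t = np.arange(n)
    out = np.zeros(n)
    for p in range(n_partials):
        # Upsample the frame-rate envelopes to audio rate.
        f = np.interp(t, t_frames, freqs[:, p])
        a = np.interp(t, t_frames, amps[:, p])
        # Integrate frequency to obtain the instantaneous phase.
        phase = 2 * np.pi * np.cumsum(f) / sr
        out += a * np.sin(phase)
    return out
```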
The Origins of DAFx and its Future within the Sound and Music Computing Field
DAFx is an established conference that has become a reference gathering for researchers working on audio signal processing. In this presentation I will go back ten years, to the beginning of this conference and to the ideas that promoted it. Then I will jump to the present, to the current context of our research field, quite different from that of ten years ago, and I will offer some personal reflections on the current situation and the challenges we are encountering.
SMSPD, LIBSMS and a Real-Time SMS Instrument
We present a real-time implementation of SMS synthesis in Pure Data. The instrument focuses on interaction, with the ability to continuously synthesize any frame position within an SMS sound representation, in any order, thereby decoupling time from other parameters such as frequency or spectral shape. The instrument can be controlled expressively with a Wacom tablet, which offers both coupled and absolute controls with good precision. A prototype graphical interface in Python is presented that helps the user interact with the SMS data through visualization. In this system, any sound sample with interesting spectral features becomes a playable instrument. The processing functionality originates in the SMS C code written almost 20 years ago, now refactored into the open-source library libsms, which is also wrapped in a Python module. A set of externals for Pure Data, called smspd, was built on this library to facilitate on-the-fly analysis, flexible modifications, and interactive synthesis. We discuss new transformations introduced based on the possibilities of this system, as well as ideas for higher-level, feature-based transformations that benefit from its interactivity.
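To make the time-decoupling idea concrete, here is an illustrative sketch (not the smspd or libsms API) of how a continuous, controller-driven frame position can select and interpolate analysis frames, so that playback order and speed are entirely free:

```python
import numpy as np

def frame_at(position, freqs, amps):
    """Return partial frequencies/amplitudes at a fractional frame index.

    position: float in [0, n_frames - 1], driven by any controller
    (e.g. a tablet axis) rather than by a clock, so frames can be
    visited in any order and at any rate.
    """
    i = min(int(np.floor(position)), freqs.shape[0] - 2)
    frac = position - i
    # Linear interpolation between neighbouring analysis frames.
    f = (1 - frac) * freqs[i] + frac * freqs[i + 1]
    a = (1 - frac) * amps[i] + frac * amps[i + 1]
    return f, a
```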
Data Augmentation for Instrument Classification Robust to Audio Effects
Reusing recorded sounds (sampling) is a key component of Electronic Music Production (EMP), present since its early days and at the core of genres like hip-hop or jungle. Commercial and non-commercial services allow users to obtain collections of sounds (sample packs) to reuse in their compositions. Automatic classification of one-shot instrumental sounds makes it possible to categorise the sounds contained in these collections, enabling easier navigation and better characterisation. Automatic instrument classification has mostly targeted the classification of unprocessed isolated instrumental sounds or the detection of predominant instruments in mixed music tracks. For this classification to be useful in audio databases for EMP, it has to be robust to the audio effects applied to unprocessed sounds. In this paper we evaluate how a state-of-the-art model trained on a large dataset of one-shot instrumental sounds performs when classifying instruments processed with audio effects. To evaluate the robustness of the model, we use data augmentation with audio effects and measure how each effect influences classification accuracy.
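The paper's exact effect chain is not reproduced here, but the following NumPy sketch shows the shape of such an augmentation step, with three illustrative effects (soft clipping, a crude convolution reverb, and a single-tap echo); the effect choices and parameter ranges are assumptions for the example:

```python
import numpy as np

def augment(audio, sr, rng):
    """Apply one randomly chosen effect to a one-shot sound (illustrative)."""
    choice = rng.integers(3)
    if choice == 0:
        # Soft-clipping distortion with a random drive gain.
        return np.tanh(rng.uniform(2.0, 8.0) * audio)
    if choice == 1:
        # Crude reverb: convolve with exponentially decaying noise.
        n = int(0.3 * sr)
        ir = rng.standard_normal(n) * np.exp(-6.0 * np.linspace(0.0, 1.0, n))
        return np.convolve(audio, ir)[: len(audio)]
    # Single-tap echo, 120 ms delay at half amplitude.
    out = audio.copy()
    d = int(0.12 * sr)
    out[d:] += 0.5 * audio[:-d]
    return out

# Usage: rng = np.random.default_rng(0); augmented = augment(sample, 44100, rng)
```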
TIV.lib: An Open-Source Library for the Tonal Description of Musical Audio
In this paper, we present TIV.lib, an open-source library for the content-based tonal description of musical audio signals. Its main novelty lies in the perceptually inspired Tonal Interval Vector space, based on the Discrete Fourier Transform, from which multiple instantaneous and global representations, descriptors, and metrics are computed (e.g., harmonic change, dissonance, diatonicity, and musical key). The library is cross-platform, implemented in Python and the graphical programming language Pure Data, and can be used in both online and offline scenarios. Of note is its potential for enhanced Music Information Retrieval, where tonal descriptors sit at the core of numerous methods and applications.
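The core construction is compact enough to sketch. Assuming a 12-bin chroma vector as input, a simplified (unweighted) Tonal Interval Vector consists of DFT coefficients 1 to 6 of the energy-normalised chroma; TIV.lib additionally applies perceptual weights to each coefficient, omitted here for brevity:

```python
import numpy as np

def tonal_interval_vector(chroma):
    """Simplified Tonal Interval Vector from a 12-bin chroma vector.

    Returns complex DFT coefficients 1..6 of the normalised chroma,
    which capture its interval content.
    """
    chroma = np.asarray(chroma, dtype=float)
    X = np.fft.fft(chroma / chroma.sum())   # normalise, then 12-point DFT
    return X[1:7]

# Example descriptor in this space: harmonic change can be framed as
# the distance between the TIVs of consecutive analysis frames.
def harmonic_distance(tiv_a, tiv_b):
    return np.linalg.norm(tiv_a - tiv_b)
```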
Analysis of Musical Dynamics in Vocal Performances Using Loudness Measures
In addition to tone, pitch, and rhythm, dynamics is an expressive dimension of musical performance that has received limited attention. While the usage of dynamics may vary from artist to artist, and from performance to performance, a systematic methodology for automatically identifying the dynamics of a performance in musically meaningful terms such as forte or piano may offer valuable feedback in the context of music education, and in particular in singing. To this end, we have manually annotated the dynamic markings of commercial recordings of popular rock and pop songs from the Smule Vocal Balanced (SVB) dataset, to be used as reference data. As a first step towards our research goal, we propose a method to derive and compare singing voice loudness curves in polyphonic mixtures. To measure the similarity and variation of dynamics, we compare the dynamics curves of the SVB renditions with those derived from the original songs. We perform the same comparison using professionally produced renditions from a karaoke website. We relate the high Spearman correlation coefficients found in selected student renditions and in the professional renditions to accurate dynamics.
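As an illustration of the comparison step, the sketch below computes a frame-wise loudness proxy (simple RMS in dB, standing in for the loudness measures used in the paper) and correlates two curves with the Spearman coefficient; it assumes both curves have been resampled to the same number of frames:

```python
import numpy as np
from scipy.stats import spearmanr

def loudness_curve(audio, sr, frame=0.1):
    """Frame-wise loudness proxy in dB (RMS over fixed-length frames)."""
    n = int(frame * sr)
    frames = audio[: len(audio) // n * n].reshape(-1, n)
    rms = np.sqrt((frames ** 2).mean(axis=1) + 1e-12)
    return 20 * np.log10(rms)

def dynamics_similarity(curve_a, curve_b):
    """Rank correlation between a rendition's curve and a reference curve."""
    rho, _ = spearmanr(curve_a, curve_b)
    return rho
```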
A Study of Control Methods for Percussive Sound Synthesis Based on GANs
The process of creating drum sounds has evolved significantly in the past decades. The development of analogue drum synthesizers, such as the TR-808, and of modern sound design tools in Digital Audio Workstations led to a variety of drum timbres that defined entire musical genres. Recently, drum synthesis research has been revived with a new focus on training generative neural networks to create drum sounds. Different interfaces have been proposed to control the generative process, from low-level latent space navigation to high-level semantic feature parameterisation, but no comprehensive analysis has been presented to evaluate how each approach relates to the creative process. We aim to evaluate how different interfaces support creative control over drum generation by conducting a user study based on the Creative Support Index. We experiment with two ways of parameterising the generation process: a supervised method that decodes semantic latent space directions, and an unsupervised Closed-Form Factorization approach from the computer vision literature. We demonstrate that the latter is the preferred means of controlling a drum synthesizer based on the StyleGAN2 network architecture.
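For reference, Closed-Form Factorization derives edit directions without any training or supervision, directly from the generator's weights. A minimal sketch, assuming W is the weight matrix of the first affine layer mapping the latent code into the synthesis network:

```python
import numpy as np

def closed_form_directions(W, k=5):
    """Closed-Form Factorization (SeFa), sketched.

    W: (out_dim, latent_dim) weight matrix of the generator's first
    affine layer. The top eigenvectors of W^T W give semantically
    meaningful latent edit directions.
    """
    eigvals, eigvecs = np.linalg.eigh(W.T @ W)
    order = np.argsort(eigvals)[::-1]      # largest eigenvalues first
    return eigvecs[:, order[:k]].T          # k directions of size latent_dim

# Editing: z_edited = z + alpha * direction, then resynthesize the
# drum sound from z_edited with the generator.
```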
Improved Automatic Instrumentation Role Classification and Loop Activation Transcription
Many electronic music (EM) genres are composed through the activation of short audio recordings of instruments designed for seamless repetition, or loops. In this work, loops of key structural groups such as bass, percussive, or melodic elements are labelled by the role they occupy in a piece of music through the task of automatic instrumentation role classification (AIRC). Such labels assist EM producers in identifying compatible loops in large unstructured audio databases. Whereas human annotation is laborious, automatic classification allows these labels to be generated quickly and at scale. We experiment with several deep-learning architectures and propose a data augmentation method for improving multi-label representation to balance classes within the Freesound Loop Dataset. To improve the classification accuracy of the architectures, we also evaluate different pooling operations. Results indicate that, in combination with the data augmentation and pooling strategies, the proposed system achieves state-of-the-art performance for AIRC. Additionally, we demonstrate how the proposed AIRC method is useful for analysing the structure of EM compositions through loop activation transcription.
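One plausible, simplified reading of the multi-label augmentation idea is that mixing single-role loops yields examples whose label set is the union of the source labels, which helps balance rare role combinations; the sketch below illustrates this reading only, and the mixing scheme and normalisation are assumptions rather than the paper's exact procedure:

```python
import numpy as np

def mix_loops(loop_a, labels_a, loop_b, labels_b):
    """Create a multi-label training example by summing two loops.

    labels_a, labels_b: sets of instrumentation role labels; the
    mixture inherits their union.
    """
    n = min(len(loop_a), len(loop_b))
    mix = loop_a[:n] + loop_b[:n]
    mix /= max(np.abs(mix).max(), 1e-9)     # peak-normalise the mixture
    return mix, labels_a | labels_b

# Usage: mix, labels = mix_loops(bass, {"bass"}, drums, {"percussion"})
```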