DAFx Paper Archive - Search for machine learning, page 9 of 32

Introducing Audio D-TOUCH: A tangible user interface for music composition and performance

DAFx-2003 - London

"Audio d-touch" uses a consumer-grade web camera and customizable block objects to provide an interactive tangible interface for a variety of time based musical tasks such as sequencing, drum editing and collaborative composition. Three instruments are presented here. Future applications of the interface are also considered.

Download

Analysis of Musical Dynamics in Vocal Performances Using Loudness Measures

Jyoti Narang; Marius Miron; Ajay Srinivasamurthy; Xavier Serra

DAFx-2022 - Vienna

In addition to tone, pitch and rhythm, dynamics is one of the expressive dimensions of the performance of a music piece that has received limited attention. While the usage of dynamics may vary from artist to artist, and also from performance to performance, a systematic methodology to automatically identify the dynamics of a performance in terms of musically meaningful terms like forte, piano may offer valuable feedback in the context of music education and in particular in singing. To this end, we have manually annotated the dynamic markings of commercial recordings of popular rock and pop songs from the Smule Vocal Balanced (SVB) dataset which will be used as reference data. Then as a first step for our research goal, we propose a method to derive and compare singing voice loudness curves in polyphonic mixtures. Towards measuring the similarity and variation of dynamics, we compare the dynamics curves of the SVB renditions with the one derived from the original songs. We perform the same comparison using professionally produced renditions from a karaoke website. We relate high values of Spearman correlation coefficient found in some select student renditions and the professional renditions with accurate dynamics.

Download

Sound Matching Using Synthesizer Ensembles

Gerard Roma

DAFx-2024 - Guildford

Sound matching allows users to automatically approximate existing sounds using a synthesizer. Previous work has mostly focused on algorithms for automatically programming an existing synthesizer. This paper proposes a system for selecting between different synthesizer designs, each one with a corresponding automatic programmer. An implementation that allows designing ensembles based on a template is demonstrated. Several experiments are presented using a simple subtractive synthesis design. Using an ensemble of synthesizer-programmer pairs is shown to provide better matching than a single programmer trained for an equivalent integrated synthesizer. Scaling to hundreds of synthesizers is shown to improve match quality.

Download

Model Bending: Teaching Circuit Models New Tricks

Daniel J. Gillespie; Samuel Schachter

DAFx-2022 - Vienna

A technique is introduced for generating novel signal processing systems grounded in analog electronic circuits, called model bending. By applying the ideas behind circuit bending to models of nonlinear analog circuits it is possible to create novel nonlinear signal processors which mimic the behavior of analog electronics, but which are not possible to implement in the analog realm. The history of both circuit bending and circuit modeling is discussed, as well as a theoretical basis for how these approaches can complement each other. Potential pitfalls to the practical application of model bending are highlighted and suggested solutions to those problems are provided, with examples.

Download

Audio Visualization via Delay Embedding and Subspace Learning

Alois Cerbu; Carmine Cella

DAFx-2024 - Guildford

We describe a sequence of methods for producing videos from audio signals. Our visualizations capture perceptual features like harmonicity and brightness: they produce stable images from periodic sounds and slowly-evolving images from inharmonic ones; they associate jagged shapes to brighter sounds and rounded shapes to darker ones. We interpret our methods as adaptive FIR filterbanks and show how, for larger values of the complexity parameters, we can perform accurate frequency detection without the Fourier transform. Attached to the paper is a code repository containing the Jupyter notebook used to generate the images and videos cited. We also provide code for a realtime C++ implementation of the simplest visualization method. We discuss the mathematical theory of our methods in the two appendices.

Download

Towards an Objective Comparison of Panning Feature Algorithms for Unsupervised Learning

Richard Mitic; Andreas Rossholm

DAFx-2025 - Ancona

Estimations of panning attributes are an important feature to extract from a piece of recorded music, with downstream uses such as classification, quality assessment, and listening enhancement. While several algorithms exist in the literature, there is currently no comparison between them and no studies to suggest which one is most suitable for any particular task. This paper compares four algorithms for extracting amplitude panning features with respect to their suitability for unsupervised learning. It finds synchronicities between them and analyses their results on a small set of commercial music excerpts chosen for their distinct panning features. The ability of each algorithm to differentiate between the tracks is analysed. The results can be used in future work to either select the most appropriate panning feature algorithm or create a version customized for a particular task.

Download

Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders

Adrien Bitton; Philippe Esling

DAFx-2019 - Birmingham

Deep generative neural networks have thrived in the field of computer vision, enabling unprecedented intelligent image processes. Yet the results in audio remain less advanced and many applications are still to be investigated. Our project targets real-time sound synthesis from a reduced set of high-level parameters, including semantic controls that can be adapted to different sound libraries and specific tags. These generative variables should allow expressive modulations of target musical qualities and continuously mix into new styles. To this extent we train auto-encoders on an orchestral database of individual note samples, along with their intrinsic attributes: note class, timbre domain (an instrument subset) and extended playing techniques. We condition the decoder for explicit control over the rendered note attributes and use latent adversarial training for learning expressive style parameters that can ultimately be mixed. We evaluate both generative performances and correlations of the attributes with the latent representation. Our ablation study demonstrates the effectiveness of the musical conditioning. The proposed model generates individual notes as magnitude spectrograms from any probabilistic latent code samples (each latent point maps to a single note), with expressive control of orchestral timbres and playing styles. Its training data subsets can directly be visualized in the 3-dimensional latent representation. Waveform rendering can be done offline with the Griffin-Lim algorithm. In order to allow real-time interactions, we fine-tune the decoder with a pretrained magnitude spectrogram inversion network and embed the full waveform generation pipeline in a plugin. Moreover the encoder could be used to process new input samples, after manipulating their latent attribute representation, the decoder can generate sample variations as an audio effect would. Our solution remains rather light-weight and fast to train, it can directly be applied to other sound domains, including an user’s libraries with custom sound tags that could be mapped to specific generative controls. As a result, it fosters creativity and intuitive audio style experimentations. Sound examples and additional visualizations are available on Github1, as well as codes after the review process.

Download

Extracting automatically the perceived intensity of music titles

Aymeric Zils; François Pachet

DAFx-2003 - London

We address the issue of extracting automatically high-level musical descriptors out of their raw audio signal. This work focuses on the extraction of the perceived intensity of music titles, that evaluates how energic the music is perceived by listeners. We present here first the perceptive tests that we have conducted, in order to evaluate the relevance and the universality of the perceived intensity descriptor. Then we present several methods used to extract relevant features used to build automatic intensity extractors: usual Mpeg7 low level features, empirical method, and features automatically found using our Extractor Discovery System (EDS), and compare the final performances of their extractors.

Download

A Generic System for Audio Indexing: Application to Speech/Music Segmentation and Music Genre Recognition

Geoffroy Peeters

DAFx-2007 - Bordeaux

In this paper we present a generic system for audio indexing (classification/ segmentation) and apply it to two usual problems: speech/ music segmentation and music genre recognition. We first present some requirements for the design of a generic system. The training part of it is based on a succession of four steps: feature extraction, feature selection, feature space transform and statistical modeling. We then propose several approaches for the indexing part depending of the local/ global characteristics of the indexes to be found. In particular we propose the use of segment-statistical models. The system is then applied to two usual problems. The first one is the speech/ music segmentation of a radio stream. The application is developed in a real industrial framework using real world categories and data. The performances obtained for the pure speech/ music classes problem are good. However when considering also the non-pure categories (mixed, bed) the performances of the system drop. The second problem is the music genre recognition. Since the indexes to be found are global, “segment-statistical models” are used leading to results close to the state of the art.

Download

Design and Understandability of Digital-Audio Musical Symbols for Intent and State Communication from Service Robots to Humans

Gunnar Johannsen

DAFx-1999 - Trondheim

Auditory displays for mobile service robots are developed. The design of digital-audio symbols, such as directional sounds and additional sounds for robot states, as well as the design of more complicated robot sound tracks are explained. Basic musical elements and robot movement sounds are combined. Two experimental studies, on the understandability of the directional sounds and on the auditory perception of intended robot trajectories in a simulated supermarket scenario, are described.

Download

Proceedings of the International Conference on Digital Audio Effects (DAFx)

Proc. Int. Conf. Digital Audio Effects (DAFx)

Paper Archive

Years

Authors