Bayesian Identification of Closely-Spaced Chords from Single-Frame STFT Peaks
Identifying chords and related musical attributes from digital audio has proven to be a long-standing problem spanning decades of research. Robust identification may facilitate automatic transcription, semantic indexing, polyphonic source separation, and other emerging applications. To this end, we develop a Bayesian inference engine operating on single-frame STFT peaks. Peak likelihoods conditional on pitch component information are evaluated by an MCMC approach that accounts for overlapping harmonics as well as undetected and spurious peaks, thus facilitating operation in noisy environments at very low computational cost. Our inference engine evaluates posterior probabilities of musical attributes such as root, chroma (including inversion), octave, and tuning, given STFT peak frequency and amplitude observations. The resultant posteriors become highly concentrated around the correct attributes, as demonstrated using 227 ms piano recordings with −10 dB additive white Gaussian noise.
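A minimal sketch of the general idea follows, assuming a Gaussian model of peak-frequency error and a fixed spurious-peak probability. It scores candidate chords directly rather than running the authors' MCMC engine, and all constants (sigma_cents, p_spurious, the harmonic count) are illustrative.

# Sketch: score candidate chords from observed STFT peak frequencies.
# Not the authors' MCMC engine; a direct likelihood evaluation with a
# Gaussian frequency-error model plus a spurious-peak floor.
import numpy as np

def chord_partials(midi_notes, n_harmonics=6):
    """Expected partial frequencies (Hz) for a set of MIDI notes."""
    f0s = 440.0 * 2.0 ** ((np.asarray(midi_notes) - 69) / 12.0)
    return np.concatenate([f0 * np.arange(1, n_harmonics + 1) for f0 in f0s])

def log_likelihood(peaks_hz, midi_notes, sigma_cents=20.0, p_spurious=1e-3):
    """Match each observed peak to its nearest expected partial; peaks far
    from every partial are explained as spurious detections."""
    partials = chord_partials(midi_notes)
    ll = 0.0
    for f in peaks_hz:
        cents = 1200.0 * np.log2(f / partials)      # distance to every partial
        d = np.min(np.abs(cents))
        gauss = np.exp(-0.5 * (d / sigma_cents) ** 2) / (sigma_cents * np.sqrt(2 * np.pi))
        ll += np.log(gauss + p_spurious)            # mixture with spurious-peak floor
    return ll

# Toy usage: is a C-major or a C-minor triad more probable for these peaks?
peaks = [261.8, 330.1, 392.2, 524.0]                # roughly C4, E4, G4, C5
for name, notes in [("C major", [60, 64, 67]), ("C minor", [60, 63, 67])]:
    print(name, log_likelihood(peaks, notes))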
TIV.lib: An Open-Source Library for the Tonal Description of Musical Audio
In this paper, we present TIV.lib, an open-source library for the content-based tonal description of musical audio signals. Its main novelty lies in the perceptually inspired Tonal Interval Vector space, based on the Discrete Fourier Transform, from which multiple instantaneous and global representations, descriptors, and metrics are computed, e.g., harmonic change, dissonance, diatonicity, and musical key. The library is cross-platform, implemented in Python and the graphical programming language Pure Data, and can be used in both online and offline scenarios. Of note is its potential for enhanced Music Information Retrieval, where tonal descriptors sit at the core of numerous methods and applications.
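As a rough illustration of the core computation, the sketch below takes the DFT of an energy-normalized 12-bin chroma vector and keeps coefficients 1 through 6, which is the basic Tonal Interval Vector construction; the perceptual weighting of each coefficient that TIV.lib applies is omitted, so this is a simplification rather than the library's API.

# Sketch: Tonal Interval Vector as the DFT of a normalized chroma vector.
import numpy as np

def tiv(chroma):
    chroma = np.asarray(chroma, dtype=float)
    chroma = chroma / chroma.sum()          # normalize to unit energy
    spectrum = np.fft.fft(chroma)
    return spectrum[1:7]                    # complex coefficients k = 1..6

# Harmonic change between frames can then be measured as a distance in
# this space, e.g. the Euclidean distance between consecutive TIVs.
c_major = np.zeros(12); c_major[[0, 4, 7]] = 1.0
a_minor = np.zeros(12); a_minor[[9, 0, 4]] = 1.0
print(np.linalg.norm(tiv(c_major) - tiv(a_minor)))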
Finding Latent Sources in Recorded Music with a Shift-Invariant HDP
We present the Shift-Invariant Hierarchical Dirichlet Process (SIHDP), a nonparametric Bayesian model for representing multiple songs in terms of a shared vocabulary of latent sound sources. The SIHDP extends the Hierarchical Dirichlet Process (HDP) by explicitly modeling the times at which each latent component appears in each song. This extension allows us to model how sound sources evolve over time, which is critical to the human ability to recognize and interpret sounds. To make inference on large datasets possible, we develop an exact distributed Gibbs sampling algorithm for posterior inference. We evaluate the SIHDP's ability to model audio using a dataset of real popular music, and measure its ability to accurately find patterns in music using a set of synthesized drum loops. Ultimately, our model produces a rich representation of a set of songs, consisting of a set of short sound sources and the times at which they appear in each song.
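The following toy sketch conveys only the shift-invariant generative idea, not the SIHDP itself or its distributed Gibbs sampler: a shared dictionary of short atoms, with each event in a song choosing an atom and an onset time. The simple Dirichlet weights stand in for the HDP priors, and all sizes are illustrative.

# Sketch: shift-invariant generative process with a shared atom dictionary.
import numpy as np

rng = np.random.default_rng(0)
n_atoms, atom_len, song_len, n_events = 3, 64, 1024, 20

# Shared "vocabulary": short decaying-noise atoms standing in for sources.
atoms = rng.standard_normal((n_atoms, atom_len)) * np.exp(-np.linspace(0, 5, atom_len))

weights = rng.dirichlet(np.ones(n_atoms))           # per-song atom proportions
song = np.zeros(song_len)
for _ in range(n_events):
    k = rng.choice(n_atoms, p=weights)              # which latent source
    t = rng.integers(0, song_len - atom_len)        # when it appears (the shift)
    song[t:t + atom_len] += atoms[k]                # place the shifted atom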
Physics-Based and Spike-Guided Tools for Sound Design
In this paper we present graphical tools and parameter-search algorithms for the timbre-space exploration and design of complex sounds generated by physical modeling synthesis. The tools are built around a sparse representation of sounds based on Gammatone functions and provide the designer with both graphical and auditory insight. The auditory representation of a number of reference sounds, located as landmarks in a 2D sound design space, gives the designer an effective aid for directing their search for new sounds. The sonic landmarks can either be synthetic sounds chosen by the user or be derived automatically using parameter-search and clustering algorithms. The probabilistic method proposed in this paper uses the sparse representations to model the distance between sparsely represented sounds. A subsequent optimization minimizes those distances to estimate the optimal parameters that generate the landmark sounds on the given auditory landscape.
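As a hint of the underlying representation, the sketch below builds a unit-norm Gammatone atom, the building block of the sparse representation described above; the fourth-order envelope and Glasberg-Moore ERB bandwidth are standard choices, and the parameter values are illustrative rather than taken from the paper.

# Sketch: a single Gammatone atom for sparse audio representation.
import numpy as np

def gammatone(f_hz, dur=0.05, fs=44100, order=4, phase=0.0):
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37e-3 * f_hz + 1.0)            # Glasberg-Moore ERB width
    env = t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb * t)
    atom = env * np.cos(2 * np.pi * f_hz * t + phase)
    return atom / np.linalg.norm(atom)             # unit-norm atom

# A sound is then approximated as a sparse weighted sum of such atoms
# at different frequencies, onset times, and durations.
atom = gammatone(1000.0)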
Real-Time Transcription and Separation of Drum Recordings Based on NMF Decomposition
This paper proposes a real-time capable method for transcribing and separating occurrences of single drum instruments in polyphonic drum recordings. Both the detection and the decomposition are based on Non-Negative Matrix Factorization and can be implemented with very small systemic delay. We propose a simple modification to the update rules that allows capturing the time-dynamic spectral characteristics of the involved drum sounds. The method can be applied in music production and music education software. Performance results with respect to drum transcription are presented and discussed. The evaluation dataset, consisting of annotated drum recordings, is published for use in further studies in the field. Index Terms: drum transcription, source separation, non-negative matrix factorization, spectral processing, audio plug-in, music production, music education
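A minimal sketch of the per-frame decomposition step follows, assuming pre-learned drum templates W and the standard KL-divergence multiplicative update for the activations only, which is what keeps the delay small; the paper's time-dynamic modification of the update rules is omitted, so this is a baseline rather than the proposed method.

# Sketch: per-frame NMF activation update against fixed drum templates.
import numpy as np

def activations(frame, W, n_iter=30, eps=1e-12):
    """frame: magnitude spectrum (n_bins,); W: drum templates (n_bins, n_drums)."""
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        v = W @ h + eps
        h *= (W.T @ (frame / v)) / (W.sum(axis=0) + eps)   # KL multiplicative update
    return h

# Usage: W holds pre-learned kick/snare/hi-hat spectra; thresholding each
# drum's activation track over time yields onset (transcription) candidates,
# and W @ h per frame gives the separated drum spectra.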
Harmonic-Percussive Sound Separation Using Rhythmic Information from Non-Negative Matrix Factorization in Single-Channel Music Recordings
This paper proposes a novel method for separating harmonic and percussive sounds in single-channel music recordings. Standard non-negative matrix factorization (NMF) is used to obtain the activations of the most representative patterns active in the mixture. The basic idea is to automatically classify those activations that exhibit rhythmic and non-rhythmic patterns. We assume that percussive sounds are modeled by activations that exhibit a rhythmic pattern, whereas harmonic and vocal sounds are modeled by activations that exhibit a less rhythmic pattern. The classification of the harmonic or percussive NMF activations is performed by a recursive process based on successive correlations applied to the activations. Specifically, promising results are obtained when a sound is classified as percussive through the identification of a set of peaks in the output of the fourth correlation. The reason is that harmonic sounds tend to be represented by one valley in a half-cycle waveform at the output of the fourth correlation. Evaluation shows that the proposed method provides competitive results compared to reference state-of-the-art methods. Audio examples are available to illustrate the separation performance of the proposed method.
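A rough sketch of the rhythmic/non-rhythmic test might look as follows: autocorrelate an NMF activation row four times in succession and count peaks in the result. The peak-count and height thresholds are illustrative guesses, not the paper's values.

# Sketch: classify an NMF activation as percussive via repeated autocorrelation.
import numpy as np
from scipy.signal import find_peaks

def is_percussive(activation, n_corr=4, min_peaks=3):
    x = np.asarray(activation, dtype=float)
    for _ in range(n_corr):
        x = x - x.mean()
        x = np.correlate(x, x, mode="full")[len(x) - 1:]   # successive autocorrelation
        x = x / (np.abs(x).max() + 1e-12)
    peaks, _ = find_peaks(x, height=0.1)
    # Rhythmic (percussive) activations keep a comb of peaks after four
    # correlations; harmonic activations flatten toward a single broad lobe.
    return len(peaks) >= min_peaks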
Beat-Aligning Guitar Looper
Loopers are becoming increasingly popular thanks to their growing features and capabilities, not only in live performances but also as a rehearsal tool. These effect units record a phrase and play it back in a loop. The start and stop positions of the recording are typically set by the player's start and stop taps on a foot switch. However, if these cues are not entered precisely in time, an annoying, audible gap may occur between the repetitions of the phrase. We propose an algorithm that analyzes the recorded phrase and aligns the start and stop positions in order to remove audible gaps. Efficiency, accuracy, and robustness are achieved by including the phase information of the onset detection function's STFT within the beat estimation process. Moreover, the proposed algorithm satisfies the response time required for the live application of beat alignment. We show that robustness is achieved even for phrases of sparse rhythmic content, as long as there is sufficient information to derive the underlying beats.
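The sketch below illustrates a magnitude-only version of the idea, without the STFT-phase refinement the paper relies on: compute a spectral-flux onset detection function, estimate the beat period from its autocorrelation, and snap the loop length to a whole number of beats. The tempo range and analysis parameters are illustrative.

# Sketch: snap a recorded loop length to a whole number of estimated beats.
import numpy as np

def snap_loop(audio, fs, hop=512, n_fft=2048):
    frames = np.lib.stride_tricks.sliding_window_view(audio, n_fft)[::hop]
    mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    odf = np.maximum(np.diff(mag, axis=0), 0).sum(axis=1)   # spectral flux
    odf = odf - odf.mean()
    ac = np.correlate(odf, odf, mode="full")[len(odf) - 1:]
    lo, hi = int(0.3 * fs / hop), int(1.0 * fs / hop)       # 60-200 BPM search range
    period = lo + int(np.argmax(ac[lo:hi]))                 # beat period in hops
    n_beats = max(1, round(len(audio) / (period * hop)))
    return n_beats * period * hop                           # gap-free loop length in samples

# Usage: truncate (or pad) the recorded phrase to snap_loop(...) samples
# so that the repetition boundary falls on a beat.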
SiTraNo: A MATLAB App for Sines-Transients-Noise Decomposition of Audio Signals
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely used tool in audio processing. Multiple solutions have been proposed in recent years, using time-frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of the sound. In this paper, we present SiTraNo: an easy-to-use MATLAB application with a graphical user interface for audio decomposition that enables visualization of and access to the sinusoidal, transient, and noise classes individually. This application allows the user to choose between different well-known separation methods to analyze an input sound file, to instantaneously control and remix its spectral components, and to visually check the quality of the separation before producing the desired output file. The visualization of common artifacts, such as birdies and dropouts, is demonstrated. This application promotes experimentation with the sound decomposition process by observing the effect of variations in each spectral component on the original sound and by comparing different methods against each other, evaluating the separation quality both audibly and visually. SiTraNo and its source code are available on a companion website and repository.
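As one example of the well-known methods such an application can expose, the sketch below implements a median-filtering decomposition in the spirit of harmonic-percussive-residual separation: time-direction smoothing captures horizontal (sinusoidal) structures, frequency-direction smoothing captures vertical (transient) ones, and bins dominated by neither are kept as noise. It is written in Python for consistency with the other sketches here (SiTraNo itself is a MATLAB app), and the filter lengths and threshold are illustrative.

# Sketch: sines-transients-noise split via median filtering of the spectrogram.
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft, istft

def sines_transients_noise(x, fs, n_fft=2048, beta=2.0):
    _, _, X = stft(x, fs, nperseg=n_fft)
    S = np.abs(X)
    harm = median_filter(S, size=(1, 17))            # smooth across time (horizontal)
    perc = median_filter(S, size=(17, 1))            # smooth across frequency (vertical)
    mask_s = harm > beta * perc                      # clearly horizontal bins -> sines
    mask_t = perc > beta * harm                      # clearly vertical bins -> transients
    mask_n = ~(mask_s | mask_t)                      # neither dominates -> noise
    return [istft(X * m, fs, nperseg=n_fft)[1] for m in (mask_s, mask_t, mask_n)]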
Perceptual Decorrelator Based on Resonators
Decorrelation filters transform mono audio into multiple decorrelated copies. This paper introduces a novel decorrelation filter design based on a resonator bank, whose impulse response is a sum of over a thousand exponentially decaying sinusoids. A headphone listening test was used to identify the minimum inter-channel time delays that perceptually match ERB-filtered coherent noise to corresponding incoherent noise. The decay rate of each resonator is set based on a group delay profile determined by the listening test results at its corresponding frequency. Furthermore, the delays from the test are used to refine frequency-dependent windowing in coherence estimation, which we argue represents the perceptually most accurate way of assessing interaural coherence. This coherence measure then guides an optimization process that adjusts the initial phases of the sinusoids to minimize the coherence between two instances of the resonator-based decorrelator. The delay results establish the necessary group delay per ERB for effective decorrelation, revealing higher-than-expected values, particularly at higher frequencies. For comparison, the optimization is also performed using two previously proposed group-delay profiles: one based on the period of the ERB band center frequency and another based on the maximum group-delay limit before introducing smearing. The results indicate that the perceptually informed profile achieves equal decorrelation to the latter profile while smearing less at high frequencies. Overall, optimizing the phase response of the proposed decorrelator yields significantly lower coherence compared to using a random phase.
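A minimal sketch of the resonator-bank construction, under stated simplifications: the impulse response is a sum of exponentially decaying sinusoids, and two channels differ only in their phase sets. Random phases stand in for the coherence-optimized ones, and the decay law is a crude placeholder for the perceptually derived group-delay profile.

# Sketch: decorrelator impulse response from a bank of decaying sinusoids.
import numpy as np

def decorrelator_ir(fs=48000, dur=0.05, n_res=1000, seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    freqs = np.geomspace(50.0, 16000.0, n_res)       # log-spaced resonator frequencies
    decays = 50.0 + 0.01 * freqs                     # faster decay at high frequencies
    phases = rng.uniform(0, 2 * np.pi, n_res)        # stand-in for optimized phases
    ir = sum(np.exp(-d * t) * np.sin(2 * np.pi * f * t + p)
             for f, d, p in zip(freqs, decays, phases))
    return ir / np.max(np.abs(ir))

# Two channels with independent phase sets yield two decorrelated copies
# when the mono input is convolved with each impulse response.
left, right = decorrelator_ir(seed=0), decorrelator_ir(seed=1)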
Generating Musical Accompaniment Using Finite State Transducers
The finite state transducer (FST), a type of finite state machine that maps an input string to an output string, is a common tool in natural language processing and speech recognition. FSTs have also been applied to music-related tasks such as audio fingerprinting and the generation of musical accompaniment. In this paper, we describe a system that uses an FST to generate harmonic accompaniment to a melody. We detail the methods employed to quantize a music signal and the topology of the transducer, and discuss our approach to evaluating the system. We argue for an evaluation metric that takes into account the quality of the generated accompaniment, rather than one that returns a binary value indicating whether the accompaniment is correct or incorrect.
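A toy illustration of the mechanism, not the paper's transducer: a weighted FST whose transitions map a quantized melody symbol (here a pitch class) to an output chord and a next state, with the accompaniment read off the best-scoring arcs greedily. The states, weights, and transitions are invented for the example.

# Sketch: tiny weighted FST mapping melody pitch classes to chords.
# Transitions: (state, melody_pc) -> list of (chord, next_state, weight)
T = {
    ("start", 0): [("C", "tonic", 0.9), ("Am", "relative", 0.4)],
    ("tonic", 7): [("G", "dominant", 0.8)],
    ("dominant", 0): [("C", "tonic", 0.9)],
    ("relative", 7): [("G", "dominant", 0.6)],
}

def accompany(melody_pcs):
    state, out = "start", []
    for pc in melody_pcs:
        arcs = T.get((state, pc), [])
        if not arcs:                                 # no arc: stay put, emit nothing
            out.append("-")
            continue
        chord, state, _ = max(arcs, key=lambda a: a[2])  # greedy best arc
        out.append(chord)
    return out

print(accompany([0, 7, 0]))                          # ['C', 'G', 'C']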