Blackboard system and top-down processing for the transcription of simple polyphonic music
A system is proposed to perform the automatic music transcription of simple polyphonic tracks using top-down processing. It is composed of a blackboard system of three hierarchical levels, receiving its input from a segmentation routine in the form of an averaged STFT matrix. The blackboard contains a hypotheses database, a scheduler and knowledge sources, one of which is a neural network chord recogniser with the ability to reconfigure the operation of the system, allowing it to output more than one note hypothesis at a time. The basic implementation is explained, and some examples are provided to illustrate the performance of the system. The weaknesses of the current implementation are shown and next steps for further development of the system are defined.
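As an illustration of the architecture sketched above, a minimal blackboard skeleton might look like the following. The class names (Hypothesis, KnowledgeSource, Blackboard) and the simple scheduling rule are hypothetical stand-ins for the paper's actual implementation; the sketch only shows how a scheduler can let knowledge sources read and refine a shared hypothesis database.

```python
# Minimal, illustrative blackboard skeleton (names are hypothetical, not the paper's code).
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    level: str          # e.g. "partial", "note", "chord"
    data: dict          # parameters such as frequency, pitch, onset frame
    confidence: float   # updated by knowledge sources

class KnowledgeSource:
    """A rule that inspects the hypothesis database and may add or rescore hypotheses."""
    def applies(self, blackboard): return False
    def act(self, blackboard): pass

@dataclass
class Blackboard:
    hypotheses: list = field(default_factory=list)
    sources: list = field(default_factory=list)

    def run(self, max_cycles=100):
        # Simple scheduler: on each cycle, fire the first knowledge source whose
        # precondition holds; stop when none applies or the cycle budget is spent.
        for _ in range(max_cycles):
            runnable = [ks for ks in self.sources if ks.applies(self)]
            if not runnable:
                break
            runnable[0].act(self)
        return [h for h in self.hypotheses if h.level == "note"]
```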
Complex domain onset detection for musical signals
We present a novel method for onset detection in musical signals. It improves over previous energy-based and phase-based approaches by combining both types of information in the complex domain. It generates a detection function that is sharp at the position of onsets and smooth everywhere else. Results on a hand-labelled dataset show that high detection rates can be achieved at very low error rates. The approach is more robust than its predecessors both theoretically and practically.
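A complex-domain detection function of this kind is commonly computed by predicting each frequency bin from the previous two frames (previous magnitude, linearly extrapolated phase) and summing the deviations between prediction and observation. The sketch below follows that common formulation, assuming an STFT matrix X with one column per frame; the paper's exact expression may differ in detail.

```python
import numpy as np

def complex_domain_odf(X):
    """Complex-domain onset detection function from an STFT matrix X
    (shape: bins x frames). Sketch of the commonly used formulation, in which
    each bin is predicted by keeping the previous magnitude and linearly
    extrapolating the phase."""
    mag = np.abs(X)
    phase = np.angle(X)
    # Predicted phase: previous phase plus previous phase increment.
    pred_phase = 2.0 * phase[:, 1:-1] - phase[:, :-2]
    # Predicted complex spectrum for frames 2..N-1.
    pred = mag[:, 1:-1] * np.exp(1j * pred_phase)
    # Detection function: per-frame sum of deviations between observation and prediction.
    dev = np.abs(X[:, 2:] - pred)
    odf = dev.sum(axis=0)
    # Pad so the output aligns with the input frames.
    return np.concatenate([np.zeros(2), odf])
```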
A Comparison Between Fixed and Multiresolution Analysis for Onset Detection in Musical Signals
A study is presented on the use of multiresolution analysis-based onset detection in the complex domain. It shows that using variable time-resolution across frequency bands generates sharper detection functions for higher bands and more accurate detection functions for lower bands. The resulting method improves onset localisation over fixed-resolution schemes by favouring the increased time precision of the higher subbands when combining the per-band results.
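One possible way to realise the combination step is to compute a detection function per band at its own hop size, resample each onto the finest time grid, and weight the higher bands more heavily. The weighting scheme below is an assumption made for illustration, not the paper's exact combination rule.

```python
import numpy as np

def combine_subband_odfs(odfs, hop_sizes, sr, weights=None):
    """Combine per-band onset detection functions computed with different
    time resolutions. odfs: list of 1-D arrays (one per band, low to high);
    hop_sizes: analysis hop in samples for each band; sr: sample rate.
    Each function is resampled onto the finest time grid before a weighted
    sum. Illustrative combination scheme only."""
    finest_hop = min(hop_sizes)
    n_frames = max(int(round(len(f) * h / finest_hop)) for f, h in zip(odfs, hop_sizes))
    t_out = np.arange(n_frames) * finest_hop / sr
    if weights is None:
        # Favour the increased time precision of the higher subbands.
        weights = np.linspace(0.5, 1.0, num=len(odfs))
    combined = np.zeros(n_frames)
    for f, h, w in zip(odfs, hop_sizes, weights):
        t_in = np.arange(len(f)) * h / sr
        combined += w * np.interp(t_out, t_in, f)
    return combined
```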
Fast implementation for non-linear time-scaling of stereo signals
In this paper we present an improved implementation of Duxbury’s adaptive phase-vocoder approach for audio time-stretching using non-linear time-scaling and temporal masked phase locking at transients [1]. We show that the previous algorithm has some limitations, notably its slow implementation and its inability to deal with stereo signals. We propose solutions to these problems, including improved transient detection, a much faster implementation using the IFFT for re-synthesis, and a method for stretching stereo signals without artifacts. Finally, we provide graphical results and quantitative measures to illustrate our improvements.
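The non-linear time-scaling idea can be illustrated with a per-frame stretch map that leaves transient frames unscaled and lets the steady-state frames absorb the overall duration change. This is a sketch of the general principle only; the masked phase locking, IFFT resynthesis, and stereo handling of the paper are not reproduced here.

```python
import numpy as np

def nonlinear_stretch_map(is_transient, global_ratio):
    """Per-frame stretch factors for non-linear time-scaling: transient frames
    keep factor 1 and the remaining frames absorb the duration change so that
    the overall ratio is preserved. Illustrative only.

    Example: nonlinear_stretch_map([False, True, False, False], 2.0)
             -> [2.333, 1.0, 2.333, 2.333]  (total length doubled)"""
    is_transient = np.asarray(is_transient, dtype=bool)
    n = len(is_transient)
    n_trans = int(is_transient.sum())
    n_steady = n - n_trans
    factors = np.ones(n)
    if n_steady > 0:
        # Solve: n_trans * 1 + n_steady * r = n * global_ratio  for r.
        steady_ratio = (n * global_ratio - n_trans) / n_steady
        factors[~is_transient] = max(steady_ratio, 0.0)
    return factors
```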
Automated rhythmic transformation of musical audio
Time-scale transformations of audio signals have traditionally relied exclusively upon manipulations of tempo. We present a novel technique for automatic mixing and synchronization between two musical signals. In this transformation, the original signal assumes the tempo, meter, and rhythmic structure of the model signal, while the extracted downbeats and salient intra-measure infrastructure of the original are maintained.
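One way to picture the synchronization step is as a per-measure time warp: each inter-downbeat segment of the original is stretched to match the duration of the corresponding segment of the model. The helper below is an illustrative sketch under that assumption, not the paper's algorithm.

```python
import numpy as np

def segment_stretch_ratios(orig_downbeats, model_downbeats):
    """Per-measure stretch ratios that warp each inter-downbeat segment of the
    original onto the corresponding segment of the model, so the original
    assumes the model's tempo and metrical grid. Assumes both inputs are
    downbeat times in seconds covering the same number of measures."""
    orig = np.diff(np.asarray(orig_downbeats, dtype=float))
    model = np.diff(np.asarray(model_downbeats, dtype=float))
    if len(orig) != len(model):
        raise ValueError("expected the same number of measures in both signals")
    return model / orig
```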
Generating Musical Accompaniment Using Finite State Transducers
The finite state transducer (FST), a type of finite state machine that maps an input string to an output string, is a common tool in the fields of natural language processing and speech recognition. FSTs have also been applied to music-related tasks such as audio fingerprinting and the generation of musical accompaniment. In this paper, we describe a system that uses an FST to generate harmonic accompaniment to a melody. We provide details of the methods employed to quantize a music signal and of the topology of the transducer, and we discuss our approach to evaluating the system. We argue for an evaluation metric that takes into account the quality of the generated accompaniment, rather than one that returns a binary value indicating the correctness or incorrectness of the accompaniment.
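A toy transducer makes the mapping concrete: each transition consumes a quantized melody symbol and emits a chord symbol while moving between states. The states, symbols, and transitions below are hypothetical and chosen only to show the mechanism, not the topology used in the paper.

```python
# Minimal finite state transducer sketch (hypothetical symbols, not the paper's model):
# transitions map (state, input symbol) -> (output symbol, next state).
transitions = {
    ("q0", "C"): ("Cmaj", "q1"),
    ("q1", "E"): ("Cmaj", "q1"),
    ("q1", "F"): ("Fmaj", "q2"),
    ("q2", "G"): ("Gmaj", "q0"),
}

def transduce(melody, start_state="q0"):
    """Map a quantized melody (sequence of pitch symbols) to a chord sequence."""
    state, output = start_state, []
    for symbol in melody:
        if (state, symbol) not in transitions:
            raise KeyError(f"no transition from {state} on {symbol}")
        chord, state = transitions[(state, symbol)]
        output.append(chord)
    return output

print(transduce(["C", "E", "F", "G"]))  # ['Cmaj', 'Cmaj', 'Fmaj', 'Gmaj']
```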
Increasing Drum Transcription Vocabulary Using Data Synthesis
Current datasets for automatic drum transcription (ADT) are small and limited due to the tedious task of annotating onset events. While some of these datasets contain large vocabularies of percussive instrument classes (e.g. ~20 classes), many of these classes occur very infrequently in the data. This paucity of data makes it difficult to train models that support such large vocabularies. Therefore, data-driven drum transcription models often focus on a small number of percussive instrument classes (e.g. 3 classes). In this paper, we propose to support large-vocabulary drum transcription by generating a large synthetic dataset (210,000 eight-second examples) of audio examples for which we have ground-truth transcriptions. Using this synthetic dataset along with existing drum transcription datasets, we train convolutional-recurrent neural networks (CRNNs) in a multi-task framework to support large-vocabulary ADT. We find that training on both the synthetic and real music drum transcription datasets together improves performance not only on large-vocabulary ADT, but also on beat/downbeat detection and small-vocabulary ADT.
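The data-synthesis idea can be sketched simply: drum one-shot samples are placed at random positions in a silent buffer, and the placement times double as ground-truth onset annotations. The function below is an illustrative sketch with a hypothetical signature, not the generation pipeline used in the paper.

```python
import numpy as np

def synthesize_drum_example(one_shots, sr=44100, duration=8.0, n_hits=32, seed=0):
    """Render one synthetic training example by placing random drum one-shots
    (dict: class name -> 1-D sample array) at random times in a silent buffer,
    returning the mixture and its ground-truth (onset time, class) labels."""
    rng = np.random.default_rng(seed)
    mix = np.zeros(int(sr * duration))
    labels = []
    classes = list(one_shots)
    for _ in range(n_hits):
        cls = classes[rng.integers(len(classes))]
        sample = one_shots[cls]
        start = rng.integers(0, len(mix) - len(sample))
        mix[start:start + len(sample)] += sample
        labels.append((start / sr, cls))
    return mix, sorted(labels)
```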