Bio-Inspired Optimization of Parametric Onset Detectors
Onset detectors are used to recognize the beginning of musical events in audio signals. Manual parameter tuning for onset detectors is a time-consuming task, while existing automated approaches often maximize only a single performance metric. These automated approaches cannot be used to optimize detector algorithms for complex scenarios, such as real-time onset detection, where an optimization process must consider both detection accuracy and latency. For this reason, a flexible optimization algorithm should account for more than one performance metric in a multi-objective manner. This paper presents a generalized procedure for automated optimization of parametric onset detectors. Our procedure employs a bio-inspired evolutionary computation algorithm to replace manual parameter tuning, followed by the computation of the Pareto frontier for multi-objective optimization. The proposed approach was evaluated on all the onset detection methods of the Aubio library, using a dataset of monophonic acoustic guitar recordings. Results show that the proposed solution is effective in reducing the human effort required in the optimization process: it replaced more than two days of manual parameter tuning with 13 hours and 34 minutes of automated computation. Moreover, the resulting performance was comparable to that obtained by manual optimization.
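The Pareto frontier step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each candidate detector configuration has already been scored on two objectives, an accuracy score to maximize and a latency to minimize. The function name `pareto_front`, the configuration labels, and all numbers are hypothetical and are not taken from the paper; the evolutionary search that would produce such candidates is not shown.

```python
from typing import List, Tuple

# (label, accuracy score to maximize, latency in ms to minimize).
# All values are made up for illustration only.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A candidate is dominated when another one is at least as good on
    both objectives and strictly better on at least one of them.
    """
    front = []
    for name, acc, lat in candidates:
        dominated = any(
            (acc2 >= acc and lat2 <= lat) and (acc2 > acc or lat2 < lat)
            for _, acc2, lat2 in candidates
        )
        if not dominated:
            front.append((name, acc, lat))
    # Sort by latency so the trade-off curve reads from fastest to slowest.
    return sorted(front, key=lambda c: c[2])


candidates = [
    ("a", 0.90, 12.0),
    ("b", 0.95, 20.0),
    ("c", 0.85, 15.0),  # worse than "a" on both objectives
    ("d", 0.95, 25.0),  # same accuracy as "b" but slower
    ("e", 0.80, 8.0),
]
front = pareto_front(candidates)
```

Every configuration left in `front` represents a different compromise between accuracy and latency, so the final choice among them depends on the requirements of the target application.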
On the Challenges of Embedded Real-Time Music Information Retrieval
Real-time applications of Music Information Retrieval (MIR) have recently been gaining interest. However, as deep learning becomes increasingly ubiquitous for music analysis tasks, several challenges and limitations need to be overcome to deliver accurate and fast real-time MIR systems. In addition, modern embedded computers offer great potential for compact systems that use MIR algorithms, such as digital musical instruments. However, embedded computing hardware is generally resource-constrained, posing additional limitations. In this paper, we identify and discuss the challenges and limitations of embedded real-time MIR. Furthermore, we discuss potential solutions to these challenges, and demonstrate their validity by presenting an embedded real-time classifier of expressive acoustic guitar techniques. The classifier achieved 99.2% accuracy in distinguishing pitched and percussive techniques and a 99.1% average accuracy in distinguishing four distinct percussive techniques with a fifth class for pitched sounds. The full classification task is a considerably more complex learning problem, with our preliminary results reaching only 56.5% accuracy. The results were produced with an average latency of 30.7 ms.
A Comparison of Deep Learning Inference Engines for Embedded Real-Time Audio Classification
Recent advancements in deep learning have shown great potential for audio applications, improving the accuracy of previous solutions for tasks such as music transcription, beat detection, and real-time audio processing. In addition, the availability of increasingly powerful embedded computers has led many deep learning framework developers to devise software optimized to run pretrained models in resource-constrained contexts. As a result, the use of deep learning on embedded devices and audio plugins has become more widespread. However, confusion has grown about which deep learning inference engines can run in real time and which are less resource-hungry. In this paper, we present a comparison of four available deep learning inference engines for real-time audio classification on the CPU of an embedded single-board computer: TensorFlow Lite, TorchScript, ONNX Runtime, and RTNeural. Results show that all inference engines can execute neural network models in real time with appropriate code practices, but execution time varies between engines and models. Most importantly, we found that most of the less-specialized engines offer great flexibility and can be used effectively for real-time audio classification, with slightly better results than a real-time-specific approach. In contrast, more specialized solutions can offer a lightweight and minimalist alternative where less flexibility is needed.
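As a rough illustration of what running "in real time" can mean for block-based audio, the sketch below assumes the common criterion that one inference call must finish within the duration of one audio buffer. This criterion, the helper names (`buffer_deadline_ms`, `time_inference`, `meets_deadline`), and the example buffer size and sample rate are assumptions made here, not details from the paper. No engine-specific API is used: `run_once` stands in for whatever callable invokes a model.

```python
import time
from typing import Callable, List


def buffer_deadline_ms(buffer_size: int, sample_rate: int) -> float:
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * buffer_size / sample_rate


def time_inference(run_once: Callable[[], None], n_runs: int = 100) -> List[float]:
    """Time repeated calls of a hypothetical inference callable, in ms."""
    times_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_once()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms


def meets_deadline(times_ms: List[float], buffer_size: int, sample_rate: int) -> bool:
    """Check the worst observed time, since a single late call causes a dropout."""
    return max(times_ms) <= buffer_deadline_ms(buffer_size, sample_rate)


# Example with assumed settings: 512-sample buffers at 48 kHz.
deadline = buffer_deadline_ms(512, 48000)
```

The check uses the worst-case time instead of the mean because an audio callback that overruns even once produces an audible glitch. Timings taken in Python on a desktop only illustrate the arithmetic and say nothing about how a given engine behaves on embedded hardware.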
MorphDrive: Latent Conditioning for Cross-Circuit Effect Modeling and a Parametric Audio Dataset of Analog Overdrive Pedals
In this paper, we present an approach to the neural modeling of overdrive guitar pedals with conditioning from a cross-circuit and cross-setting latent space. The resulting network models the behavior of multiple overdrive pedals across different settings, offering continuous morphing between real configurations and hybrid behaviors. Compact conditioning spaces are obtained through unsupervised training of a variational autoencoder with adversarial training, resulting in accurate reconstruction performance across different sets of pedals. We then compare three Hyper-Recurrent architectures for processing, including dynamic and static HyperRNNs, and a smaller model for real-time processing. Additionally, we present pOD-set, a new open dataset including recordings of 27 analog overdrive pedals, each with 36 gain and tone parameter combinations, totaling over 97 hours of recordings. Precise parameter setting was achieved through a custom-deployed recording robot.