GPGPU Audio Benchmark Framework
Acceleration of audio workloads on general-purpose GPU (GPGPU) hardware offers potentially high speedup factors, but also presents challenges in development and deployment. We can increasingly depend on such hardware being available in users’ systems, yet few real-time audio products use this resource. We propose a suite of benchmarks to qualify a GPU as suitable for batch or real-time audio processing. This includes both microbenchmarks and higher-level audio-domain benchmarks. We choose metrics according to the application, paying particularly close attention to the latency tail distribution. We propose an extension to the benchmark framework to more accurately simulate the real-world request pattern and performance requirements when running in a digital audio workstation. We run these benchmarks on two common consumer-level platforms: a PC desktop with a recent midrange discrete GPU and a Macintosh desktop with a unified CPU–GPU memory architecture.
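The tail-latency emphasis above can be made concrete with a minimal Python sketch (not taken from the paper; the workload, run count, and reporting format are placeholder assumptions): rather than averaging, it reports the percentile latencies that determine whether a GPU can meet real-time audio deadlines.

```python
# Minimal sketch of a tail-latency microbenchmark: time a workload many
# times and summarise the latency distribution by its upper percentiles.
import time
import numpy as np

def measure_latency_tail(workload, n_runs=10_000):
    """Time `workload` repeatedly; return selected latency percentiles (s)."""
    samples = np.empty(n_runs)
    for i in range(n_runs):
        t0 = time.perf_counter()
        workload()  # in a real benchmark: dispatch a GPU kernel and wait
        samples[i] = time.perf_counter() - t0
    # Mean throughput is not enough: one late audio block is an audible
    # dropout, so the tail percentiles are the qualifying metric.
    return {p: np.percentile(samples, p) for p in (50, 99, 99.9, 100)}

# Stand-in CPU workload for demonstration purposes only.
stats = measure_latency_tail(lambda: sum(x * x for x in range(1000)), n_runs=1000)
print({p: f"{v * 1e6:.1f} us" for p, v in stats.items()})
```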
Comparing Acoustic and Digital Piano Actions: Data Analysis and Key Insights
The acoustic piano and its sound production mechanisms have been
extensively studied in the field of acoustics. Similarly, digital piano synthesis has been the focus of numerous signal processing
research studies. However, the role of the piano action in shaping the dynamics and nuances of piano sound has received less
attention, particularly in the context of digital pianos. Digital pianos are well-established commercial instruments that typically use
weighted keys with two or three sensors to measure the average
key velocity—this being the only input to a sampling synthesis
engine. In this study, we investigate whether this simplified measurement method adequately captures the full dynamic behavior of
the original piano action. After a brief review of the state of the art,
we describe an experimental setup designed to measure physical
properties of the keys and hammers of a piano. This setup enables
high-precision readings of acceleration, velocity, and position for
both the key and hammer across various dynamic levels. Through
extensive data analysis, we examine their relationships and identify
the optimal key position for velocity measurement. We also analyze
a digital piano key to determine where the average key velocity is
measured and compare it with our proposed optimal timing. We
find that the instantaneous key velocity just before let-off correlates
most strongly with hammer impact velocity, indicating a target
for improved sensing; however, due to the limitations of discrete
velocity sensing, this optimization alone may not suffice to replicate
the nuanced expressiveness of acoustic piano touch. This study
represents the first step in a broader research effort aimed at linking
piano touch, dynamics, and sound production.
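The central correlation finding can be illustrated with a short Python sketch using made-up numbers (the paper's measurement rig and data are not reproduced here): instantaneous key velocity is derived from a sampled position trace, and the per-stroke value just before let-off is correlated with hammer impact velocity.

```python
# Illustrative sketch with hypothetical data: derive instantaneous key
# velocity from a position trace, then correlate the per-stroke reading
# taken just before let-off with the hammer impact velocity.
import numpy as np

def key_velocity(position, fs):
    """Instantaneous key velocity via finite differences."""
    return np.gradient(position, 1.0 / fs)

# In practice each value below would be read from a key_velocity() trace at
# the let-off instant; these numbers are made up for illustration.
v_key_at_letoff = np.array([0.12, 0.25, 0.31, 0.44, 0.58])   # m/s
v_hammer_impact = np.array([0.9, 1.8, 2.3, 3.2, 4.1])        # m/s

# Pearson correlation of the kind examined in the study.
r = np.corrcoef(v_key_at_letoff, v_hammer_impact)[0, 1]
print(f"correlation: {r:.3f}")
```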
Digital Morphophone Environment: Computer Rendering of a Pioneering Sound Processing Device
This paper introduces a digital reconstruction of the morphophone,
a complex magnetophonic device developed in the 1950s within
the laboratories of the GRM (Groupe de Recherches Musicales)
in Paris. The analysis, design, and implementation methodologies
underlying the Digital Morphophone Environment are discussed.
Based on a detailed review of historical sources and limited
documentation – including a small body of literature and, most
notably, archival images – the core operational principles of the
morphophone have been modeled within the MAX visual programming environment. The main goals of this work are, on the one
hand, to study and make accessible a now obsolete and unavailable
tool, and on the other, to provide the opportunity for new explorations in computer music and research.
Towards an Objective Comparison of Panning Feature Algorithms for Unsupervised Learning
Estimates of panning attributes are important features to extract from a piece of recorded music, with downstream uses such
as classification, quality assessment, and listening enhancement.
While several algorithms exist in the literature, there is currently
no comparison between them and no studies to suggest which one
is most suitable for any particular task. This paper compares four
algorithms for extracting amplitude panning features with respect
to their suitability for unsupervised learning. It identifies commonalities between them and analyses their results on a small set of
commercial music excerpts chosen for their distinct panning features. The ability of each algorithm to differentiate between the
tracks is analysed. The results can be used in future work to either
select the most appropriate panning feature algorithm or create a
version customised for a particular task.
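For orientation, one widely used style of amplitude-panning feature is a per-bin similarity (panning index) between the left and right channel STFTs. The sketch below shows that style of feature; it is illustrative only, and no claim is made that it is one of the four algorithms compared in the paper.

```python
# Sketch of a similarity-based panning feature: 0 for centre-panned bins,
# approaching -1 (left) or +1 (right) for hard-panned bins.
import numpy as np
from scipy.signal import stft

def panning_feature(left, right, fs, nperseg=2048):
    _, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    eps = 1e-12
    # Per-bin similarity of the channels: 1 when identical in magnitude.
    psi = 2 * np.abs(L * np.conj(R)) / (np.abs(L) ** 2 + np.abs(R) ** 2 + eps)
    sign = np.sign(np.abs(R) - np.abs(L))   # which channel dominates
    return (1 - psi) * sign                 # signed panning value per bin
```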
Shifted NMF with Group Sparsity for Clustering NMF Basis Functions
Non-negative Matrix Factorisation (NMF) has recently found application in the separation of individual sound sources. NMF decomposes the spectrogram of an audio mixture into an additive, parts-based representation in which the parts typically correspond to individual notes or chords. However, the NMF basis functions still need to be clustered to their sources, and although many attempts have been made to improve this clustering, much research is still required in this area. Recently, Shifted Non-negative Matrix Factorisation (SNMF) was used to cluster these basis functions. Building on this, we propose that incorporating group sparsity into Shifted NMF-based methods may benefit the clustering algorithms. We test this on SNMF algorithms and observe improved separation quality. Results show improved clustering of pitched basis functions over previous methods.
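As background for the decomposition being clustered, here is a minimal plain-NMF sketch with multiplicative updates (Euclidean cost); the shifted and group-sparse variants the paper builds on are not reproduced.

```python
# Minimal NMF: factorise a magnitude spectrogram V (freq x time) as V ≈ W H,
# where columns of W are basis functions (spectral shapes, typically notes)
# and rows of H are their activations over time.
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9):
    F, T = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, k))
    H = rng.random((k, T))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative updates keep
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # both factors non-negative
    return W, H

# Clustering the columns of W to their source instruments is the open
# problem the abstract describes.
```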
Real-time Finite Difference Physical Models of Musical Instruments on a Field Programmable Gate Array (FPGA)
Real-time sound synthesis of musical instruments based on solving differential equations is of great interest in musical acoustics, especially for linking the geometric features of musical instruments to sound features. A major restriction of accurate physical models is the computational effort: the calculation cost is directly linked to the geometrical and material accuracy of a physical model, and so to the validity of the results. This work presents a methodology for implementing real-time models of whole instrument geometries, modelled with the Finite Difference Method (FDM), on a Field Programmable Gate Array (FPGA), a device capable of massively parallel computation. Three real-time musical instrument implementations are given as examples: a banjo, a violin, and a Chinese ruan.
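The operation that maps naturally onto FPGA parallelism is the finite-difference stencil update, in which every grid point can be updated by its own hardware unit. The sketch below shows it for an ideal 1D string, a deliberately tiny stand-in for the paper's full 2D/3D instrument geometries.

```python
# FDTD update for the ideal 1D wave equation: each interior point depends
# only on its neighbours at the previous two time steps, so all points can
# update in parallel -- the property an FPGA implementation exploits.
import numpy as np

def string_fdtd(n_points=100, n_steps=1000, courant=1.0):
    u_prev = np.zeros(n_points)          # displacement at step n-1
    u = np.zeros(n_points)               # displacement at step n
    u[n_points // 2] = 1.0               # simple excitation (pluck)
    lam2 = courant ** 2                  # (c*dt/dx)^2, <= 1 for stability
    out = []
    for _ in range(n_steps):
        u_next = np.zeros(n_points)      # fixed (zero) boundary conditions
        u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                        + lam2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
        u_prev, u = u, u_next
        out.append(u[n_points // 4])     # read the "pickup" position
    return np.array(out)                 # synthesised output signal
```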
Characterisation of Acoustic Scenes Using a Temporally-constrained Shift-invariant Model
In this paper, we propose a method for modelling and classifying acoustic scenes using temporally-constrained shift-invariant probabilistic latent component analysis (SIPLCA). SIPLCA can be used to extract time-frequency patches from spectrograms in an unsupervised manner. Component-wise hidden Markov models are incorporated into the SIPLCA formulation to enforce temporal constraints on the activation of each acoustic component. The time-frequency patches are converted to cepstral coefficients in order to provide a compact representation of acoustic events within a scene. Experiments are conducted on a corpus of train station recordings classified into six scene classes. Results show that the proposed model is able to capture salient events within a scene and outperforms the non-negative matrix factorisation algorithm on the same task. In addition, it is demonstrated that the use of temporal constraints leads to improved performance.
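For reference, the shift-invariant PLCA model underlying the method (as formulated in earlier work on SIPLCA; assumed background, not copied from this paper) writes the normalised spectrogram as time-frequency kernels convolved in time with their activations:

```latex
% Each component z has a time-frequency kernel P(f, \tau | z) (a spectrogram
% patch) convolved in time with an activation distribution P(t | z):
P(f,t) \;=\; \sum_{z} P(z) \sum_{\tau} P(f,\tau \mid z)\, P(t-\tau \mid z)
```

The component-wise hidden Markov models described in the abstract impose their temporal constraints on the activation terms P(t | z).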
The Tonalness Spectrum: Feature-Based Estimation of Tonal Components
The tonalness spectrum shows the likelihood of a spectral bin being part of a tonal or non-tonal component. It is a non-binary measure based on a set of established spectral features. An easily extensible framework for the computation, selection, and combination of features is introduced. The results are evaluated and compared in two ways: first with a data set of synthetically generated signals, and then with real music signals in the context of a typical MIR application.
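The abstract does not enumerate the feature set, but a representative per-bin feature of this kind is local spectral flatness; the sketch below is an illustration only, not necessarily one of the paper's established features.

```python
# Sketch of a per-bin tonalness-style feature: local spectral flatness is
# low around sharp peaks (tonal) and near 1 in noise-like regions, so its
# complement behaves like a non-binary tonalness likelihood per bin.
import numpy as np

def local_tonalness(mag_spectrum, half_width=4, eps=1e-12):
    X = mag_spectrum + eps
    tonalness = np.zeros_like(X)
    for k in range(half_width, len(X) - half_width):
        w = X[k - half_width:k + half_width + 1]
        flatness = np.exp(np.mean(np.log(w))) / np.mean(w)  # geo/arith mean
        tonalness[k] = 1.0 - flatness
    return tonalness
```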
Unsupervised Audio Key and Chord Recognition
This paper presents a new methodology for determining the chords of a music piece without using training data. Specifically, we introduce: 1) a wavelet-based audio denoising component to enhance a chroma-based feature extraction framework; 2) an unsupervised key recognition component to extract a bag of local keys; and 3) a chord recognizer that uses the estimated local keys to adjust the chromagram, based on a set of well-known tonal profiles, to recognize chords on a frame-by-frame basis. We aim to recognize five classes of chords (major, minor, diminished, augmented, suspended) plus one class N (no chord or silence). We demonstrate the performance of the proposed approach on 175 Beatles songs, achieving 75% F-measure for estimating a bag of local keys and at least 68.2% accuracy on chords, without discarding any audio segments or using other musical elements. The experimental results also show that the wavelet-based denoiser improves the chord recognition rate by approximately 4% over other chroma features.
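A standard building block for this kind of tonal-profile matching is the correlation of a chroma vector against rotated key templates. The sketch below uses the Krumhansl-Kessler profiles as an assumed stand-in for the paper's "well-known tonal profiles".

```python
# Template-based key estimation: correlate a chroma vector against all 24
# rotations of the major/minor Krumhansl-Kessler profiles.
import numpy as np

MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def estimate_key(chroma):
    """Return (pitch_class, mode) with the highest profile correlation."""
    best = None
    for mode, profile in (("major", MAJOR), ("minor", MINOR)):
        for root in range(12):
            r = np.corrcoef(chroma, np.roll(profile, root))[0, 1]
            if best is None or r > best[0]:
                best = (r, root, mode)
    return best[1], best[2]

# A chroma vector dominated by C, E, and G should come out as C major.
chroma = np.zeros(12)
chroma[[0, 4, 7]] = 1.0
print(estimate_key(chroma))   # -> (0, 'major')
```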
On the Window-Disjoint Orthogonality of Speech Sources in Reverberant Humanoid Scenarios
Many speech source separation approaches are based on the assumption of the orthogonality of speech sources in the time-frequency domain: the target speech source is demixed by applying the ideal binary mask to the mixture. So far, the time-frequency orthogonality of speech sources has been investigated in detail only for anechoic, artificially mixed speech mixtures. This paper evaluates how the orthogonality of speech sources decreases in a realistic reverberant humanoid recording setup and indicates strategies for enhancing the separation capabilities of algorithms based on ideal binary masks under these conditions. It is shown that the SIR of the target source demixed from the mixture using the ideal binary mask decreases by approximately 3 dB for reverberation times of T60 = 0.6 s compared to the anechoic scenario. For humanoid setups, the spatial distribution of the sources and the choice of ear channel introduce SIR differences of a further 3 dB, which leads to specific strategies for choosing the best channel for demixing.
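The evaluation rests on the ideal binary mask (IBM). A minimal sketch of the IBM construction and the resulting SIR measurement follows; it is an illustration of the standard definitions, not the paper's humanoid recording pipeline.

```python
# Ideal binary mask: keep the time-frequency bins where the target source
# outweighs the interferer, then measure the SIR of what the mask keeps.
import numpy as np
from scipy.signal import stft

def ibm_sir_db(target, interferer, fs, nperseg=1024):
    _, _, S = stft(target, fs, nperseg=nperseg)
    _, _, N = stft(interferer, fs, nperseg=nperseg)
    mask = np.abs(S) > np.abs(N)              # target-dominated bins
    e_target = np.sum(np.abs(S[mask]) ** 2)   # target energy kept by the mask
    e_interf = np.sum(np.abs(N[mask]) ** 2)   # interference leaking through
    return 10 * np.log10(e_target / e_interf)
```

Under perfect window-disjoint orthogonality the leaked interference energy vanishes and the SIR diverges; reverberation smears each source across more bins, which is exactly the degradation the paper quantifies.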