Visualization of Signals and Algorithms in Kronos
Kronos is a visually oriented programming language and compiler aimed at musical signal processing tasks. Its distinctive feature is support for functional programming idioms, such as closures and higher-order functions, in the context of high-performance real-time DSP. This paper examines the visual aspect of the system. The programming user interface is discussed, along with a scheme for building custom data visualization algorithms inside the system.
RIR2FDN: An Improved Room Impulse Response Analysis and Synthesis
This paper seeks to improve the state of the art in delay-network-based analysis-synthesis of measured room impulse responses (RIRs). We propose an informed method incorporating improved energy decay estimation and synthesis with an optimized feedback delay network. The performance of the presented method is compared against an end-to-end deep-learning approach. A formal listening test was conducted in which participants assessed the similarity of reverberated material across seven distinct RIRs and three different sound sources. The results reveal that the performance of these methods is influenced by both the excitation sounds and the reverberation conditions. Nonetheless, the proposed method consistently demonstrates higher similarity ratings than the end-to-end approach across most conditions. However, achieving an indistinguishable synthesis of measured RIRs remains a persistent challenge, underscoring the complexity of this problem. Overall, this work helps improve the sound quality of analysis-based artificial reverberation.
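The synthesis building block named in the abstract, a feedback delay network, can be sketched in a few lines of Python. This is a minimal homogeneous FDN with a scaled Hadamard feedback matrix; the delay lengths and feedback gain here are illustrative placeholders, not the optimized values of the paper:

```python
import numpy as np

def fdn(x, delays=(149, 211, 263, 293), g=0.85):
    """Minimal feedback delay network: four delay lines coupled by a
    scaled Hadamard matrix; `g` sets the per-pass feedback gain."""
    N = len(delays)
    # Orthogonal (lossless) core scaled by g < 1 guarantees decay.
    A = g * np.array([[1,  1,  1,  1],
                      [1, -1,  1, -1],
                      [1,  1, -1, -1],
                      [1, -1, -1,  1]]) / 2.0
    bufs = [np.zeros(d) for d in delays]
    ptrs = [0] * N
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        outs = np.array([bufs[i][ptrs[i]] for i in range(N)])
        y[n] = outs.sum()                 # unit output taps
        feedback = A @ outs
        for i in range(N):
            bufs[i][ptrs[i]] = xn + feedback[i]   # unit input taps
            ptrs[i] = (ptrs[i] + 1) % delays[i]
    return y

ir = fdn(np.r_[1.0, np.zeros(4799)])  # 0.1 s impulse response at 48 kHz
```

An informed analysis-synthesis method as described would additionally fit the gains (typically frequency-dependent attenuation filters) to the measured energy decay, rather than using a single scalar `g`.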
Large-scale Real-time Modular Physical Modeling Sound Synthesis
Due to recent increases in computational power, physical modeling synthesis is now possible in real time even for relatively complex models. We present here a modular physical modeling instrument design, intended as a construction framework for string- and bar-based instruments, alongside a mechanical network allowing for arbitrary nonlinear interconnection. When multiple nonlinearities are present in a feedback setting, there are two major concerns. One is ensuring numerical stability, which can be approached using an energy-based framework. The other is coping with the computational cost associated with nonlinear solvers: standard iterative methods, such as Newton-Raphson, quickly become a computational bottleneck. Here, such iterative methods are sidestepped using an alternative energy-conserving method, allowing for a great reduction in computational expense and, in turn, real-time performance for very large-scale nonlinear physical modeling synthesis. Simulation and benchmarking results are presented.
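To see why iterative solvers become the bottleneck, consider this minimal sketch (not the paper's scheme): an implicit midpoint step for a single cubic (Duffing-type) oscillator, u'' = -w^2 u - k u^3. Every audio sample requires an inner Newton-Raphson loop, and with many coupled nonlinearities this inner solve grows into a large simultaneous system; the parameter values below are illustrative:

```python
import numpy as np

def duffing_midpoint(u0=1.0, v0=0.0, w=2*np.pi*200.0, k=1e7,
                     sr=48000, n_samples=480, tol=1e-12):
    """Implicit midpoint integration of u'' = -w^2 u - k u^3.
    Each sample runs an inner Newton-Raphson loop -- the per-sample
    cost that energy-conserving non-iterative schemes avoid."""
    dt = 1.0 / sr
    u, v = u0, v0
    out = np.empty(n_samples)
    total_iters = 0
    for n in range(n_samples):
        u1 = u  # initial guess: previous state
        for _ in range(50):
            um = 0.5 * (u + u1)
            # Residual of the midpoint update, G(u1) = 0
            g = 2*(u1 - u)/dt - 2*v + dt*(w*w*um + k*um**3)
            dg = 2/dt + dt*(w*w + 3*k*um*um)/2
            step = g / dg
            u1 -= step
            total_iters += 1
            if abs(step) < tol:
                break
        v = 2*(u1 - u)/dt - v   # recover midpoint velocity update
        u = u1
        out[n] = u
    return out, total_iters

out, iters = duffing_midpoint()
```

The midpoint rule keeps the oscillation bounded (near-conservation of energy), but at the price of several Newton iterations per sample, which is exactly the cost the abstract's alternative method removes.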
On studying auditory distance perception in concert halls with multichannel auralizations
Virtual acoustics and auralizations have previously been used to study the perceptual properties of concert hall acoustics in a descriptive profiling framework. The results have indicated that the apparent auditory distance to the orchestra might play a crucial role in enhancing the listening experience and the appraisal of hall acoustics. However, it is unknown how the acoustics of the hall influence auditory distance perception in such large spaces. Here, we present one step towards studying auditory distance perception in concert halls with virtual acoustics. The aims of this investigation were to evaluate the feasibility of the auralizations and the system for studying perceived distances, and to obtain first evidence on the effects of hall acoustics and source material on distance perception. Auralizations were made from spatial impulse responses measured in two concert halls at distances of 14 and 22 meters from the center of a calibrated loudspeaker orchestra on stage. Anechoic source materials included symphonic music and pink noise, as well as signals produced by concatenating random segments of anechoic instrument recordings. Forty naive test subjects were blindfolded before entering the listening room, where they verbally reported distances to sound sources in the auralizations. Despite the large variance in distance judgments between individuals, the reported distances were on average in the same range as the actual distances. The results show significant main effects of hall, distance, and signal, but also some unexpected effects associated with the presentation order of the stimuli.
Sparse Atomic Modeling of Audio: a Review
Research into sparse atomic models has recently intensified in the image and audio processing communities. While other reviews exist, we believe this paper provides a good starting point for the uninitiated reader, as it concisely summarizes the state of the art and presents most of the major topics in an accessible manner. We discuss several approaches to the sparse approximation problem, including various greedy algorithms, iteratively reweighted least squares, iterative shrinkage, and Bayesian methods. We provide pseudo-code for several of the algorithms, and have released software which includes fast dictionaries and reference implementations for many of the algorithms. We discuss the relevance of the different approaches for audio applications, and include numerical comparisons. We also illustrate several audio applications of sparse atomic modeling.
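The simplest of the greedy algorithms the review covers, matching pursuit, fits in a few lines: at each step, pick the dictionary atom most correlated with the residual, subtract its projection, and repeat. The random dictionary and sparse signal below are a toy illustration, not from the paper:

```python
import numpy as np

def matching_pursuit(x, D, n_atoms=10):
    """Greedy sparse approximation of x over dictionary D
    (columns assumed unit-norm)."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual              # correlate with all atoms
        i = np.argmax(np.abs(corr))        # best-matching atom
        coeffs[i] += corr[i]               # accumulate its coefficient
        residual -= corr[i] * D[:, i]      # remove its contribution
    return coeffs, residual

# Toy example: random unit-norm dictionary, 3-sparse synthetic signal.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
true = np.zeros(256)
true[[5, 80, 200]] = [1.0, -0.7, 0.4]
x = D @ true
coeffs, residual = matching_pursuit(x, D, n_atoms=20)
```

By construction the invariant `D @ coeffs + residual == x` holds throughout, and the residual norm is non-increasing, which is what makes the greedy family attractive despite its suboptimality relative to the Bayesian and re-weighted least-squares approaches.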
Gestural exploitation of ecological information in continuous sonic feedback – The case of balancing a rolling ball
Continuous sensory–motor loops are a topic rarely dealt with in experiments and applications of ecological auditory perception. Experiments with a tangible audio–visual interface built around a physics-based sound synthesis core address this aspect. Initially conceived to evaluate a specific work of sound and interaction design, they deliver new arguments and notions for non-speech auditory display, and should also be seen in the wider context of psychoacoustic knowledge and methodology.
Real-Time Detection of Finger Picking Musical Structures
MIDIME is a software architecture that houses improvisational agents reacting to MIDI messages from a finger-picked guitar. The agents operate in a pipeline whose first stage converts MIDI messages into a map of the state of the instrument's strings over time, and whose second stage selects rhythmic, modal, chordal, and melodic interpretations from the superposition of interpretations latent in the first stage. These interpretations are nondeterministic, not because of any arbitrary injection of randomness by an algorithm, but because guitar playing is nondeterministic: variations in timing, tuning, picking intensity, string damping, and accidental or intentional grace notes can all affect the selections of this second stage. The selections open to the second stage, as well as to the third stage that matches second-stage selections against a stored library of composition fragments, reflect the superposition of possible perceptions and interpretations of a piece of music. This paper concentrates on these working analytical stages of MIDIME. It also outlines plans for using a genetic algorithm to develop improvisational agents in the final pipeline stage.
Automatic Tablature Transcription of Electric Guitar Recordings by Estimation of Score- and Instrument-Related Parameters
In this paper, we present a novel algorithm for automatic analysis, transcription, and parameter extraction from isolated polyphonic guitar recordings. In addition to general score-related information such as note onset, duration, and pitch, instrument-specific information such as the plucked string and the applied plucking and expression styles is retrieved automatically. For this purpose, we adapted several state-of-the-art approaches for onset and offset detection, multipitch estimation, string estimation, feature extraction, and multi-class classification. Furthermore, we investigated a partial tracking algorithm that is robust with respect to inharmonicity, an extensive extraction of novel and known audio features, and the exploitation of instrument-based knowledge in the form of plausibility filtering to obtain more reliable predictions. Our system achieved very high accuracy values of 98% for onset and offset detection as well as multipitch estimation. For the instrument-related parameters, the proposed algorithm also showed very good performance, with accuracy values of 82% for the string number, 93% for the plucking style, and 83% for the expression style.
Index Terms: playing techniques, plucking style, expression style, multiple fundamental frequency estimation, string classification, fretboard position, fingering, electric guitar, inharmonicity coefficient, tablature
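Inharmonicity-robust partial tracking rests on the standard stiff-string model, in which the k-th partial is stretched above the harmonic position k*f0. The abstract does not give the paper's exact tracker, but a common starting point is to predict partial locations from the inharmonicity coefficient B and search near them:

```python
import math

def inharmonic_partials(f0, B, n_partials=10):
    """Expected partial frequencies of a stiff string,
    f_k = k * f0 * sqrt(1 + B * k^2), where B is the inharmonicity
    coefficient.  A tracker searches near these frequencies rather
    than at the purely harmonic positions k * f0."""
    return [k * f0 * math.sqrt(1.0 + B * k * k)
            for k in range(1, n_partials + 1)]

# Illustrative values: open A string (110 Hz) with B = 2e-4.
partials = inharmonic_partials(f0=110.0, B=2e-4, n_partials=10)
```

Because B differs between strings of the same pitch, the fitted coefficient itself is also a useful feature for the string-number classification the paper reports.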
Real-time Finite Difference Physical Models of Musical Instruments on a Field Programmable Gate Array (FPGA)
Real-time sound synthesis of musical instruments based on solving differential equations is of great interest in musical acoustics, especially for linking the geometric features of musical instruments to sound features. A major restriction of accurate physical models is the computational effort: the calculation cost is directly linked to the geometric and material accuracy of a physical model, and thus to the validity of the results. This work presents a methodology for implementing real-time models of whole instrument geometries, modelled with the finite difference method (FDM), on a field programmable gate array (FPGA), a device capable of massively parallel computation. Three real-time musical instrument implementations are given as examples: a banjo, a violin, and a Chinese ruan.
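The appeal of the FDM for FPGA implementation is that every grid point updates independently from its neighbours' previous values. A minimal 1D example (an ideal string, far simpler than the whole-geometry models of the paper) makes the update structure explicit; the pluck shape and pickup position are illustrative:

```python
import numpy as np

def fd_string(n_samples=2000, N=64, lam=1.0):
    """Explicit FD scheme for the 1D wave equation (ideal string):
    u^{n+1}_i = 2 u^n_i - u^{n-1}_i + lam^2 (u^n_{i+1} - 2 u^n_i + u^n_{i-1}),
    stable for Courant number lam <= 1.  Each interior point's update
    depends only on stored neighbour values, so all points can be
    computed in parallel -- the property FPGAs exploit."""
    u_prev = np.zeros(N)
    u = np.zeros(N)
    idx = np.arange(N)
    # Initial condition: raised-cosine "pluck" near one end.
    u[8:16] = 0.5 * (1 - np.cos(2 * np.pi * (idx[8:16] - 8) / 8))
    u_prev[:] = u                     # zero initial velocity
    out = np.empty(n_samples)
    for n in range(n_samples):
        u_next = np.zeros(N)          # fixed (zero) boundaries
        u_next[1:-1] = (2*u[1:-1] - u_prev[1:-1]
                        + lam**2 * (u[2:] - 2*u[1:-1] + u[:-2]))
        u_prev, u = u, u_next
        out[n] = u[N // 2]            # output "pickup" at the midpoint
    return out

out = fd_string()
```

At lam = 1 this scheme is exact for the ideal string, so the output stays bounded indefinitely; 2D and 3D instrument geometries follow the same stencil pattern with correspondingly larger neighbourhoods.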
Audio-visual Multiple Active Speaker Localization in Reverberant Environments
Localization of multiple active speakers in natural environments with only two microphones is a challenging problem, as reverberation degrades the performance of speaker localization based exclusively on directional cues. This paper presents an approach based on audio-visual fusion. The audio modality performs multiple-speaker localization using the Skeleton method, energy weighting, and precedence-effect filtering and weighting. The video modality performs active speaker detection based on an analysis of the lip region of the detected speakers. The audio modality alone has problems with localization accuracy, while the video modality alone has problems with false detections. The estimation results of both modalities are represented as probabilities in the azimuth domain, and a Gaussian fusion method is proposed to combine the estimates at a late stage. As a consequence, localization accuracy and robustness are significantly increased compared to either modality alone. Experimental results in different scenarios confirm the improved performance of the proposed method.
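The late Gaussian fusion step can be sketched as a pointwise product of the two modalities' azimuth likelihoods. For Gaussian likelihoods the product is again Gaussian, with its mean at the precision-weighted average, so the fused estimate is pulled toward the more confident modality. The means and spreads below are illustrative, not values from the paper:

```python
import numpy as np

def gaussian(az, mu, sigma):
    """Unnormalised Gaussian likelihood over the azimuth grid."""
    return np.exp(-0.5 * ((az - mu) / sigma) ** 2)

def fuse(az, mu_a, sig_a, mu_v, sig_v):
    """Late Gaussian fusion: multiply the audio and video azimuth
    likelihoods pointwise and renormalise to a probability."""
    p = gaussian(az, mu_a, sig_a) * gaussian(az, mu_v, sig_v)
    return p / p.sum()

az = np.linspace(-90, 90, 361)   # azimuth grid, 0.5 degree steps
# Audio estimate: 20 deg but broad; video estimate: 25 deg, sharper.
p = fuse(az, mu_a=20.0, sig_a=8.0, mu_v=25.0, sig_v=4.0)
est = az[np.argmax(p)]           # fused azimuth estimate
```

Here the video likelihood has four times the precision of the audio one, so the fused peak lands much closer to 25 than to 20 degrees, mirroring how the fusion lets the sharper modality dominate while the other still contributes.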