Download Matching live sources with physical models This paper investigates the use of a physical model template database as the parameter basis for a MPEG-4 Structured Audio (MP4-SA) codec. During analysis, the codec attempts to match the closest corresponding instrument in the database. In this paper, we emphasize the mechanism enabling this match. We give an overview of the final front end, including the pitch detection stage, and remaining problems are discussed. A draft implementation, written in the Python language is described.
Download Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically
informed audio synthesis. However, traditional implementations,
particularly for non-linear models like the von Kármán plate, are
computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast,
differentiable, GPU-accelerated modal framework built with the
JAX library, providing efficient simulations and enabling gradientbased inverse modelling.
Benchmarks show that our approach
significantly outperforms CPU and GPU-based implementations,
particularly for simulations with many modes. Inverse modelling
experiments demonstrate that our approach can recover physical
parameters, including tension, stiffness, and geometry, from both
synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation compared to methods that
fit abstract spectral parameters, it provides greater interpretability
and more compact parameterisation. The code is released as open
source to support future research and applications in differentiable
physical modelling and sound synthesis.
Download Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman Based Deep Learning Methods This paper presents an examination of State Space Models (SSM) and Koopman-based deep learning methods for modelling the dynamics of both linear and non-linear stiff strings. Through experiments with datasets generated under different initial conditions and sample rates, we assess the capacity of these models to accurately model the complex behaviours observed in string dynamics. Our findings indicate that our proposed Koopman-based model performs as well as or better than other existing approaches in nonlinear cases for long-sequence modelling. We inform the design of these architectures with the structure of the problems at hand. Although challenges remain in extending model predictions beyond the training horizon (i.e., extrapolation), the focus of our investigation lies in the models’ ability to generalise across different initial conditions within the training time interval. This research contributes insights into the physical modelling of dynamical systems (in particular those addressing musical acoustics) by offering a comparative overview of these and previous methods and introducing innovative strategies for model improvement. Our results highlight the efficacy of these models in simulating non-linear dynamics and emphasise their wide-ranging applicability in accurately modelling dynamical systems over extended sequences.
Download Advanced Fourier Decomposition for Realistic Drum Synthesis This paper presents a novel method of analysing drum sounds,
demonstrating that this can form the basis of a highly realistic synthesis technique for real-time use. The synthesis method can be
viewed as an extension of IFFT synthesis; here we exploit the fact
that audio signals can be recovered from solely the real component of their discrete Fourier transform (RDFT). All characteristics
of an entire drum sample can therefore be conveniently encoded
in a single, real-valued, frequency domain signal. These signals
are interpreted, incorporating the physics of the instrument, and
modelled to investigate how the perceptual features are encoded.
The model was able to synthesize drum sound components in such
detail that they could not be distinguished in an ABX test. This
method may therefore be capable of outperforming existing synthesis techniques, in terms of realism.
Sound examples available here.
Download Automatic Polyphonic Piano Note Extraction Using Fuzzy Logic in a Blackboard System This paper presents a piano transcription system that transforms audio into MIDI format. Human knowledge and psychoacoustic models are implemented in a blackboard architecture, which allows the adding of knowledge with a top-down approach. The analysis is adapted to the information acquired. This technique is referred to as a prediction-driven approach, and it attempts to simulate the adaptation and prediction process taking place in human auditory perception. In this paper we describe the implementation of Polyphonic Note Recognition using a Fuzzy Inference System (FIS) as part of the Knowledge sources in a Blackboard system. The performance of the transcription system shows how polyphonic music transcription is still an unsolved problem, with a success of 45% according to the Dixon formula. However if we consider only the transcribed notes the success increases to 74%. Moreover, the results obtained in the paper presented in [1], show how the transcription can be used with success in a retrieval system, encouraging the authors to develop this technique for more accurate transcription results.
Download Monophonic transcription with autocorrelation This paper describes an algorithm, which performs monophonic music transcription. A pitch tracker calculates the fundamental frequency of the signal from the autocorrelation function. A continuity-restoration block takes the extracted pitch and determines the score corresponding to the original performance. The signal envelope analysis completes the transcription system, calculating attack-sustain-decay-release times, which improves the synthesis process. Attention is also paid to the extraction of timbre and wavetable synthesis.
Download Statistical Measures of Early Reflections of Room Impulse Responses An impulse response of an enclosed reverberant space is composed of three basic components: the direct sound, early reflections and late reverberation. While the direct sound is a single event that can be easily identified, the division between the early reflections and late reverberation is less obvious as there is a gradual transition between the two. This paper explores two statistical measures that can aid in determining a point in time where the early reflections have transitioned into late reverberation. These metrics exploit the similarities between late reverberation and Gaussian noise that are not commonly found in early reflections. Unlike other measures, these need no prior knowledge about the rooms such as geometry or volume.
Download 3D interactive environment for music collection navigation Previous interfaces for large collections of music have used spatial audio to enhance the presentation of a visual interface or to add a mode of interaction. An interface using only audio information is presented here as a means to explore a large music collection in a two or three-dimensional space. By taking advantage of Ambisonics and binaural technology, the application presented here can scale to large collections, have flexible playback requirements, and can be optimized for slower computers. User evaluation reveals issues in creating an intuitive mapping between between user movements in physical space and virtual movement through the collection, but the novel presentation of the music collection has positive feedback and warrants further development.
Download Novel methods in Information Management for Advanced Audio Workflows This paper discusses architectural aspects of a software library for unified metadata management in audio processing applications. The data incorporates editorial, production, acoustical and musicological features for a variety of use cases, ranging from adaptive audio effects to alternative metadata based visualisation. Our system is designed to capture information, prescribed by modular ontology schema. This advocates the development of intelligent user interfaces and advanced media workflows in music production environments. In an effort to reach these goals, we argue for the need of modularity and interoperable semantics in representing information. We discuss the advantages of extensible Semantic Web ontologies as opposed to using specialised but disharmonious metadata formats. Concepts and techniques permitting seamless integration with existing audio production software are described in detail.
Download Extraction of Metrical Structure from Music Recordings Rhythm is a fundamental aspect of music and metrical structure is an important rhythm-related element. Several mid-level features encoding metrical structure information have been proposed in the literature, although the explicit extraction of this information is rarely considered. In this paper, we present a method to extract the full metrical structure from music recordings without the need for any prior knowledge. The algorithm is evaluated against expert annotations of metrical structure for the GTZAN dataset, each track being annotated multiple times. Inter-annotator agreement and the resulting upper bound on algorithm performance are evaluated. The proposed system reaches 93% of this upper limit and largely outperforms the baseline method.