Graph-Based Audio Looping and Granulation
In this paper we describe similarity graphs computed from time-frequency analysis as a guide for audio playback, with the aim of extending the content of fixed recordings in creative applications. We explain the construction of the graph from distances between spectral frames, as well as several features computed from the graph, including methods for onset detection, beat detection, and cluster analysis. Several playback algorithms can be devised by conditionally pruning the graph with these methods. We describe examples for looping, granulation, and automatic montage.
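To make the core idea concrete, here is a minimal sketch of a frame-similarity graph driving loop playback. It assumes librosa for the STFT; the cosine-distance threshold, minimum jump length, and jump probability are illustrative choices, not the paper's settings.

```python
# Sketch: build a frame-similarity graph from an STFT and use it to loop.
import numpy as np
import librosa

y, sr = librosa.load("input.wav", sr=None, mono=True)   # hypothetical file
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))  # magnitude spectra

# Pairwise cosine distance between spectral frames (columns of S).
F = S / (np.linalg.norm(S, axis=0, keepdims=True) + 1e-12)
dist = 1.0 - F.T @ F

# Keep an edge i -> j when frames are similar but not adjacent in time,
# so a jump sounds like a loop rather than normal playback.
n = dist.shape[0]
threshold, min_gap = 0.1, 20   # assumed values
adj = (dist < threshold) & (np.abs(np.subtract.outer(np.arange(n), np.arange(n))) > min_gap)

# Simple looping playback: advance frame by frame, occasionally taking an edge.
rng = np.random.default_rng(0)
frame, path = 0, []
for _ in range(2000):
    path.append(frame)
    targets = np.flatnonzero(adj[frame])
    if targets.size and rng.random() < 0.05:  # jump probability (assumed)
        frame = int(rng.choice(targets))
    else:
        frame = (frame + 1) % n
# `path` now indexes hop-sized grains of y that can be overlap-added for output.
```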
Pywdf: An Open Source Library for Prototyping and Simulating Wave Digital Filter Circuits in Python
This paper introduces a new open-source Python library for the modeling and simulation of wave digital filter (WDF) circuits. The library, called pywdf, allows users to easily create and analyze WDF circuit models in a high-level, object-oriented manner. It includes a variety of built-in components, such as voltage sources, capacitors, and diodes, as well as the ability to create custom components and circuits. Additionally, pywdf includes a variety of analysis tools, such as frequency response and transient analysis, to aid in the design and optimization of WDF circuits. We demonstrate the library’s efficacy in replicating the nonlinear behavior of an analog diode clipper circuit, and in creating an allpass filter that cannot be realized in the analog world. The library is well documented and includes several examples to help users get started. Overall, pywdf is a powerful tool for anyone working with WDF circuits, and we hope it will be of use to researchers and engineers in the field.
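For readers unfamiliar with wave digital filters, the following is a from-scratch sketch of the idea such a library builds on: a first-order RC lowpass expressed with one-port WDF elements and a two-port connector. This is not pywdf's actual API; component values are arbitrary.

```python
# Wave digital RC lowpass: resistive voltage source facing a capacitor.
import numpy as np

fs = 48000.0
R = 1e3                      # series resistance (ohms), assumed
C = 33e-9                    # capacitance (farads), assumed

R1 = R                       # port resistance of the resistive voltage source
R2 = 1.0 / (2.0 * fs * C)    # capacitor port resistance (bilinear transform)
rho = (R1 - R2) / (R1 + R2)  # two-port connector reflection coefficient

def process(e):
    """Filter the input voltage signal e sample by sample."""
    z = 0.0                  # capacitor state (previous incident wave)
    out = np.empty_like(e)
    for n, en in enumerate(e):
        a1 = en                            # matched resistive source reflects e
        a2 = z                             # capacitor reflects its stored wave
        b2 = (1.0 - rho) * a1 + rho * a2   # wave sent into the capacitor
        z = b2                             # capacitor stores the incident wave
        out[n] = 0.5 * (a2 + b2)           # port voltage = (a + b) / 2
    return out
```

A library like pywdf wraps these wave-variable bookkeeping steps in component and adaptor objects, so circuits can be assembled instead of derived by hand.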
Interactive digital audio environments: gesture as a musical parameter
This paper presents some possible relationships between gesture and sound that may be built within an interactive digital audio environment. In a traditional musical situation, gesture usually produces sound: the relationship between gesture and sound is unique, a cause-and-effect link. In computer music, gesture can be uncoupled from sound because the computer can carry out every aspect of sound production, from composition through interpretation and performance. Real-time computing technology and the development of human gesture-tracking systems may enable gesture to be reintroduced into the practice of computer music, but with a completely renewed approach. There is no longer any need to create direct cause-and-effect relationships for sound production, and gesture may be seen as another musical parameter to play with in the context of interactive musical performances.
A system for data-driven concatenative sound synthesis
In speech synthesis, concatenative data-driven methods prevail. They use a database of recorded speech and a unit selection algorithm that selects the segments that best match the utterance to be synthesized. Transferring these ideas to musical sound synthesis enables a new method of high-quality sound synthesis. Conventional synthesis methods are based on a model of the sound signal, and it is very difficult to build a model that preserves all the fine details of sound. Concatenative synthesis achieves this by using actual recordings. This data-driven approach (as opposed to a rule-based approach) takes advantage of the information contained in many sound recordings. For example, natural-sounding transitions can be synthesized, since unit selection is aware of the context of the database units. The CATERPILLAR software system has been developed to perform data-driven concatenative unit selection sound synthesis. It allows high-quality instrument synthesis with high-level control, explorative free synthesis from arbitrary sound databases, or resynthesis of a recording with sounds from the database. It is based on the software-engineering concept of component-oriented software, increasing flexibility and facilitating reuse.
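The selection step can be sketched as a shortest-path search over the unit database, balancing a target cost against a concatenation cost. The feature vectors, cost definitions, and weight below are illustrative assumptions, not CATERPILLAR's actual descriptors.

```python
# Sketch of unit selection by dynamic programming (Viterbi over the unit lattice).
import numpy as np

def select_units(targets, units, w_concat=1.0):
    """targets: (T, d) desired segment features; units: (N, d) database features."""
    T, N = len(targets), len(units)
    # Target cost: distance between each target segment and each unit.
    target_cost = np.linalg.norm(targets[:, None, :] - units[None, :, :], axis=2)
    # Concatenation cost: how badly unit j follows unit i at a join
    # (simplified here to a plain feature distance).
    concat_cost = np.linalg.norm(units[:, None, :] - units[None, :, :], axis=2)

    acc = target_cost[0].copy()            # best path cost ending in each unit
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        total = acc[:, None] + w_concat * concat_cost   # (prev, next)
        back[t] = np.argmin(total, axis=0)
        acc = total[back[t], np.arange(N)] + target_cost[t]

    # Trace back the cheapest unit sequence.
    path = [int(np.argmin(acc))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```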
On the evaluation of perceptual similarity measures for music
Several applications in the field of content-based interaction with music repositories rely on measures which estimate the perceived similarity of music. These applications include automatic genre recognition, playlist generation, and recommender systems. In this paper we study methods to evaluate the performance of such measures. We compare five measures which use only the information extracted from the audio signal and discuss how these measures can be evaluated qualitatively and quantitatively without resorting to large scale listening tests.
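One common quantitative evaluation without listening tests uses metadata such as genre labels as a proxy for perceived similarity. The sketch below scores a similarity measure by nearest-neighbor genre agreement; the distance matrix and labels are placeholders, and this is one plausible protocol rather than the paper's exact procedure.

```python
# Score a similarity measure by how often nearest neighbors share a genre.
import numpy as np

def genre_neighbor_score(dist, genres, k=5):
    """dist: (N, N) pairwise distances from the measure under test;
    genres: length-N label array. Returns the mean fraction of the
    k nearest neighbors that share the query track's genre."""
    genres = np.asarray(genres)
    d = dist.copy()
    np.fill_diagonal(d, np.inf)             # a track is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    hits = genres[neighbors] == genres[:, None]
    return float(hits.mean())
```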
Binaural source localization
In binaural signals, interaural time differences (ITDs) and interaural level differences (ILDs) are two of the most important cues for the estimation of source azimuths, i.e., the localization of sources in the horizontal plane. For narrowband signals, according to the duplex theory, ITD is dominant at low frequencies and ILD at higher frequencies. Based on the STFT spectra of binaural signals, a method is proposed for the combined evaluation of ITD and ILD for each individual spectral coefficient. ITD and ILD are related to the azimuth through lookup models. Azimuth estimates based on ITD are more accurate but ambiguous at higher frequencies due to phase wrapping. The less accurate but unambiguous azimuth estimates based on ILD are used to select the closest candidate azimuth estimates based on ITD, effectively improving the azimuth estimation. The method corresponds well with the duplex theory and handles the transition from low to high frequencies gracefully. The relations between ITD, ILD, and azimuth are computed from a measured set of head-related transfer functions (HRTFs), yielding azimuth lookup models. Based on a study of these models for different subjects, parametric azimuth lookup models are proposed. The parameters of these models can be optimized for an individual subject whose HRTFs have been measured. In addition, subject-independent lookup models are proposed, parameterized only by the distance between the ears, effectively enabling source localization for subjects whose HRTFs have not been measured.
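The per-bin cue combination can be sketched as follows: the interaural phase yields a set of ITD candidates (ambiguous above roughly 1.5 kHz due to phase wrapping), and the ILD picks the candidate nearest to its own coarse estimate. Here a Woodworth spherical-head formula stands in for the paper's measured HRTF-derived lookup models, and `ild_to_azimuth` is an assumed, externally fitted mapping.

```python
# Disambiguate wrapped-phase ITD azimuth candidates with an ILD estimate.
import numpy as np

c, head_radius = 343.0, 0.0875   # speed of sound (m/s), half ear distance (m)

def azimuth_per_bin(L, R, freqs, ild_to_azimuth):
    """L, R: complex STFT coefficients of one frame; freqs: bin centers (Hz);
    ild_to_azimuth: vectorized callable mapping ILD in dB to a coarse
    azimuth in radians (assumed given, e.g. fitted from HRTFs)."""
    ipd = np.angle(L * np.conj(R))
    ild = 20.0 * np.log10(np.abs(L) / (np.abs(R) + 1e-12) + 1e-12)
    az_ild = ild_to_azimuth(ild)           # unambiguous but coarse
    est = np.empty_like(freqs)
    for i, f in enumerate(freqs):
        if f <= 0:
            est[i] = az_ild[i]
            continue
        # All ITDs consistent with the wrapped phase at this bin.
        k = np.arange(-3, 4)
        itds = (ipd[i] + 2.0 * np.pi * k) / (2.0 * np.pi * f)
        # Woodworth model: itd = (r / c) * (theta + sin(theta)); invert by search.
        thetas = np.linspace(-np.pi / 2, np.pi / 2, 181)
        model_itd = head_radius / c * (thetas + np.sin(thetas))
        az_cands = thetas[np.argmin(np.abs(itds[:, None] - model_itd[None, :]), axis=1)]
        # Pick the ITD-based candidate closest to the ILD-based estimate.
        est[i] = az_cands[np.argmin(np.abs(az_cands - az_ild[i]))]
    return est
```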
A FPGA-based Adaptive Noise Cancelling System
An FPGA-based system suitable for augmented-reality audio applications is presented. The sample application described here is adaptive noise cancellation (ANC). The system consists of a Spartan-3 XC3S400 FPGA board connected to a Philips UCB1400 stereo audio codec. The algorithms for the FIR filtering and for the adaptation of the filter coefficients according to the Widrow-Hoff LMS algorithm are implemented on the FPGA board. Measurement results obtained with a dummy-head measurement system are reported, and a detailed analysis of system performance and possible improvements is given.
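The algorithm itself is the textbook LMS adaptive filter; a floating-point sketch follows for clarity (the FPGA implementation would use fixed-point arithmetic). Filter length and step size are illustrative choices.

```python
# Adaptive noise cancellation with a Widrow-Hoff LMS FIR filter.
import numpy as np

def lms_anc(reference, primary, n_taps=64, mu=0.01):
    """reference: noise reference signal; primary: signal + correlated noise.
    Returns the error signal, i.e. the noise-cancelled output."""
    w = np.zeros(n_taps)                # adaptive FIR coefficients
    x_buf = np.zeros(n_taps)            # delay line of reference samples
    e = np.zeros(len(primary))
    for n in range(len(primary)):
        x_buf = np.roll(x_buf, 1)       # shift delay line by one sample
        x_buf[0] = reference[n]
        y = w @ x_buf                   # FIR estimate of the noise
        e[n] = primary[n] - y           # error = cleaned output
        w += mu * e[n] * x_buf          # Widrow-Hoff LMS coefficient update
    return e
```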
A Segmental Spectro-Temporal Model of Musical Timbre
We propose a new statistical model of musical timbre that handles the different segments of the temporal envelope (attack, sustain and release) separately in order to account for their different spectral and temporal behaviors. The model is based on a reduced-dimensionality representation of the spectro-temporal envelope. Temporal coefficients corresponding to the attack and release segments are subjected to explicit trajectory modeling based on a non-stationary Gaussian process. Coefficients corresponding to the sustain phase are modeled as a multivariate Gaussian. A compound similarity measure associated with the segmental model is proposed and successfully tested in instrument classification experiments. Apart from its use in a statistical framework, the modeling method allows intuitive and informative visualizations of the characteristics of musical timbre.
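As a partial illustration, the sustain-phase component of such a model reduces to fitting a multivariate Gaussian per sound and comparing sounds with a divergence between Gaussians. The coefficients are assumed given, a symmetrized KL divergence stands in for the paper's compound measure, and the attack/release trajectory models are omitted.

```python
# Fit a Gaussian to sustain-segment coefficients and compare two sounds.
import numpy as np

def fit_gaussian(coeffs):
    """coeffs: (frames, dims) sustain-segment envelope coefficients."""
    mu = coeffs.mean(axis=0)
    cov = np.cov(coeffs, rowvar=False) + 1e-6 * np.eye(coeffs.shape[1])
    return mu, cov

def symmetric_kl(g1, g2):
    """Symmetrized KL divergence between two Gaussians given as (mu, cov)."""
    (m1, S1), (m2, S2) = g1, g2
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    d = m1 - m2
    return 0.25 * (np.trace(P2 @ S1) + np.trace(P1 @ S2)
                   + d @ (P1 + P2) @ d - 2 * len(d))
```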
Frequency estimation of the first pinna notch in Head-Related Transfer Functions with a linear anthropometric model
The relation between anthropometric parameters and Head-Related Transfer Function (HRTF) features, especially those due to the pinna, is not yet fully understood. In this paper we apply signal processing techniques to extract the frequencies of the main pinna notches (known as N1, N2, and N3) in the frontal part of the median plane and build a model relating them to 13 different anthropometric parameters of the pinna, some of which depend on the elevation angle of the sound source. Results show that while the considered anthropometric parameters cannot approximate either the N2 or the N3 frequency with sufficient accuracy, eight of them are sufficient for modeling the frequency of N1 within a psychoacoustically acceptable margin of error. In particular, distances between the ear canal and the outer helix border are the most important parameters for predicting N1.
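The linear modeling step amounts to an ordinary least-squares fit from anthropometry to notch frequency, as in the sketch below. The feature matrix is a placeholder for the pinna measurements; the paper's specific parameter set and per-elevation handling are not reproduced.

```python
# Least-squares linear model from pinna anthropometry to the N1 frequency.
import numpy as np

def fit_notch_model(X, f_n1):
    """X: (subjects, params) anthropometric parameters; f_n1: N1 frequencies (Hz).
    Returns the weights of a linear model with intercept."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])     # append intercept column
    w, *_ = np.linalg.lstsq(A, f_n1, rcond=None)
    return w

def predict_n1(w, X):
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return A @ w
```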
Evaluating the Performance of Objective Audio Quality Metrics in Response to Common Audio Degradations
This study evaluates the performance of five objective audio quality metrics (PEAQ Basic, PEAQ Advanced, PEMO-Q, ViSQOL, and HAAQI) in the context of digital music production. Unlike previous comparisons, we focus on their suitability for production environments, an area currently underexplored in existing research. Twelve audio examples were tested using two evaluation types: an effectiveness test under progressively increasing degradations (hum, hiss, clipping, glitches) and a robustness test under fixed-level, randomly fluctuating degradations. In the effectiveness test, HAAQI, PEMO-Q, and PEAQ Basic effectively tracked degradation changes, while PEAQ Advanced failed consistently and ViSQOL showed low sensitivity to hum and glitches. In the robustness test, ViSQOL and HAAQI demonstrated the highest consistency, with average standard deviations of 0.004 and 0.007, respectively, followed by PEMO-Q (0.021), PEAQ Basic (0.057), and PEAQ Advanced (0.065). However, ViSQOL also showed low variability across audio examples, suggesting limited genre sensitivity. These findings highlight the strengths and limitations of each metric for music production, specifically for quality measurement with compressed audio. The source code and dataset will be made publicly available upon publication.
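The four degradation types named above can be generated along these lines; the levels, hum frequency, and glitch lengths are illustrative, not the study's actual settings.

```python
# Apply one of four degradation types to a mono signal x in [-1, 1].
import numpy as np

def degrade(x, sr, kind, amount, seed=0):
    rng = np.random.default_rng(seed)
    if kind == "hum":                    # mains hum, 50 Hz assumed
        t = np.arange(len(x)) / sr
        return x + amount * np.sin(2 * np.pi * 50.0 * t)
    if kind == "hiss":                   # broadband noise floor
        return x + amount * rng.standard_normal(len(x))
    if kind == "clipping":               # hard-clip at a shrinking threshold
        return np.clip(x, -(1.0 - amount), 1.0 - amount)
    if kind == "glitches":               # zero out short random dropouts
        y = x.copy()
        n_glitches = int(amount * 100)
        for start in rng.integers(0, max(1, len(x) - 256), n_glitches):
            y[start:start + 256] = 0.0
        return y
    raise ValueError(kind)
```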