Introducing Deep Machine Learning for Parameter Estimation in Physical Modelling One of the most challenging tasks in physically informed sound synthesis is the estimation of model parameters to produce a desired timbre. Automatic parameter estimation procedures have been developed in the past for specific parameters or application scenarios but, to date, no approach has proved applicable to a wide variety of use cases. This paper provides a general solution to the parameter estimation problem based on a supervised convolutional machine learning paradigm. The described approach can be classified as “end-to-end” and thus requires no specific knowledge of the model itself. Furthermore, parameters are learned from data generated by the model, requiring no effort in the preparation and labeling of the training dataset. To provide a qualitative and quantitative analysis of its performance, the method is applied to a patented digital waveguide pipe organ model, yielding very promising results.
Nicht-negative Matrix Faktorisierung nutzendes Klangsynthesen System (NiMFKS): Extensions of NMF-based Concatenative Sound Synthesis Concatenative sound synthesis (CSS) entails synthesising a “target” sound with other sounds collected in a “corpus.” Recent work explores CSS using non-negative matrix factorisation (NMF) to approximate a target sonogram by the product of a corpus sonogram and an activation matrix. In this paper, we propose a number of extensions of NMF-based CSS and present an open MATLAB implementation in a GUI-based application we name NiMFKS. In particular, we consider the following extensions: 1) we extend the NMF framework by implementing update rules based on the generalised β-divergence; 2) we add an optional monotonic algorithm for sparse NMF; 3) we tackle the computational challenges of scaling to large corpora by implementing a corpus pruning preprocessing step; 4) we generalise the constraints that may be applied to the shape of the activation matrix; and 5) we implement new modes of interacting with the procedure by enabling sketching and modification of the activation matrix. Our application NiMFKS and its source code can be downloaded from https://code.soundsoftware.ac.uk/projects/nimfks.
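The core NMF step described above can be sketched in a few lines: with the corpus spectrogram W held fixed, only the activation matrix H is updated so that W·H approximates the target spectrogram V. The pure-Python sketch below uses the standard Euclidean multiplicative update (the β = 2 case of the generalised β-divergence family); the toy matrices are illustrative and not taken from NiMFKS itself.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf_activations(V, W, iters=200, eps=1e-9):
    """Fixed-basis NMF: find H >= 0 minimising ||V - W H||^2
    via the Euclidean multiplicative update H <- H * (W^T V) / (W^T W H)."""
    n_atoms, n_frames = len(W[0]), len(V[0])
    H = [[1.0] * n_frames for _ in range(n_atoms)]
    Wt = transpose(W)
    for _ in range(iters):
        num = matmul(Wt, V)              # W^T V
        den = matmul(Wt, matmul(W, H))   # W^T W H
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_frames)] for i in range(n_atoms)]
    return H

# Toy "corpus" of two spectral atoms and a target built from them.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
V = matmul(W, [[2.0], [3.0]])            # true activations: 2 and 3
H = nmf_activations(V, W)
```

Because the update is multiplicative, H stays non-negative throughout; the recovered activations converge towards the true values 2 and 3.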
Automatic Control of the Dynamic Range Compressor Using a Regression Model and a Reference Sound Practical experience with audio effects, as well as knowledge of their parameters and how they change the sound, is crucial when controlling digital audio effects. This often presents barriers for musicians and casual users in the application of effects. These users are more accustomed to describing the desired sound verbally or using examples, rather than understanding and configuring low-level signal processing parameters. This paper addresses this issue by providing a novel control method for audio effects. While a significant body of work focuses on the use of semantic descriptors and visual interfaces, little attention has been given to an important modality, the use of sound examples to control effects. We use a set of acoustic features to capture important characteristics of sound examples and evaluate different regression models that map these features to effect control parameters. Focusing on dynamic range compression, results show that our approach provides a promising first step in this direction.
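As a toy illustration of the regression idea (not the paper's actual feature set or models), one can map a single acoustic feature of a reference sound, say its RMS level in dB, to a single compressor parameter with ordinary least squares; all values below are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical training pairs: RMS level of a reference sound (dB)
# versus the compressor threshold (dB) an engineer chose for it.
rms_db = [-30.0, -20.0, -10.0]
threshold_db = [-24.0, -18.0, -12.0]
a, b = fit_line(rms_db, threshold_db)

# Suggest a threshold for a new reference sound at -15 dB RMS.
predicted = a * (-15.0) + b
```

The same mapping generalises to multivariate regression over a full feature vector, which is closer to what the paper evaluates.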
Pinna Morphological Parameters Influencing HRTF Sets Head-Related Transfer Functions (HRTFs) are one of the main aspects of binaural rendering. By definition, these functions express the deep linkage that exists between hearing and morphology, especially that of the torso, head and ears. Although the perceptual effect of HRTFs is undeniable, the exact influence of human morphology is still unclear. Its reduction to a few anthropometric measurements has led to numerous studies aiming at establishing a ranking of these parameters. However, no consensus has yet been reached. In this paper, we study the influence of the anthropometric measurements of the ear, as defined by the CIPIC database, on the HRTFs. This is done through the computation of HRTFs by the Fast Multipole Boundary Element Method (FM-BEM) from a parametric model of the torso, head and ears. Their variations are measured with 4 different spectral metrics over 4 frequency bands spanning from 0 to 16 kHz. Our contribution is the establishment of a ranking of the selected parameters and a comparison with what has already been obtained by the community. Additionally, we discuss the relevance of each approach, especially when it relies on the CIPIC data, as well as the limitations of the CIPIC database.
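One widely used spectral metric of the kind mentioned above is log-spectral distortion, the RMS difference in dB between two magnitude responses; whether it matches any of the paper's four metrics is an assumption, and it is shown here purely for illustration:

```python
import math

def log_spectral_distortion(h_ref, h_test):
    """RMS difference in dB between two magnitude responses.
    (Illustrative metric; not necessarily one of the paper's four.)"""
    diffs = [20.0 * math.log10(a / b) for a, b in zip(h_ref, h_test)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Identical responses give 0 dB; a uniform factor of 10 gives 20 dB.
lsd = log_spectral_distortion([10.0, 10.0, 10.0], [1.0, 1.0, 1.0])
```

Restricting the bin range before averaging yields per-band variants such as the 0–16 kHz bands used in the study.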
Investigation of a Drum Controlled Cross-adaptive Audio Effect for Live Performance Electronic music often uses dynamic and synchronized digital audio effects that cannot easily be recreated in live performances. Cross-adaptive effects provide a simple solution to such problems, since they can use multiple feature inputs to control dynamic variables in real time. We propose a generic scheme for cross-adaptive effects in which onset detection on a drum track dynamically triggers effects on other tracks. This allows a percussionist to orchestrate effects across multiple instruments during performance. We describe the general structure, which includes an onset detection and feature extraction algorithm, envelope and LFO synchronization, and an interface that enables the user to associate different effects to be triggered depending on the cue from the percussionist. Subjective evaluation is performed based on use in live performance. Implications for music composition and performance are also discussed. Keywords: Cross-adaptive digital audio effects, live processing, real-time control, Csound.
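A minimal stand-in for the onset detection stage is a short-time energy detector that flags frames whose energy jumps sharply relative to the previous frame; the frame size and threshold below are illustrative choices, not the paper's algorithm:

```python
import math

def detect_onsets(signal, frame=256, threshold=2.0):
    """Return frame indices where short-time energy jumps sharply."""
    energies = [sum(s * s for s in signal[i:i + frame])
                for i in range(0, len(signal) - frame, frame)]
    return [i for i in range(1, len(energies))
            if energies[i] > threshold * (energies[i - 1] + 1e-12)]

# Toy drum track: silence with two short decaying bursts.
sig = [0.0] * 2048
for start in (512, 1536):
    for i in range(256):
        sig[start + i] = math.sin(0.3 * i) * math.exp(-i / 64.0)

onsets = detect_onsets(sig)   # bursts begin in frames 2 and 6
```

In a cross-adaptive setup, each detected onset on the drum track would trigger or modulate an effect on another instrument's track.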
Real-time Pitch Tracking in Audio Signals with the Extended Complex Kalman Filter The Kalman filter is a well-known tool used extensively in robotics, navigation, speech enhancement and finance. In this paper, we propose a novel pitch follower based on the Extended Complex Kalman Filter (ECKF). An advantage of this pitch follower is that it operates on a sample-by-sample basis, unlike the block-based algorithms most commonly used in pitch estimation. Thus, it estimates a sample-synchronous fundamental frequency (assumed to be the perceived pitch), which makes it ideal for real-time implementation. Simultaneously, the ECKF also tracks the amplitude envelope of the input audio signal. Finally, we test our ECKF pitch detector on a number of cello and double bass recordings played with various ornaments, such as vibrato, portamento and trill, and compare its results with those of the well-known YIN estimator to demonstrate the effectiveness of our algorithm.
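The ECKF itself tracks a complex sinusoidal state, but its predict/update cycle is easiest to see in the scalar case. The sketch below tracks a slowly varying amplitude envelope sample by sample with a plain scalar Kalman filter; the random-walk state model and noise variances are illustrative assumptions, not the paper's formulation:

```python
import math

def kalman_track(measurements, q=1e-4, r=0.25):
    """Scalar Kalman filter with a random-walk state model
    (q = process noise variance, r = measurement noise variance)."""
    x, p = 0.0, 1.0               # state estimate and its variance
    estimates = []
    for z in measurements:
        p += q                    # predict step
        k = p / (p + r)           # Kalman gain
        x += k * (z - x)          # correct with the measurement residual
        p *= 1.0 - k
        estimates.append(x)
    return estimates

# A constant amplitude of 0.8 observed through oscillatory "noise".
noisy = [0.8 + 0.1 * math.sin(1.7 * n) for n in range(500)]
est = kalman_track(noisy)         # est settles near 0.8
```

The ECKF replaces the scalar state with a complex phasor whose rotation rate encodes the fundamental frequency, so the same recursion yields both pitch and amplitude per sample.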
Harmonic-percussive Sound Separation Using Rhythmic Information from Non-negative Matrix Factorization in Single-channel Music Recordings This paper proposes a novel method for separating harmonic and percussive sounds in single-channel music recordings. Standard non-negative matrix factorization (NMF) is used to obtain the activations of the most representative patterns active in the mixture. The basic idea is to automatically classify those activations that exhibit rhythmic and non-rhythmic patterns. We assume that percussive sounds are modeled by those activations that exhibit a rhythmic pattern, whereas harmonic and vocal sounds are modeled by those activations that exhibit a less rhythmic pattern. The classification of the harmonic or percussive NMF activations is performed using a recursive process based on successive correlations applied to the activations. Specifically, promising results are obtained when a sound is classified as percussive through the identification of a set of peaks in the output of the fourth correlation. The reason is that harmonic sounds tend to be represented by one valley in a half-cycle waveform at the output of the fourth correlation. Evaluation shows that the proposed method provides competitive results compared to other reference state-of-the-art methods. Some audio examples are available to illustrate the separation performance of the proposed method.
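The intuition behind the correlation-based classification can be sketched with a single autocorrelation pass (the paper applies four in succession): a rhythmic activation shows a prominent peak at a non-zero lag, while a smooth one does not. The threshold and toy activations below are illustrative, not the paper's recursive procedure:

```python
def autocorr(x):
    """Normalised autocorrelation for lags 0 .. len(x)//2 - 1."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag)) / c0
            for lag in range(n // 2)]

def looks_percussive(activation, threshold=0.3):
    """Heuristic: a rhythmic activation has a prominent local
    autocorrelation peak at some non-zero lag."""
    ac = autocorr(activation)
    return any(ac[k] > threshold and ac[k] > ac[k - 1] and ac[k] > ac[k + 1]
               for k in range(2, len(ac) - 1))

# Toy activations: a pulse train every 4 frames vs. a smooth ramp.
pulses = [1.0 if i % 4 == 0 else 0.0 for i in range(64)]
smooth = [i / 63.0 for i in range(64)]
```

Here `looks_percussive(pulses)` fires on the strong peak at lag 4, while the ramp's autocorrelation decays monotonically and is rejected.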
Beat-aligning Guitar Looper Loopers are becoming increasingly popular owing to their growing features and capabilities, not only in live performances but also as a rehearsal tool. These effect units record a phrase and play it back in a loop. The start and stop positions of the recording are typically set by the player’s start and stop taps on a foot switch. However, if these cues are not entered precisely in time, an annoying, audible gap may occur between repetitions of the phrase. We propose an algorithm that analyzes the recorded phrase and aligns the start and stop positions in order to remove audible gaps. Efficiency, accuracy and robustness are achieved by including the phase information of the onset detection function’s STFT in the beat estimation process. Moreover, the proposed algorithm satisfies the response time required for the live application of beat alignment. We show that robustness is achieved for phrases of sparse rhythmic content, as long as there is still sufficient information to derive the underlying beats.
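The final alignment step can be illustrated as follows: once a beat period has been estimated from the phrase, the tapped loop length is snapped to the nearest whole number of beats so that repetitions meet without a gap. The sample rate and tempo values are made up for the demo, and this is not the paper's STFT-phase-based estimator:

```python
def align_loop_length(tapped_len, beat_period):
    """Quantise a tapped loop length (in samples) to a whole
    number of beats so the loop repeats without an audible gap."""
    n_beats = max(1, round(tapped_len / beat_period))
    return n_beats * beat_period

sr = 44100
beat_period = int(0.5 * sr)          # 120 BPM -> 22050 samples per beat
tapped = 4 * beat_period + 1500      # the player released slightly late
aligned = align_loop_length(tapped, beat_period)   # snaps to 4 beats
```

The hard part, which the paper addresses, is estimating `beat_period` robustly and quickly enough for live use.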
Unsupervised Taxonomy of Sound Effects Sound effect libraries are commonly used by sound designers in a range of industries. Taxonomies exist for the classification of sounds into groups based on subjective similarity, sound source or common environmental context. However, these taxonomies are not standardised, and no taxonomy based purely on the sonic properties of audio exists. We present a method using feature selection, unsupervised learning and hierarchical clustering to develop an unsupervised taxonomy of sound effects based entirely on the sonic properties of the audio within a sound effect library. The unsupervised taxonomy is then related back to the perceived meaning of the relevant audio features.
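The hierarchical-clustering stage can be sketched with single-linkage agglomerative clustering on a toy one-dimensional feature (the real system clusters multi-dimensional audio features, and the linkage choice here is an assumption for illustration):

```python
def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters
    whose closest members are nearest to each other."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

# Toy 1-D feature (e.g. a spectral centroid value per sound effect).
features = [0.1, 0.15, 0.2, 0.9, 1.0, 1.05]
groups = single_linkage(features, 2)
```

Recording the order of merges, rather than stopping at a fixed cluster count, yields the dendrogram from which a taxonomy can be read off.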
The Mix Evaluation Dataset Research on perception of music production practices is mainly concerned with the emulation of sound engineering tasks through lab-based experiments and custom software, sometimes with unskilled subjects. This can improve the level of control, but the validity, transferability, and relevance of the results may suffer from this artificial context. This paper presents a dataset consisting of mixes gathered in a real-life, ecologically valid setting, and perceptual evaluation thereof, which can be used to expand knowledge on the mixing process. With 180 mixes including parameter settings, close to 5000 preference ratings and free-form descriptions, and a diverse range of contributors from five different countries, the data offers many opportunities for music production analysis, some of which are explored here. In particular, more experienced subjects were found to be more negative and more specific in their assessments of mixes, and to increasingly agree with each other.