Introducing Deep Machine Learning for Parameter Estimation in Physical Modelling

One of the most challenging tasks in physically informed sound synthesis is the estimation of model parameters to produce a desired timbre. Automatic parameter estimation procedures have been developed in the past for specific parameters or application scenarios but, so far, no approach has proved applicable to a wide variety of use cases. This paper provides a general solution to the parameter estimation problem, based on a supervised convolutional machine learning paradigm. The described approach can be classified as “end-to-end” and thus requires no specific knowledge of the model itself. Furthermore, the parameters are learned from data generated by the model itself, so no effort is needed to prepare and label the training dataset. To provide a qualitative and quantitative analysis of its performance, the method is applied to a patented digital waveguide pipe organ model, yielding very promising results.
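Since the abstract describes a supervised, convolutional, end-to-end pipeline trained on model-generated data, a minimal sketch may help make the idea concrete. Everything below (the architecture, tensor shapes, and normalised parameter targets) is an illustrative assumption, not the paper's actual network.

```python
# Sketch: a small CNN that regresses synthesis parameters directly from
# spectrograms of the model's own output. All names and shapes are invented.
import torch
import torch.nn as nn

class ParamEstimator(nn.Module):
    def __init__(self, n_params: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, n_params), nn.Sigmoid(),  # parameters normalised to [0, 1]
        )

    def forward(self, spec):  # spec: (batch, 1, freq_bins, time_frames)
        return self.regressor(self.features(spec))

# Training data comes "for free" from the model itself: sample random parameter
# vectors, synthesise audio, compute spectrograms, and regress the parameters back.
model = ParamEstimator(n_params=8)
spec = torch.randn(2, 1, 128, 128)  # stand-in for two spectrograms
loss = nn.functional.mse_loss(model(spec), torch.rand(2, 8))
```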
Physically Derived Synthesis Model of a Cavity Tone

The cavity tone is the sound generated when air flows over the open surface of a cavity and a number of physical conditions are met. Equations obtained from fluid dynamics and aerodynamics research are utilised to produce authentic cavity tones without the need to carry out complex computations. Synthesis is performed with a physical model in which the geometry of the cavity enters the sound synthesis calculations. The model operates in real-time, making it ideal for integration within a game or virtual reality environment. Evaluation is carried out by comparing the output of our model to previously published experimental, theoretical and computational results. Results show an accurate implementation of theoretical acoustic intensity and sound propagation equations as well as very good frequency predictions.

NOMENCLATURE
c = speed of sound (m/s)
f = frequency (Hz)
ω = angular frequency, ω = 2πf (rad/s)
u = air flow speed (m/s)
Re = Reynolds number (dimensionless)
St = Strouhal number (dimensionless)
r = distance between listener and sound source (m)
φ = elevation angle between listener and sound source
ϕ = azimuth angle between listener and sound source
ρ_air = mass density of air (kg m⁻³)
μ_air = dynamic viscosity of air (Pa s)
M = Mach number, M = u/c (dimensionless)
L = length of cavity (m)
d = depth of cavity (m)
b = width of cavity (m)
κ = wave number, κ = ω/c (m⁻¹)
δ = shear layer thickness (m)
δ* = effective shear layer thickness (m)
δ0 = shear layer thickness at edge separation (m)
θ0 = shear layer momentum thickness at edge separation (m)
C2 = pressure coefficient (dimensionless)
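The abstract leans on standard fluid-dynamics relations, and the nomenclature (u, L, M, St) points at Strouhal-style frequency prediction. A widely used result for cavity tones is Rossiter's semi-empirical equation; whether the model uses this exact form is an assumption, but it illustrates how geometry and flow speed predict the tone frequencies.

```python
# Hedged sketch: Rossiter's semi-empirical equation for cavity tone mode
# frequencies. gamma and kappa_v below are the commonly quoted empirical values;
# the paper's exact formulation is not given in the abstract.

def rossiter_frequencies(u, L, c=343.0, gamma=0.25, kappa_v=0.57, n_modes=4):
    """Predicted cavity tone frequencies (Hz) for Rossiter modes m = 1..n_modes.

    u: free-stream flow speed (m/s), L: cavity length (m), c: speed of sound (m/s).
    """
    M = u / c  # Mach number, as in the nomenclature above
    return [(u / L) * (m - gamma) / (M + 1.0 / kappa_v) for m in range(1, n_modes + 1)]

# Example: 40 m/s flow over a 5 cm cavity
for m, f in enumerate(rossiter_frequencies(u=40.0, L=0.05), start=1):
    print(f"mode {m}: {f:7.1f} Hz")
```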
Audio Processing Chain Recommendation

In sound production, engineers cascade processing modules at various points in a mix to apply audio effects to channels and busses. Previous studies have investigated the automation of parameter settings based on external semantic cues. In this study, we provide an analysis of the ways in which participants apply full processing chains to musical audio. We identify trends in audio effect usage as a function of instrument type and descriptive terms, and show that processing chain usage acts as an effective way of organising timbral adjectives in a low-dimensional space. Finally, we present a model for full processing chain recommendation using a Markov chain and show that the system's outputs are highly correlated with a dataset of user-generated processing chains.
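A first-order Markov chain over effect types is enough to illustrate the recommendation idea: transition probabilities are estimated from user-generated chains, and new chains are sampled from them. The corpus and effect names below are invented stand-ins, not the paper's data.

```python
# Sketch of Markov-chain processing chain recommendation: count transitions
# between adjacent effects (with virtual start/end tokens), then sample a chain.
import random
from collections import defaultdict

corpus = [
    ["gate", "eq", "compressor", "reverb"],
    ["eq", "compressor", "delay", "reverb"],
    ["gate", "eq", "compressor", "delay"],
]

counts = defaultdict(lambda: defaultdict(int))
for chain in corpus:
    for a, b in zip(["<start>"] + chain, chain + ["<end>"]):
        counts[a][b] += 1

def recommend_chain(max_len=6):
    state, chain = "<start>", []
    while len(chain) < max_len:
        nxt = counts[state]
        state = random.choices(list(nxt), weights=list(nxt.values()))[0]
        if state == "<end>":
            break
        chain.append(state)
    return chain

print(recommend_chain())
```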
A Nonlinear Method for Manipulating Warmth and Brightness

In musical timbre, two of the most commonly used perceptual dimensions are warmth and brightness. In this study, we develop a model capable of accurately controlling the warmth and brightness of an audio source using a single parameter. To do this, we first identify the most salient audio features associated with the chosen descriptors by applying dimensionality reduction to a dataset of annotated timbral transformations. Strong positive correlations are found between the centroid of various spectral representations and the most salient principal components. From this, we build a system designed to manipulate these audio features directly using a combination of linear and nonlinear processing modules. To validate the model, we conduct a series of subjective listening tests, and show that up to 80% of participants are able to assign the correct term, or synonyms thereof, to a set of processed audio samples. Objectively, we show low Mahalanobis distances between the processed samples and clusters of the same timbral adjective in the low-dimensional timbre space.
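Given the reported correlation between spectral centroid and the warmth/brightness axis, a rough sketch is to measure the centroid and shift it with a spectral tilt. The study's actual chain combines linear and nonlinear modules; the tilt-only approach below is a simplifying assumption.

```python
# Sketch: spectral centroid measurement plus a crude brightness control via
# a smooth spectral tilt. This is not the study's processing chain.
import numpy as np

def spectral_centroid(x, sr):
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    return np.sum(freqs * mags) / (np.sum(mags) + 1e-12)

def tilt_brightness(x, sr, amount):
    """amount > 0 brightens (raises centroid), amount < 0 warms (lowers it)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    gain = (1.0 + freqs / (sr / 2)) ** amount  # smooth high-frequency tilt
    return np.fft.irfft(X * gain, n=len(x))

sr = 44100
x = np.random.randn(sr)  # stand-in for an audio source
print(spectral_centroid(x, sr), spectral_centroid(tilt_brightness(x, sr, 4.0), sr))
```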
Pinna Morphological Parameters Influencing HRTF Sets

Head-Related Transfer Functions (HRTFs) are one of the main aspects of binaural rendering. By definition, these functions express the deep linkage that exists between hearing and morphology, especially of the torso, head and ears. Although the perceptual effect of HRTFs is undeniable, the exact influence of human morphology is still unclear. Its reduction to a few anthropometric measurements has led to numerous studies aiming at establishing a ranking of these parameters. However, no consensus has yet been reached. In this paper, we study the influence of the anthropometric measurements of the ear, as defined by the CIPIC database, on the HRTFs. This is done through the computation of HRTFs by the Fast Multipole Boundary Element Method (FM-BEM) from a parametric model of the torso, head and ears. Their variations are measured with 4 different spectral metrics over 4 frequency bands spanning 0 to 16 kHz. Our contribution is the establishment of a ranking of the selected parameters and a comparison with what has already been obtained by the community. Additionally, we discuss the relevance of each approach, especially when it relies on the CIPIC data, as well as the limitations of the CIPIC database.
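As an illustration of the kind of band-limited spectral metric used to quantify HRTF variation, the sketch below computes spectral distortion (RMS log-magnitude difference) over four bands up to 16 kHz. The specific metric and band edges are assumptions; the abstract names neither.

```python
# Sketch of one plausible spectral metric: spectral distortion (SD) between
# two HRIRs, evaluated per frequency band. Bands and metric are assumptions.
import numpy as np

def spectral_distortion(h_ref, h_test, sr, band):
    """RMS log-magnitude difference (dB) between two HRIRs within `band` (Hz)."""
    n = max(len(h_ref), len(h_test))
    H1, H2 = np.fft.rfft(h_ref, n), np.fft.rfft(h_test, n)
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    sel = (freqs >= band[0]) & (freqs < band[1])
    diff_db = 20 * np.log10(np.abs(H1[sel]) / (np.abs(H2[sel]) + 1e-12) + 1e-12)
    return np.sqrt(np.mean(diff_db ** 2))

sr = 48000
h_a, h_b = np.random.randn(256), np.random.randn(256)  # stand-ins for two HRIRs
for band in [(0, 4000), (4000, 8000), (8000, 12000), (12000, 16000)]:
    print(band, spectral_distortion(h_a, h_b, sr, band))
```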
Automatic Control of the Dynamic Range Compressor Using a Regression Model and a Reference Sound

Practical experience with audio effects, as well as knowledge of their parameters and how they change the sound, is crucial when controlling digital audio effects. This often presents barriers for musicians and casual users in the application of effects. These users are more accustomed to describing the desired sound verbally or using examples, rather than understanding and configuring low-level signal processing parameters. This paper addresses this issue by providing a novel control method for audio effects. While a significant body of work focuses on the use of semantic descriptors and visual interfaces, little attention has been given to an important modality: the use of sound examples to control effects. We use a set of acoustic features to capture important characteristics of sound examples and evaluate different regression models that map these features to effect control parameters. Focusing on dynamic range compression, results show that our approach provides a promising first step in this direction.
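The core of the approach is a regression from acoustic features of a reference sound to effect parameters. A minimal stand-in, with invented features and a random forest in place of whichever regressors the paper compares, might look like this.

```python
# Sketch: map acoustic feature vectors to compressor settings with a regressor.
# Feature set, targets, and model choice are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))   # stand-in features (e.g. RMS, crest factor, ...)
y = rng.uniform(size=(200, 2))  # stand-in targets: normalised threshold and ratio

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
reference_features = rng.normal(size=(1, 6))  # features of a reference sound
threshold, ratio = model.predict(reference_features)[0]
print(f"predicted threshold={threshold:.2f}, ratio={ratio:.2f} (normalised)")
```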
Unsupervised Taxonomy of Sound Effects

Sound effect libraries are commonly used by sound designers in a range of industries. Taxonomies exist for the classification of sounds into groups based on subjective similarity, sound source or common environmental context. However, these taxonomies are not standardised, and no taxonomy based purely on the sonic properties of audio exists. We present a method using feature selection, unsupervised learning and hierarchical clustering to develop an unsupervised taxonomy of sound effects based entirely on the sonic properties of the audio within a sound effect library. The unsupervised taxonomy is then related back to the perceived meaning of the relevant audio features.
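The pipeline named in the abstract (feature selection, unsupervised learning, hierarchical clustering) can be sketched in a few lines; the feature matrix and the Ward linkage choice below are assumptions.

```python
# Sketch: hierarchically cluster sound effects by audio features and cut the
# dendrogram into groups, reading the tree as a taxonomy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
features = rng.normal(size=(50, 8))        # stand-in: one feature vector per effect
Z = linkage(features, method="ward")       # agglomerative clustering
labels = fcluster(Z, t=5, criterion="maxclust")  # cut the tree into 5 groups
print(labels[:10])
```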
The Mix Evaluation Dataset

Research on the perception of music production practices is mainly concerned with the emulation of sound engineering tasks through lab-based experiments and custom software, sometimes with unskilled subjects. This can improve the level of control, but the validity, transferability, and relevance of the results may suffer from this artificial context. This paper presents a dataset consisting of mixes gathered in a real-life, ecologically valid setting, along with perceptual evaluations thereof, which can be used to expand knowledge of the mixing process. With 180 mixes including parameter settings, close to 5000 preference ratings and free-form descriptions, and a diverse range of contributors from five different countries, the data offers many opportunities for music production analysis, some of which are explored here. In particular, more experienced subjects were found to be more negative and more specific in their assessments of mixes, and to increasingly agree with each other.
A Method for Automatic Whoosh Sound Description

Usually, a sound designer achieves artistic goals by editing and processing pre-recorded sound samples. To assist navigation in the vast amount of sounds, sound metadata is used: it provides short free-form textual descriptions of the sound file content. One can search through the keywords or phrases in the metadata to find a group of sounds that may suit a task. Unfortunately, the relative nature of sound design terms complicates the search, making the process tedious, prone to errors and by no means supportive of the creative flow. Another way to approach the sound search problem is to use sound analysis. In this paper we present a simple method for analyzing the temporal evolution of the “whoosh” sound, based on a per-band piecewise linear approximation of the sound envelope signal. The method uses the spectral centroid and fuzzy membership functions to estimate the degree to which the sound energy moves upwards or downwards in the frequency domain over the course of the audio file. We evaluated the method on a synthetic dataset consisting of white noise recordings processed with different variations of modulated bandpass filters; it correctly identified the centroid movement direction in 77% of the sounds.
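A stripped-down version of the analysis replaces the paper's fuzzy membership functions and per-band piecewise-linear envelope fits with a simple slope test on the spectral centroid trajectory; that simplification, and the framing parameters, are assumptions.

```python
# Sketch: track the spectral centroid frame by frame and judge whether energy
# moves up or down in frequency over the file. Not the paper's full method.
import numpy as np

def centroid_trajectory(x, sr, frame=2048, hop=512):
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    cents = []
    for i in range(0, len(x) - frame, hop):
        mags = np.abs(np.fft.rfft(x[i:i + frame] * np.hanning(frame)))
        cents.append(np.sum(freqs * mags) / (np.sum(mags) + 1e-12))
    return np.array(cents)

def movement_direction(x, sr):
    c = centroid_trajectory(x, sr)
    slope = np.polyfit(np.arange(len(c)), c, 1)[0]  # Hz per hop
    return "upwards" if slope > 0 else "downwards"

sr = 44100
t = np.linspace(0, 1, sr, endpoint=False)
sweep = np.sin(2 * np.pi * (200 * t + 1900 * t ** 2))  # rising chirp as a test "whoosh"
print(movement_direction(sweep, sr))  # expected: upwards
```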
Diffuse-field Equalisation of First-order Ambisonics

Timbre is a crucial element of believable and natural binaural synthesis. This paper presents a method for diffuse-field equalisation of first-order Ambisonic binaural rendering, aiming to address the timbral disparity that exists between Ambisonic rendering and head-related transfer function (HRTF) convolution, as well as between different Ambisonic loudspeaker configurations. The presented work is then evaluated through listening tests, and results indicate that diffuse-field equalisation is effective in improving timbral consistency.
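Diffuse-field equalisation conventionally divides out the RMS average of HRTF magnitudes over all measured directions; a sketch under that assumption (uniform direction weighting, naive regularised inversion rather than the minimum-phase design typical in practice) follows.

```python
# Sketch: estimate the diffuse-field response from a set of HRIRs and build a
# simple regularised inverse filter. Weighting and inversion are assumptions.
import numpy as np

def diffuse_field_response(hrirs, n_fft=512):
    """hrirs: (n_directions, taps). Returns RMS magnitude over directions."""
    H = np.fft.rfft(hrirs, n=n_fft, axis=1)
    return np.sqrt(np.mean(np.abs(H) ** 2, axis=0))

def df_equalisation_filter(hrirs, n_fft=512, reg=1e-3):
    df = diffuse_field_response(hrirs, n_fft)
    inv_mag = 1.0 / (df + reg)             # regularised magnitude inversion
    return np.fft.irfft(inv_mag, n=n_fft)  # zero-phase here; min-phase is typical

hrirs = np.random.randn(360, 256)  # stand-in for HRIRs on a direction grid
eq = df_equalisation_filter(hrirs)
print(eq.shape)
```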