Automatic Control of the Dynamic Range Compressor Using a Regression Model and a Reference Sound
Practical experience with audio effects, as well as knowledge of their parameters and how they change the sound, is crucial when controlling digital audio effects. This often presents barriers for musicians and casual users in the application of effects. These users are more accustomed to describing the desired sound verbally or using examples, rather than understanding and configuring low-level signal processing parameters. This paper addresses this issue by providing a novel control method for audio effects. While a significant body of work focuses on the use of semantic descriptors and visual interfaces, little attention has been given to an important modality: the use of sound examples to control effects. We use a set of acoustic features to capture important characteristics of sound examples and evaluate different regression models that map these features to effect control parameters. Focusing on dynamic range compression, results show that our approach provides a promising first step in this direction.
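For illustration, a minimal sketch of such a feature-to-parameter mapping is given below, assuming scikit-learn and librosa; the feature set, the compressor parameters (threshold, ratio, attack, release), and the train_paths/train_params variables are placeholders, not the paper's actual setup.

```python
# Minimal sketch: map acoustic features of a reference sound to compressor
# parameters with a regression model (scikit-learn and librosa assumed;
# feature choice and parameter set here are illustrative, not the paper's).
import numpy as np
import librosa
from sklearn.ensemble import RandomForestRegressor

def acoustic_features(path):
    """Summarize a sound example with a few coarse descriptors."""
    y, sr = librosa.load(path, sr=44100, mono=True)
    rms = librosa.feature.rms(y=y)[0]
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
    flatness = librosa.feature.spectral_flatness(y=y)[0]
    crest = np.max(np.abs(y)) / (np.mean(rms) + 1e-9)  # crest factor
    return np.array([rms.mean(), rms.std(), centroid.mean(),
                     flatness.mean(), crest])

# Training data: reference sounds paired with the compressor settings
# (threshold dB, ratio, attack ms, release ms) that produced them.
X = np.array([acoustic_features(p) for p in train_paths])   # train_paths: assumed list of files
y = np.array(train_params)                                  # shape (n, 4), assumed

model = RandomForestRegressor(n_estimators=200).fit(X, y)

# Predict compressor settings for a new reference sound.
threshold, ratio, attack, release = model.predict(
    acoustic_features("reference.wav").reshape(1, -1))[0]
```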
Dimensionality Reduction Techniques for Fear Emotion Detection from Speech
In this paper, we propose to reduce the relatively high dimensionality of pitch-based features for fear emotion recognition from speech. To do so, the K-nearest neighbors algorithm is used to classify three emotion classes: fear, neutral and 'other emotions'. Several dimensionality reduction techniques are explored. First, optimal features ensuring better emotion classification are determined. Next, several families of dimensionality reduction methods, namely PCA, LDA and LPP, are tested in order to reveal the dimension range guaranteeing the highest overall and fear recognition rates. Results show that the optimal feature group yields overall and fear accuracy rates of 93.34% and 78.7%, respectively. Among the dimensionality reduction techniques, Principal Component Analysis (PCA) gives the best results: an overall accuracy rate of 92% and a fear recognition rate of 93.3%.
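A minimal sketch of this kind of evaluation pipeline, assuming scikit-learn, is shown below; the load_pitch_features loader, the dimension sweep, and the data split are illustrative placeholders rather than the paper's protocol.

```python
# Sketch: PCA dimensionality reduction followed by k-NN classification of
# fear / neutral / other, sweeping the reduced dimension (scikit-learn assumed;
# the feature loader and dimensions are placeholders, not the paper's data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

X, y = load_pitch_features()   # assumed loader; y contains "fear", "neutral", "other"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

for n_dims in (5, 10, 20, 40):                 # sweep the reduced dimension
    pca = PCA(n_components=n_dims).fit(X_tr)
    knn = KNeighborsClassifier(n_neighbors=5).fit(pca.transform(X_tr), y_tr)
    pred = knn.predict(pca.transform(X_te))
    overall = accuracy_score(y_te, pred)
    fear = recall_score(y_te, pred, labels=["fear"], average=None)[0]
    print(f"{n_dims} dims: overall {overall:.2%}, fear recall {fear:.2%}")
```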
Universal Audio Synthesizer Control with Normalizing Flows
The ubiquity of sound synthesizers has reshaped music production and even entirely defined new music genres. However, the increasing complexity and number of parameters in modern synthesizers make them harder to master. Hence, the development of methods that allow users to easily create and explore sounds with synthesizers is a crucial need. Here, we introduce a radically novel formulation of audio synthesizer control, formalizing it as finding an organized, continuous latent space of audio that represents the capabilities of a synthesizer and mapping this space to the space of synthesis parameters. Using this formulation, we show that we can simultaneously address automatic parameter inference, macro-control learning and audio-based preset exploration within a single model. To solve this new formulation, we rely on Variational Auto-Encoders (VAE) and Normalizing Flows (NF) to organize and map the respective auditory and parameter spaces. We introduce a new type of NF named regression flows that allows an invertible mapping between separate latent spaces, while steering the organization of some of the latent dimensions. We evaluate our proposal against a large set of baseline models and show its superiority in both parameter inference and audio reconstruction. We also show that the model disentangles the major factors of audio variation as latent dimensions, which can be directly used as macro-parameters. Finally, we discuss the use of our model in several creative applications and introduce real-time implementations in Ableton Live.
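As a rough illustration of the overall idea (not the paper's architecture), the sketch below pairs a small VAE encoder with a RealNVP-style coupling layer standing in for the regression flow, assuming PyTorch; the sizes, data and wiring are placeholders.

```python
# Highly simplified sketch (PyTorch assumed): a VAE organizes an audio latent
# space, and an invertible coupling layer (a stand-in for the paper's
# "regression flows") maps it to a synthesis-parameter latent space.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, n_in, n_z):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, 256), nn.ReLU())
        self.mu = nn.Linear(256, n_z)
        self.logvar = nn.Linear(256, n_z)
    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class Coupling(nn.Module):
    """Invertible RealNVP-style map between two latent spaces of equal size."""
    def __init__(self, n_z):
        super().__init__()
        self.scale = nn.Sequential(nn.Linear(n_z // 2, n_z - n_z // 2), nn.Tanh())
        self.shift = nn.Linear(n_z // 2, n_z - n_z // 2)
    def forward(self, z):
        z1, z2 = z.chunk(2, dim=-1)
        return torch.cat([z1, z2 * torch.exp(self.scale(z1)) + self.shift(z1)], dim=-1)
    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        return torch.cat([y1, (y2 - self.shift(y1)) * torch.exp(-self.scale(y1))], dim=-1)

enc = Encoder(n_in=2048, n_z=16)     # spectrum frame -> audio latent
flow = Coupling(n_z=16)              # audio latent <-> parameter latent

spectrum = torch.randn(8, 2048)      # placeholder batch of spectra
mu, logvar = enc(spectrum)
z_audio = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
z_param = flow(z_audio)              # parameter-inference direction
z_back = flow.inverse(z_param)       # audio-from-parameters direction
```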
On the Impact of Ground Sound
Rigid-body impact sound synthesis methods often omit the ground sound. In this paper we analyze an idealized ground-sound model based on an elastodynamic halfspace, and use it to identify scenarios in which ground sound is perceptually relevant versus those in which it is masked by the impacting object’s modal sound or transient acceleration noise. Our analytical model gives a smooth, closed-form expression for ground surface acceleration, which we can then use in the Rayleigh integral or in an “acoustic shader” for a finite-difference time-domain wave simulation. We find that when modal sound is inaudible, ground sound is audible in scenarios where a dense object impacts soft ground and in scenarios where the impact point has a low elevation angle to the listening point.
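The Rayleigh-integral step can be sketched numerically as below; the surface_accel pulse is a purely illustrative placeholder for the paper's closed-form elastodynamic expression, and NumPy is assumed.

```python
# Sketch of the Rayleigh-integral evaluation: given a ground-surface normal
# acceleration a(r, t) (placeholder model here, not the paper's closed form),
# the pressure at a listener point is accumulated over a discretized patch
# of the halfspace boundary at retarded times.
import numpy as np

RHO, C = 1.2, 343.0              # air density (kg/m^3), speed of sound (m/s)

def surface_accel(r, t):
    """Placeholder radially spreading acceleration pulse on the ground."""
    tau = t - r / 150.0          # assumed surface wave speed
    return np.where(tau > 0, np.exp(-1e6 * (tau - 1e-3) ** 2) / (1.0 + r), 0.0)

def rayleigh_pressure(listener, t, extent=2.0, n=200):
    """p(x, t) ~ (rho / 2*pi) * sum over ground points of a(y, t - |x-y|/c) / |x-y| dA."""
    xs = np.linspace(-extent, extent, n)
    dA = (xs[1] - xs[0]) ** 2
    gx, gy = np.meshgrid(xs, xs)                        # ground points at z = 0
    r_src = np.hypot(gx, gy)                            # distance from impact point
    dist = np.sqrt((gx - listener[0]) ** 2 + (gy - listener[1]) ** 2 + listener[2] ** 2)
    a = surface_accel(r_src, t - dist / C)              # retarded-time acceleration
    return RHO / (2 * np.pi) * np.sum(a / dist) * dA

ts = np.arange(0, 0.02, 1 / 48000.0)
p = np.array([rayleigh_pressure((1.0, 0.0, 0.3), t) for t in ts])  # listener 0.3 m above ground
```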
Virtual Bass System With Fuzzy Separation of Tones and Transients
A virtual bass system creates an impression of bass perception in sound systems with weak low-frequency reproduction, which is typical of small loudspeakers. Virtual bass systems extend the bandwidth of the low-frequency audio content using either a nonlinear function or a phase vocoder, and add the processed signal to the reproduced sound. Hybrid systems separate transients and steady-state sounds, which are processed separately. It is still challenging to reach good sound quality with a virtual bass system. This paper proposes a novel method, which separates the tonal, transient, and noisy parts of the audio signal in a fuzzy way and then processes only the transients and tones. Upper harmonics that are detected above the cutoff frequency are boosted using timbre-matched weights, whereas missing upper harmonics are generated to evoke the missing-fundamental phenomenon. Listening test results show that the proposed algorithm outperforms selected previous methods in terms of perceived bass sound quality. The proposed method can enhance the bass sound perception of small loudspeakers, such as those used in laptop computers and mobile devices.
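A rough sketch of a soft (fuzzy) three-way split in the STFT domain is given below, using median-filtering-based harmonic-percussive separation with margins as provided by librosa; this stands in for, but does not reproduce, the paper's own fuzzy separation and its timbre-matched harmonic processing.

```python
# Rough sketch of a soft tonal / transient / noise split in the STFT domain
# (librosa assumed). With margin > 1 the harmonic and percussive soft masks
# no longer cover the whole spectrogram, leaving a residual treated as noise.
import librosa

y, sr = librosa.load("input.wav", sr=None, mono=True)
S = librosa.stft(y, n_fft=2048, hop_length=512)

S_tonal, S_trans = librosa.decompose.hpss(S, margin=2.0)
S_noise = S - S_tonal - S_trans

tones = librosa.istft(S_tonal, hop_length=512)       # would be processed by the VBS
transients = librosa.istft(S_trans, hop_length=512)  # would be processed by the VBS
noise = librosa.istft(S_noise, hop_length=512)       # would be passed through unchanged
```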
Optimization of Cascaded Parametric Peak and Shelving Filters With Backpropagation Algorithm
Peak and shelving filters are parametric infinite impulse response filters used for amplifying or attenuating a certain frequency band. Shelving filters are parametrized by their cut-off frequency and gain, and peak filters by center frequency, bandwidth and gain. Such filters can be cascaded to perform audio processing tasks like equalization, spectral shaping and modelling of complex transfer functions. Such a filter cascade allows independent optimization of the mentioned parameters of each filter. For this purpose, a novel approach is proposed for deriving the necessary local gradients with respect to the control parameters and for applying the instantaneous backpropagation algorithm to compute the gradient flow through a cascaded structure. Additionally, the performance of such a filter cascade, adapted with the proposed method, is demonstrated for head-related transfer function modelling as an example application.
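A sketch of this kind of gradient-based optimization of a peak-filter cascade is shown below, using PyTorch autograd in place of the paper's hand-derived gradients and instantaneous backpropagation; the biquad coefficients follow the familiar audio-EQ-cookbook peak-filter formulas, and the flat target response is a placeholder for, e.g., an HRTF magnitude.

```python
# Sketch: optimize center frequency, gain and Q of a cascade of parametric
# peak filters so that the cascade's magnitude response matches a target.
# PyTorch autograd stands in for the paper's derived gradients.
import torch

FS = 48000.0
N_FILT, N_FREQ = 8, 512
w = torch.linspace(0.01, torch.pi, N_FREQ)   # evaluation grid (rad/sample)
z1 = torch.exp(-1j * w)                      # e^{-jw}

# Trainable parameters of each peak filter.
f0 = torch.nn.Parameter(torch.linspace(100.0, 12000.0, N_FILT))
gain_db = torch.nn.Parameter(torch.zeros(N_FILT))
q = torch.nn.Parameter(torch.full((N_FILT,), 1.0))

def cascade_response():
    H = torch.ones(N_FREQ, dtype=torch.complex64)
    for k in range(N_FILT):
        # Audio-EQ-cookbook peak filter coefficients.
        A = 10.0 ** (gain_db[k] / 40.0)
        w0 = 2 * torch.pi * f0[k] / FS
        alpha = torch.sin(w0) / (2 * q[k])
        b = torch.stack([1 + alpha * A, -2 * torch.cos(w0), 1 - alpha * A])
        a = torch.stack([1 + alpha / A, -2 * torch.cos(w0), 1 - alpha / A])
        num = b[0] + b[1] * z1 + b[2] * z1 ** 2
        den = a[0] + a[1] * z1 + a[2] * z1 ** 2
        H = H * num / den
    return H

target_db = torch.zeros(N_FREQ)              # placeholder target magnitude (dB)
opt = torch.optim.Adam([f0, gain_db, q], lr=0.05)
for step in range(2000):
    opt.zero_grad()
    mag_db = 20 * torch.log10(torch.abs(cascade_response()) + 1e-8)
    loss = torch.mean((mag_db - target_db) ** 2)
    loss.backward()                          # gradient flow through the cascade
    opt.step()
```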
Quality Diversity for Synthesizer Sound Matching
It is difficult to adjust the parameters of a complex synthesizer to create a desired sound. As such, sound matching, the estimation of synthesis parameters that can replicate a given sound, is a task that has often been researched, utilizing optimization methods such as the genetic algorithm (GA). In this paper, we introduce a novelty-based objective for GA-based sound matching. Our contribution is two-fold. First, we show that the novelty objective is able to improve the quality of sound matching by maintaining phenotypic diversity in the population. Second, we introduce a quality diversity approach to the problem of sound matching, aiming to find a diverse set of matching sounds. We show that the novelty objective is effective in producing high-performing solutions that are diverse in terms of specified audio features. This approach allows for a new way of discovering sounds and exploring the capabilities of a synthesizer.
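A toy sketch of combining a distance-to-target fitness with a novelty score in a GA loop is given below; synthesize and audio_features are assumed stand-ins for rendering a parameter vector with the synthesizer and describing the result with audio features, and the selection scheme is illustrative rather than the paper's quality-diversity algorithm.

```python
# Toy sketch: GA-based sound matching with an added novelty objective.
# Novelty of an individual = mean distance to its k nearest neighbors in
# audio-feature (behavior) space; the weighting and selection are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS, POP, K = 16, 64, 5

def fitness(feat, target_feat):
    return -np.linalg.norm(feat - target_feat)            # closer to target is better

def novelty(feat, behaviors, k=K):
    d = np.sort(np.linalg.norm(behaviors - feat, axis=1)) # distances to the population
    return d[:k].mean()                                    # mean distance to k nearest

target_feat = audio_features(target_sound)                # assumed target descriptor
pop = rng.random((POP, N_PARAMS))                         # normalized synth parameters
for gen in range(100):
    feats = np.array([audio_features(synthesize(p)) for p in pop])
    scores = np.array([0.5 * fitness(f, target_feat) + 0.5 * novelty(f, feats)
                       for f in feats])
    parents = pop[np.argsort(scores)[-POP // 2:]]          # keep the best half
    children = parents + rng.normal(0.0, 0.05, parents.shape)  # Gaussian mutation
    pop = np.clip(np.vstack([parents, children]), 0.0, 1.0)
```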
Lookup Table Based Audio Spectral Transformation
We present a unified visual interface for flexible spectral audio manipulation based on editable lookup tables (LUTs). In the proposed approach, the audio spectrum is visualized as a two-dimensional color map of frequency versus amplitude, serving as an editable lookup table for modifying the sound. This single tool can replicate common audio effects such as equalization, pitch shifting, and spectral compression, while also enabling novel sound transformations through creative combinations of adjustments. By consolidating these capabilities into one visual platform, the system has the potential to streamline audio-editing workflows and encourage creative experimentation. The approach also supports real-time processing, providing immediate auditory feedback in an interactive graphical environment. Overall, this LUT-based method offers an accessible yet powerful framework for designing and applying a broad range of spectral audio effects through intuitive visual manipulation.
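A minimal sketch of the core mechanism, assuming NumPy and librosa, is shown below; the table resolution, the level quantization, and the example edit are illustrative choices, not the interface described in the paper.

```python
# Minimal sketch: an editable 2-D lookup table indexed by frequency bin and
# quantized input level returns a gain applied to each STFT bin. The identity
# table is a placeholder the user would edit graphically.
import numpy as np
import librosa

N_FFT, HOP = 2048, 512
N_LEVELS = 64                                      # amplitude resolution of the LUT

y, sr = librosa.load("input.wav", sr=None, mono=True)
S = librosa.stft(y, n_fft=N_FFT, hop_length=HOP)
mag, phase = np.abs(S), np.angle(S)

# lut[f, l] = output gain for frequency bin f at quantized input level l.
lut = np.ones((N_FFT // 2 + 1, N_LEVELS))           # identity mapping to start from
lut[:40, -16:] *= 0.5                               # example edit: tame loud low bins

level_db = 20 * np.log10(mag + 1e-9)
level_idx = np.clip(((level_db + 96) / 96 * (N_LEVELS - 1)).astype(int), 0, N_LEVELS - 1)
freq_idx = np.arange(mag.shape[0])[:, None]          # broadcast over frames

out_mag = mag * lut[freq_idx, level_idx]
out = librosa.istft(out_mag * np.exp(1j * phase), hop_length=HOP)
```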
Musical Gestures and Audio Effects Processing
We introduce the notion of musical gestures as time-varying measurements which identify the audio input stream’s musical skeleton without attempting to implement any involved model of musical understanding. Living comfortably at an intermediate level of abstraction between waveforms and music transcriptions, these musical gestures are used to control the behavior of an audio processing module. The resulting scheme qualifies as an audio effects processing system, as it essentially transforms one audio stream into another.
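As a hedged illustration of the idea, the sketch below derives two simple time-varying measurements (RMS energy and spectral centroid, standing in for gestures) and uses them to drive the depth of a tremolo effect; both the gestures and the controlled effect are examples chosen here, not the paper's.

```python
# Illustrative sketch: low-level, time-varying measurements of the input act
# as control signals ("gestures") for an audio effect, here a simple tremolo
# whose depth follows loudness times brightness (librosa and NumPy assumed).
import numpy as np
import librosa

y, sr = librosa.load("input.wav", sr=None, mono=True)
hop = 512

rms = librosa.feature.rms(y=y, hop_length=hop)[0]                              # loudness gesture
centroid = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop)[0]    # brightness gesture

# Map the gestures to a control signal in [0, 1] and upsample to audio rate.
control = (rms / (rms.max() + 1e-9)) * (centroid / (centroid.max() + 1e-9))
control = np.interp(np.arange(len(y)), np.arange(len(control)) * hop, control)

# Gesture-controlled tremolo: the louder and brighter the input, the deeper
# the amplitude modulation.
lfo = 0.5 * (1 + np.sin(2 * np.pi * 5.0 * np.arange(len(y)) / sr))
out = y * (1.0 - control * lfo)
```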
A Method of Generic Programming for High Performance DSP
This paper presents some key concepts of a new just-in-time programming language designed for high-performance DSP. The language is primarily intended to implement an updated version of PWGLSynth, the synthesis extension to the visual musical programming environment PWGL. However, the system is suitable for use as a backend for any DSP platform. A flow control mechanism based on generic programming, polymorphism and functional programming practices is presented, which we believe is much better suited to visual programming than the traditional loop constructs found in textual languages.