DDSP-SFX: Acoustically-Guided Sound Effects Generation with Differentiable Digital Signal Processing
Controlling the variations of sound effects using neural audio synthesis models has been a challenging task. Differentiable digital signal processing (DDSP) provides a lightweight solution that achieves high-quality sound synthesis while enabling deterministic acoustic attribute control by incorporating pre-processed audio features and digital synthesizers. In this research, we introduce DDSP-SFX, a model based on the DDSP architecture capable of synthesizing high-quality sound effects while enabling users to control the timbre variations easily. We integrate a transient modelling algorithm in DDSP that achieves higher objective evaluation scores and subjective ratings over impulsive signals (footsteps, gunshots). We propose a novel method that achieves frame-level timbre variation control while also allowing deterministic attribute control. We further qualitatively show the timbre transfer performance using voice as the guiding sound.
Differentiable All-Pole Filters for Time-Varying Audio Systems
Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and feed-forward compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin.
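To make the recursive structure mentioned in this abstract concrete, here is a minimal sketch of the forward pass of a time-varying all-pole filter, y[n] = x[n] - sum_k a_k[n] y[n-k]. The function name `time_varying_allpole` and the coefficient layout (one row of feedback coefficients per sample) are assumptions made for illustration. The sketch shows only the sample-by-sample recursion that makes automatic differentiation awkward; it does not implement the gradient backpropagation proposed in the paper.

```python
import numpy as np

def time_varying_allpole(x, a):
    """Evaluate y[n] = x[n] - sum_k a[n, k-1] * y[n-k] sample by sample.

    x: input signal, shape (N,)
    a: feedback coefficients per sample, shape (N, M) (hypothetical layout)
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    n_samples, order = a.shape
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = x[n]
        for k in range(1, order + 1):
            if n - k >= 0:
                # Each output depends on earlier outputs: the recursion.
                acc -= a[n, k - 1] * y[n - k]
        y[n] = acc
    return y

# Usage: a one-pole filter whose coefficient happens to stay constant.
impulse = np.array([1.0, 0.0, 0.0, 0.0])
coeffs = np.full((4, 1), -0.5)
response = time_varying_allpole(impulse, coeffs)
```

Because every output sample depends on previous outputs, the loop cannot be vectorised directly, which is the difficulty the abstract refers to.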
Towards Neural Emulation of Voltage-Controlled Oscillators
Machine learning models have become ubiquitous in modeling analog audio devices. Expanding on this line of research, our study focuses on Voltage-Controlled Oscillators of analog synthesizers. We employ black box autoregressive artificial neural networks to model the typical analog waveshapes, including triangle, square, and sawtooth. The models can be conditioned on wave frequency and type, enabling the generation of pitch envelopes and morphing across waveshapes. We conduct evaluations on both synthetic and analog datasets to assess the accuracy of various architectural variants. The LSTM variant performed better, although lower frequency ranges present particular challenges.
Chroma and MFCC Based Pattern Recognition in Audio Files Utilizing Hidden Markov Models And Dynamic Programming
In this paper we present an algorithm to reveal the immanent musical structure within pieces of popular music. Our proposed model uses an estimate of the harmonic progression, which is obtained by calculating beat-synchronous chroma vectors and letting a Hidden Markov Model (HMM) decide the most probable sequence of chords. In addition, MFCC vectors are computed to retrieve basic timbral information that cannot be described by harmony. Subsequently, a dynamic programming algorithm is used to detect repetitive patterns in these feature sequences. Based on these patterns, a second dynamic programming stage tries to find and link corresponding patterns into larger segments that reflect the musical structure.
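Finding the most probable chord sequence with an HMM is commonly done with Viterbi decoding, which is itself a dynamic programming algorithm. The sketch below is a generic Viterbi decoder and is not taken from the paper: the function name, the two-state setup, and all probability values are made up for illustration, and the emission likelihoods would in practice come from comparing chroma vectors with chord models.

```python
import numpy as np

def viterbi(log_emission, log_transition, log_initial):
    """Return the most probable state sequence as a list of state indices.

    log_emission:   (T, S) log-likelihood of each frame under each state
    log_transition: (S, S) log P(next state j | current state i)
    log_initial:    (S,)   log prior over the first state
    """
    n_frames, n_states = log_emission.shape
    score = log_initial + log_emission[0]
    backpointer = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # candidates[i, j]: best score ending in state i, then moving to j.
        candidates = score[:, None] + log_transition
        backpointer[t] = np.argmax(candidates, axis=0)
        score = np.max(candidates, axis=0) + log_emission[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(backpointer[t, path[-1]]))
    return path[::-1]

# Usage with two hypothetical chord states and four beat-synchronous frames.
emission = np.log(np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]))
transition = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))
initial = np.log(np.array([0.5, 0.5]))
chords = viterbi(emission, transition, initial)
```

Working in the log domain avoids numerical underflow when the sequence of frames is long.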
Estimation and Modeling of Pinna-Related Transfer Functions
This paper considers the problem of modeling pinna-related transfer functions (PRTFs) for 3-D sound rendering. Following a structural modus operandi, we present an algorithm for the decomposition of PRTFs into ear resonances and frequency notches due to reflections over pinna cavities. Such an approach allows the evolution of each physical phenomenon to be controlled separately through the design of two distinct filter blocks during PRTF synthesis. The resulting model is suitable for future integration into a structural head-related transfer function model, and for parametrization over anthropometrical measurements of a wide range of subjects.
A Hilbert-Transformer Frequency Shifter for Audio
In contrast to conventional pitch-shifting effects which attempt to maintain harmonic relationships in the signal, a frequency shifter translates all the component frequencies of the input signal by an equal amount, disrupting the harmonic relationships and radically altering the sonic qualities of the signal. Ring modulation is a generalization of double-sideband suppressed-carrier modulation, and the frequency shifter is equivalent to a single-sideband modulator. Applications of the frequency shifter include the creation of bizarre distortions, phaser, and rotating speaker effects. An implementation is presented that is suitable for fixed-point digital hardware.
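The single-sideband view of frequency shifting can be sketched in a few lines: form the analytic signal (the input plus j times its Hilbert transform), multiply it by a complex exponential at the shift frequency, and keep the real part. The code below is an offline, FFT-based illustration with hypothetical function names; it is not the fixed-point Hilbert-transformer filter implementation the paper presents.

```python
import numpy as np

def analytic_signal(x):
    """Build the analytic signal by zeroing the negative-frequency bins."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def frequency_shift(x, shift_hz, sample_rate):
    """Translate every frequency component of x by shift_hz."""
    t = np.arange(len(x)) / sample_rate
    carrier = np.exp(2j * np.pi * shift_hz * t)
    return np.real(analytic_signal(x) * carrier)

# Usage: shift a 440 Hz sine up by 100 Hz (one second at 8 kHz).
sample_rate = 8000
time = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * time)
shifted = frequency_shift(tone, 100.0, sample_rate)
peak_hz = int(np.argmax(np.abs(np.fft.rfft(shifted))))  # 1 Hz per bin here
```

Applied to a harmonic signal, the same shift moves every partial by 100 Hz, so the partials are no longer integer multiples of a common fundamental.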
An Interdisciplinary Approach to Audio Effect Classification
The aim of this paper is to propose an interdisciplinary classification of digital audio effects to facilitate communication and collaborations between DSP programmers, sound engineers, composers, performers and musicologists. After reviewing classifications reflecting technological, technical and perceptual points of view, we introduce a transverse classification to link discipline-specific classifications into a single network containing various layers of descriptors, ranging from low-level features to high-level features. Simple tools using the interdisciplinary classification are introduced to facilitate the navigation between effects, underlying techniques, perceptual attributes and semantic descriptors. Finally, concluding remarks on implications for teaching purposes and for the development of audio effects user interfaces based on perceptual features rather than technical parameters are presented.
Physically Based Sound Synthesis and Control of Footsteps Sounds
We describe a system to synthesize footstep sounds in real time. The sound engine is based on physical models and physically inspired models reproducing the act of walking on several surfaces. To control the real-time engine, three solutions are proposed. The first two solutions are based on floor microphones, while the third one is based on shoes enhanced with sensors. The different solutions proposed are discussed in the paper.
Adaptive Pitch-Shifting With Applications to Intonation Adjustment in a Cappella Recordings
A central challenge for a cappella singers is to adjust their intonation and to stay in tune relative to their fellow singers. During editing of a cappella recordings, one may want to adjust local intonation of individual singers or account for global intonation drifts over time. This requires applying a time-varying pitch-shift to the audio recording, which we refer to as adaptive pitch-shifting. In this context, existing (semi-)automatic approaches are either labor-intensive or face technical and musical limitations. In this work, we present automatic methods and tools for adaptive pitch-shifting with applications to intonation adjustment in a cappella recordings. To this end, we show how to incorporate time-varying information into existing pitch-shifting algorithms that are based on resampling and time-scale modification (TSM). Furthermore, we release an open-source Python toolbox, which includes a variety of TSM algorithms and an implementation of our method. Finally, we show the potential of our tools by two case studies on global and local intonation adjustment in a cappella recordings using a publicly available multitrack dataset of amateur choral singing.
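The resampling half of such a scheme can be illustrated with a small sketch. Assuming a pitch-shift curve in cents with one value per output sample (a simplification chosen here, not necessarily the paper's parametrization), each output sample reads the input at a position that advances by the local rate 2^(cents/1200). The function name and the use of linear interpolation are assumptions for illustration, and this is not the API of the toolbox mentioned in the abstract.

```python
import numpy as np

def resample_time_varying(x, cents_per_output_sample):
    """Resample x with a time-varying rate derived from a cents curve."""
    cents = np.asarray(cents_per_output_sample, dtype=float)
    rates = 2.0 ** (cents / 1200.0)       # 1200 cents = one octave = rate 2
    # Read position of each output sample: start at 0, advance by the rate.
    positions = np.concatenate(([0.0], np.cumsum(rates[:-1])))
    # Drop positions that would read past the end of the input.
    positions = positions[positions <= len(x) - 1]
    return np.interp(positions, np.arange(len(x)), x)

# Usage: shift a 100 Hz sine up by a constant octave (one second at 8 kHz).
sample_rate = 8000
time = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 100 * time)
octave_up = resample_time_varying(tone, np.full(4000, 1200.0))
peak_bin = int(np.argmax(np.abs(np.fft.rfft(octave_up))))
peak_hz = peak_bin * sample_rate / len(octave_up)
```

Resampling alone changes the duration as well as the pitch (here the output is half as long), which is why it is paired with time-scale modification to restore the original timing.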
The development of an online course in DSP eartraining
The authors present a collaborative effort on establishing an online course in DSP eartraining. The paper reports on a preliminary workshop that covered a large range of topics such as eartraining in music education, terminology for sound characterization, e-learning, automated tutoring, DSP techniques, music examples and audio programming. An initial design of the web application is presented as a rich content database with flexible views to allow customized online presentations. Technical risks have already been mitigated through prototyping.