Download A System Based on Sinusoidal Analysis for the Estimation and Compensation of Pitch Variations in Musical Recordings
This paper presents a computationally efficient and easily interactive system for the estimation and compensation of speed variations in musical recordings. This class of degradation can be encountered in all types of analog recordings and is characterized by undesired pitch variations during the playback of the recording. We propose to estimate such variations in the digital counterpart of the analog recording by means of sinusoidal analysis, and these variations are corrected via non-uniform resampling. The system is evaluated for both artificially degraded and real audio recordings.
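The correction step can be illustrated with a minimal sketch of non-uniform resampling. It assumes the speed curve is already known (in the paper it is estimated by sinusoidal analysis, which is not shown here) and uses plain linear interpolation; the function names and the `speed` representation are illustrative assumptions, not the authors' implementation.

```python
import math

def warp_positions(speed):
    """Position p[n] in the undistorted signal reached at degraded sample n.

    `speed` holds the (assumed known, strictly positive) playback speed ratio
    at each degraded sample; 1.0 means nominal speed.
    """
    pos, out = 0.0, []
    for s in speed:
        out.append(pos)
        pos += s
    return out

def interp(signal, t):
    """Linear interpolation of `signal` at fractional index t (clamped)."""
    t = min(max(t, 0.0), len(signal) - 1.0)
    i = min(int(t), len(signal) - 2)
    frac = t - i
    return (1.0 - frac) * signal[i] + frac * signal[i + 1]

def compensate(degraded, speed):
    """Resample `degraded` at the non-uniform instants that undo the speed curve."""
    p = warp_positions(speed)
    n_out = int(p[-1]) + 1
    out, j = [], 0
    for m in range(n_out):
        # Advance j until p[j] <= m <= p[j + 1].
        while j + 2 < len(p) and p[j + 1] < m:
            j += 1
        span = p[j + 1] - p[j]
        t = j + (m - p[j]) / span if span > 0 else float(j)
        out.append(interp(degraded, t))
    return out
```

Feeding `compensate` a sinusoid sampled along a slowly wobbling position curve returns a close approximation of the uniformly sampled sinusoid; a practical system would likely use a higher-order interpolator than the linear one assumed here.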
Download Gradient Conversion Between Time and Frequency Domains Using Wirtinger Calculus
Gradient-based optimizations are commonly found in areas where Fourier transforms are used, such as in audio signal processing. This paper presents a new method of converting any gradient of a cost function with respect to a signal into, or from, a gradient with respect to the spectrum of this signal: thus, it allows the gradient descent to be performed indiscriminately in time or frequency domain. For efficiency purposes, and because the gradient of a real function with respect to a complex signal does not formally exist, this work is performed using Wirtinger calculus. An application to sound texture synthesis then experimentally validates this gradient conversion.
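As a hedged illustration of the idea, the sketch below converts a frequency-domain gradient into a time-domain one for a real signal. It assumes an unnormalized DFT and the conjugate Wirtinger derivative convention, under which dE/dx = 2 N Re(IDFT(dE/dconj(X))); the paper's own conventions and scaling may differ. The quadratic cost is a hypothetical example chosen only because its gradient is easy to verify.

```python
import cmath
import math

def dft(x):
    """Unnormalized discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of `dft`, including the 1/N factor."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def cost(x, target):
    """Example real-valued cost: squared distance between spectrum and target."""
    return sum(abs(a - b) ** 2 for a, b in zip(dft(x), target))

def grad_spectrum(x, target):
    """Conjugate Wirtinger gradient dE/dconj(X_k) of the example cost: X_k - T_k."""
    return [a - b for a, b in zip(dft(x), target)]

def grad_time_from_spectrum(g_conj):
    """Convert dE/dconj(X) into dE/dx for a real signal x."""
    n = len(g_conj)
    return [2.0 * n * v.real for v in idft(g_conj)]
```

Comparing `grad_time_from_spectrum(grad_spectrum(x, target))` with a finite-difference gradient of `cost` is a quick way to check the conversion for a given convention.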
Download Live Convolution with Time-variant Impulse Response
This paper describes a method for doing convolution of two live signals, without the need to load a time-invariant impulse response (IR) prior to the convolution process. The method is based on stepwise replacement of the IR in a continuously running convolution process. It was developed in the context of creative live electronic music performance, but can be applied to more traditional use cases for convolution as well. The process allows parametrization of the convolution parameters, by way of real-time transformations of the IR, and as such can be used to build parametric convolution effects for audio mixing and spatialization as well.
Download Modal Audio Effects: A Carillon Case Study
Modal representation, the decomposition of the resonances of objects into their vibrational modes, has historically been a powerful tool for studying and synthesizing the sounds of physical objects, but it also provides a flexible framework for abstract sound synthesis. In this paper, we demonstrate a variety of musically relevant ways to modify the model upon resynthesis, employing a carillon model as a case study. Using a set of audio recordings of the sixty bells of the Robert and Ann Lurie Carillon recorded at the University of Michigan, we present a modal analysis of these recordings, in which we decompose the sound of each bell into a sum of decaying sinusoids. Each sinusoid is characterized by a modal frequency, exponential decay rate, and initial complex amplitude. This analysis yields insight into the timbre of each individual bell as well as the entire carillon as an ensemble. It also yields a powerful parametric synthesis model for reproducing bell sounds and bell-based audio effects.
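The parametric model described above, a sum of decaying sinusoids each defined by a frequency, a decay rate, and a complex amplitude, can be sketched as follows. The mode values in `bell` are hypothetical placeholders, not measurements from the carillon, and `transform_modes` is only one assumed example of modifying the model before resynthesis.

```python
import cmath
import math

def synthesize_modes(modes, fs, n_samples):
    """Render the real part of a sum of exponentially decaying complex sinusoids.

    Each mode is a tuple (freq_hz, decay_per_second, complex_amplitude).
    """
    out = []
    for n in range(n_samples):
        t = n / fs
        out.append(sum((a * cmath.exp(complex(-alpha, 2.0 * math.pi * f) * t)).real
                       for f, alpha, a in modes))
    return out

def transform_modes(modes, freq_scale=1.0, decay_scale=1.0):
    """Scale every modal frequency and decay rate (a simple resynthesis effect)."""
    return [(f * freq_scale, alpha * decay_scale, a) for f, alpha, a in modes]

# Hypothetical two-mode "bell"; real values would come from the modal analysis.
bell = [(440.0, 3.0, 1.0 + 0.0j), (1200.0, 5.0, 0.5j)]
octave_up = transform_modes(bell, freq_scale=2.0)
```

Because every mode is an independent parameter triple, effects such as retuning, damping changes, or mixing modes from different bells reduce to editing the list before calling `synthesize_modes`.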
Download LP-BLIT: Bandlimited Impulse Train Synthesis of Lowpass-filtered Waveforms
Using bandlimited impulse train (BLIT) synthesis, it is possible to generate waveforms with a configurable number of harmonics of equal amplitude. In contrast to the sinc pulse, which is typically used for bandlimiting in BLIT and only allows the cutoff frequency to be set, a Hammerich pulse can be tuned by two independent parameters for cutoff frequency and stop-band roll-off. By replacing the perfect lowpass sinc pulse in BLIT with a Hammerich pulse, it is possible to directly synthesise a multitude of signals with an adjustable lowpass spectrum.
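For orientation, the sketch below shows only the classic equal-amplitude case: a bandlimited impulse train written both as an explicit sum of harmonics and in the equivalent closed form (the Dirichlet kernel). The Hammerich-pulse variant proposed in the paper is not implemented here, and the function names are illustrative assumptions.

```python
import math

def max_harmonics(f0, fs):
    """Largest K such that K * f0 stays strictly below the Nyquist frequency."""
    return math.ceil(fs / (2.0 * f0)) - 1

def blit_additive(phase, n_harm):
    """DC plus n_harm cosine harmonics, all with the same amplitude."""
    return 1.0 + 2.0 * sum(math.cos(k * phase) for k in range(1, n_harm + 1))

def blit_closed_form(phase, n_harm):
    """Same value via sin((K + 1/2) * phase) / sin(phase / 2)."""
    m = 2 * n_harm + 1
    denom = math.sin(phase / 2.0)
    if abs(denom) < 1e-12:
        return float(m)  # limit at integer multiples of 2 * pi
    return math.sin(m * phase / 2.0) / denom

def blit(f0, fs, n_samples):
    """One bandlimited impulse train at fundamental f0, sampled at rate fs."""
    k = max_harmonics(f0, fs)
    return [blit_closed_form(2.0 * math.pi * f0 * n / fs, k) for n in range(n_samples)]
```

The closed form costs a constant amount of work per sample regardless of the harmonic count, which is the usual reason for preferring it over the additive sum.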
Download Redressing Warped Wavelets and Other Similar Warped Time-something Representations
Time and frequency warping provide effective methods for fitting signal representations to desired physical or psychoacoustic characteristics. However, warping in one of the variables, e.g. frequency, disrupts the organization of the representation with respect to the conjugate variable, e.g. time. In recent papers we have considered methods to eliminate or mitigate the dispersion introduced by warping in time-frequency representations and Gabor frames. To this purpose, we introduced redressing methods consisting of further warping with respect to the transformed variables. These methods proved useful not only for visualizing the transform but also for simplifying its computation in terms of shifted precomputed warped elements, without the need for warping in the computation of the transform. In other linear representations, such as time-scale, warping generally modifies the transform operators, making visualization less informative and computation more difficult. Sound signal representations almost invariably need time as one of the coordinates, since we normally wish to follow the time evolution of features and characteristics. In this paper we devise methods for the redressing of dispersion introduced by warping in wavelet transforms and in other expansions where time-shift plays a role.
Download REDS: A New Asymmetric Atom for Sparse Audio Decomposition and Sound Synthesis
In this paper, we introduce a function designed specifically for sparse audio representations. A progression in the selection of dictionary elements (atoms) to sparsely represent audio has occurred: starting with symmetric atoms, then to damped sinusoid and hybrid atoms, and finally to the re-appropriation of the gammatone (GT) and formant-wave-function (FOF) into atoms. These asymmetric atoms have already shown promise in sparse decomposition applications, where they prove to be highly correlated with natural sounds and musical audio, but since neither was originally designed for this application their utility remains limited. An in-depth comparison of each existing function was conducted based on application-specific criteria. A directed design process was completed to create a new atom, the ramped exponentially damped sinusoid (REDS), that satisfies all desired properties: the REDS can adapt to a wide range of audio signal features and has good mathematical properties that enable efficient sparse decompositions and synthesis. Moreover, the REDS is proven to be approximately equal to the previous functions under some common conditions.
Download Harmonic-percussive Sound Separation Using Rhythmic Information from Non-negative Matrix Factorization in Single-channel Music Recordings
This paper proposes a novel method for separating harmonic and percussive sounds in single-channel music recordings. Standard non-negative matrix factorization (NMF) is used to obtain the activations of the most representative patterns active in the mixture. The basic idea is to automatically classify those activations according to whether they exhibit rhythmic or non-rhythmic patterns. We assume that percussive sounds are modeled by those activations that exhibit a rhythmic pattern, whereas harmonic and vocal sounds are modeled by those activations that exhibit a less rhythmic pattern. The classification of the harmonic or percussive NMF activations is performed using a recursive process based on successive correlations applied to the activations. Specifically, promising results are obtained when a sound is classified as percussive through the identification of a set of peaks in the output of the fourth correlation. The reason is that harmonic sounds tend to be represented by one valley in a half-cycle waveform at the output of the fourth correlation. Evaluation shows that the proposed method provides competitive results compared to other reference state-of-the-art methods. Some audio examples are available to illustrate the separation performance of the proposed method.
Download Iterative Structured Shrinkage Algorithms for Stationary/Transient Audio Separation
In this paper, we present novel strategies for stationary/transient signal separation in audio signals that exploit the basic observation that stationary components are sparse in frequency and persistent over time, whereas transients are sparse in time and persistent across frequency. We utilize a multi-resolution STFT approach which allows us to define structured shrinkage operators that tune into the characteristic spectrotemporal shapes of the stationary and transient signal layers. Structure is incorporated by considering the energy of time-frequency neighbourhoods or modulation spectrum regions instead of individual STFT coefficients, and shrinkage operators are employed in a dual-layered Iterated Shrinkage/Thresholding Algorithm (ISTA) framework. We further propose a novel iterative scheme, Iterative Cross-Shrinkage (ICS). In experiments using artificial test signals, ICS clearly outperforms the dual-layered ISTA and yields particularly good results in conjunction with a dynamic update of the shrinkage thresholds. The application of the novel algorithms to recordings from acoustic musical instruments provides perceptually convincing separation of transients.
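The difference between shrinking individual coefficients and shrinking by neighbourhood energy can be sketched as below. This is an assumed, simplified operator in the spirit of windowed group shrinkage with a rectangular one-dimensional neighbourhood; it is not the paper's exact operator and does not include the ISTA or ICS iterations. The layout `coeffs[k][n]` (frequency bin `k`, frame `n`) and all names are illustrative.

```python
import math

def soft_threshold(c, lam):
    """Shrink one coefficient toward zero by lam, keeping its phase."""
    mag = abs(c)
    if mag <= lam:
        return 0 * c
    return c * (1.0 - lam / mag)

def neighbourhood_shrink(coeffs, lam, half_width, axis):
    """Shrink each coefficient according to the energy of its neighbourhood.

    axis="time" groups along frames (favours components persistent over time);
    axis="freq" groups along bins (favours components persistent across frequency).
    """
    n_bins, n_frames = len(coeffs), len(coeffs[0])
    out = [[0j] * n_frames for _ in range(n_bins)]
    for k in range(n_bins):
        for n in range(n_frames):
            energy = 0.0
            for d in range(-half_width, half_width + 1):
                kk, nn = (k, n + d) if axis == "time" else (k + d, n)
                if 0 <= kk < n_bins and 0 <= nn < n_frames:
                    energy += abs(coeffs[kk][nn]) ** 2
            norm = math.sqrt(energy)
            gain = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
            out[k][n] = coeffs[k][n] * gain
    return out
```

On a toy grid containing one horizontal line (a sustained partial) and one vertical line (a click), the time-oriented neighbourhood keeps the horizontal line and suppresses most of the vertical one, and the frequency-oriented neighbourhood does the opposite.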
Download An Explorative String-bridge-plate Model with Tunable Parameters
The virtual exploration of the domain of mechano-acoustically produced sound and music is a long-held aspiration of physical modelling. A physics-based algorithm developed for this purpose combined with an interface can be referred to as a virtual-acoustic instrument; its design, formulation, implementation, and control are subject to a mix of technical and aesthetic criteria, including sonic complexity, versatility, modal accuracy, and computational efficiency. This paper reports on the development of one such system, based on simulating the vibrations of a string and a plate coupled via a (nonlinear) bridge element. Attention is given to formulating and implementing the numerical algorithm such that any of its parameters can be adjusted in real-time, thus facilitating musician-friendly exploration of the parameter space and offering novel possibilities regarding gestural control. Simulation results are presented exemplifying the sonic potential of the string-bridge-plate model (including bridge rattling and buzzing), and details regarding efficiency, real-time implementation and control interface development are discussed.