Nicht-negativeMatrixFaktorisierungnutzendes-KlangsynthesenSystem (NiMFKS): Extensions of NMF-based Concatenative Sound Synthesis
Concatenative sound synthesis (CSS) entails synthesising a “target” sound with other sounds collected in a “corpus.” Recent work explores CSS using non-negative matrix factorisation (NMF) to approximate a target sonogram by the product of a corpus sonogram and an activation matrix. In this paper, we propose a number of extensions of NMF-based CSS and present an open MATLAB implementation in a GUI-based application we name NiMFKS. In particular, we consider the following extensions: 1) we extend the NMF framework by implementing update rules based on the generalised β-divergence; 2) we add an optional monotonic algorithm for sparse NMF; 3) we tackle the computational challenges of scaling to large corpora by implementing a corpus-pruning preprocessing step; 4) we generalise the constraints that may be applied to the shape of the activation matrix; and 5) we implement new modes of interacting with the procedure by enabling the user to sketch and modify the activation matrix. The NiMFKS application and its source code can be downloaded from https://code.soundsoftware.ac.uk/projects/nimfks.
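The β-divergence update rules of extension 1) can be sketched with the standard multiplicative updates for the activation matrix H in V ≈ WH, with the corpus matrix W held fixed as in NMF-based CSS. This is a minimal pure-Python sketch, not NiMFKS's MATLAB code: the function names, the all-ones initialisation, the iteration count and the eps guard are hypothetical choices, and sparsity, pruning and activation-shape constraints are left out.

```python
# Hypothetical sketch (not NiMFKS code): multiplicative beta-divergence
# updates of the activations H, with the corpus matrix W kept fixed.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def update_activations(V, W, beta=1.0, n_iter=500, eps=1e-12):
    """Estimate H >= 0 such that W H approximates V.

    beta = 2: Euclidean distance, beta = 1: Kullback-Leibler,
    beta = 0: Itakura-Saito. eps is an assumed guard against division by 0.
    """
    n_units, n_frames = len(W[0]), len(V[0])
    WT = [list(col) for col in zip(*W)]            # transpose of W
    H = [[1.0] * n_frames for _ in range(n_units)]  # assumed all-ones start
    for _ in range(n_iter):
        WH = matmul(W, H)
        # Elementwise (WH)^(beta-2) * V and (WH)^(beta-1)
        num_in = [[(wh + eps) ** (beta - 2) * v for wh, v in zip(wh_row, v_row)]
                  for wh_row, v_row in zip(WH, V)]
        den_in = [[(wh + eps) ** (beta - 1) for wh in wh_row] for wh_row in WH]
        num = matmul(WT, num_in)
        den = matmul(WT, den_in)
        # H <- H * (W^T[(WH)^(beta-2) V]) / (W^T (WH)^(beta-1))
        H = [[h * n / (d + eps) for h, n, d in zip(h_row, n_row, d_row)]
             for h_row, n_row, d_row in zip(H, num, den)]
    return H
```

With a target built exactly from two corpus units, the updates recover the generating activations for both β = 1 and β = 2. Keeping W fixed is what distinguishes this use of NMF from ordinary factorisation, where W would be updated too.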
Validated Exponential Analysis for Harmonic Sounds
In audio spectral analysis, the Fourier method is popular because of its stability and its low computational complexity. However, it suffers from a time-frequency resolution trade-off and is not particularly well suited to aperiodic signals such as exponentially decaying ones. To overcome its resolution limitation, additional techniques such as quadratic peak interpolation or peak picking, and instantaneous frequency computation from phase unwrapping, are used. Parametric methods, on the other hand, overcome the time-frequency trade-off but are more susceptible to noise and have a higher computational complexity. We propose a method to overcome these drawbacks: we set up smaller, regularized, independent problems and perform a cluster analysis on their combined output. The new approach validates the true physical terms in the exponential model, is robust in the presence of outliers in the data, and is able to filter out non-physical noise terms in the model. The method is illustrated on the removal of electrical hum from harmonic sounds.
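The idea of solving many small independent problems and clustering their outputs can be illustrated with a deliberately tiny toy: a single damped complex exponential a·z^n, where each ratio of consecutive samples is its own one-line estimate of z and a median stands in for the cluster centre. This is only an assumption-laden illustration of the validation principle, not the paper's regularized multi-term method; the function name and the choice of a median are hypothetical.

```python
import cmath
import math
import statistics


def estimate_exponential(samples, fs, n_subproblems=8):
    """Toy estimate of one damped complex exponential a * z**n.

    Each ratio samples[k+1] / samples[k] is a tiny independent estimate of z;
    the median of the estimates plays the role of a cluster centre, so a few
    corrupted samples do not move the result. Requires len(samples) > n_subproblems.
    """
    estimates = [samples[k + 1] / samples[k]
                 for k in range(n_subproblems) if samples[k] != 0]
    z = complex(statistics.median(e.real for e in estimates),
                statistics.median(e.imag for e in estimates))
    freq_hz = cmath.phase(z) * fs / (2 * math.pi)   # angle of z -> frequency
    decay_per_s = -math.log(abs(z)) * fs            # modulus of z -> damping
    return freq_hz, decay_per_s
```

Corrupting one sample spoils two of the eight ratio estimates, yet the median still returns the true frequency and damping, whereas either spoiled ratio on its own would not.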
A System Based on Sinusoidal Analysis for the Estimation and Compensation of Pitch Variations in Musical Recordings
This paper presents a computationally efficient and easily interactive system for the estimation and compensation of speed variations in musical recordings. This class of degradation can be encountered in all types of analog recordings and is characterized by undesired pitch variations during playback of the recording. We propose to estimate such variations in the digital counterpart of the analog recording by means of sinusoidal analysis, and to correct them via non-uniform resampling. The system is evaluated on both artificially degraded and real audio recordings.
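The correction step can be sketched as non-uniform resampling driven by a pitch-ratio curve. In this minimal sketch the per-sample ratio of observed to nominal pitch is assumed to be already known (in the paper it is estimated by sinusoidal analysis), the function name is hypothetical, and linear interpolation stands in for the higher-quality interpolator a real system would likely use.

```python
def compensate_speed(x, ratio):
    """Undo a time-varying speed error by non-uniform resampling (sketch).

    x     : degraded signal (list of floats)
    ratio : assumed per-sample ratio of observed to nominal pitch, same length
            as x; e.g. 1.01 means the recording plays 1 % sharp at that point.
    The read position advances by 1 / ratio, so sharp passages are stretched
    and flat ones compressed; values between integer positions are linearly
    interpolated.
    """
    out = []
    pos = 0.0
    last = len(x) - 1
    while pos <= last:
        i = int(pos)
        frac = pos - i
        j = min(i + 1, last)
        out.append((1.0 - frac) * x[i] + frac * x[j])
        r = (1.0 - frac) * ratio[i] + frac * ratio[j]   # ratio at the read position
        pos += 1.0 / r
    return out
```

A constant ratio of 1 returns the input unchanged, while a constant ratio of 2 reads the input at half speed and roughly doubles its length, lowering every frequency by an octave.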
Gradient Conversion Between Time and Frequency Domains Using Wirtinger Calculus
Gradient-based optimizations are common in areas where Fourier transforms are used, such as audio signal processing. This paper presents a new method for converting any gradient of a cost function with respect to a signal into, or from, a gradient with respect to the spectrum of this signal; this allows gradient descent to be performed interchangeably in the time or frequency domain. For efficiency, and because the gradient of a real-valued function with respect to a complex signal does not formally exist, this work uses Wirtinger calculus. An application to sound texture synthesis experimentally validates this gradient conversion.
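Under conventions assumed here (an unnormalised DFT X = F x, and gradients taken as the Wirtinger derivative of the real-valued cost C with respect to the conjugate variable), the chain rule gives a simple conversion: the time-domain gradient is N times the inverse DFT of the frequency-domain gradient, and the frequency-domain gradient is the DFT of the time-domain gradient divided by N. The paper's normalisation may differ; the function names are hypothetical and a naive O(N²) DFT is used only to keep the sketch self-contained.

```python
import cmath


def dft(x):
    """Unnormalised DFT: X[k] = sum_n x[n] exp(-2j pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse of dft(): x[n] = (1/N) sum_k X[k] exp(2j pi k n / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def freq_grad_to_time(grad_X):
    """dC/d conj(x) = F^H dC/d conj(X) = N * idft(dC/d conj(X))."""
    N = len(grad_X)
    return [N * g for g in idft(grad_X)]


def time_grad_to_freq(grad_x):
    """dC/d conj(X) = F^{-H} dC/d conj(x) = dft(dC/d conj(x)) / N."""
    N = len(grad_x)
    return [g / N for g in dft(grad_x)]
```

For C = Σ|X[k]|², the frequency-domain gradient is X itself and, by Parseval, the time-domain gradient is N·x, which the conversion reproduces. For a real signal, the ordinary real gradient is 2·Re of the Wirtinger gradient.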
Live Convolution with Time-variant Impulse Response
This paper describes a method for convolving two live signals without the need to load a time-invariant impulse response (IR) prior to the convolution process. The method is based on stepwise replacement of the IR in a continuously running convolution process. It was developed in the context of creative live electronic music performance, but can be applied to more traditional use cases for convolution as well. The process allows the convolution to be parametrized through real-time transformations of the IR, and can therefore also be used to build parametric convolution effects for audio mixing and spatialization.
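Stepwise replacement of the IR inside a running convolution can be illustrated with a toy direct-form FIR filter whose coefficients are overwritten one segment at a time while audio keeps flowing. This is a sketch under assumptions, not the paper's implementation: the class and method names are hypothetical, segments are replaced in a simple round-robin order, and a practical real-time system would use partitioned FFT convolution and take care to avoid clicks when a segment changes.

```python
from collections import deque


class LiveConvolver:
    """Toy FIR convolver whose impulse response is replaced segment by segment.

    Hypothetical sketch: time-domain convolution, one sample at a time,
    with IR segments overwritten in round-robin order.
    """

    def __init__(self, ir_length, segment_length):
        assert ir_length % segment_length == 0
        self.h = [0.0] * ir_length                     # current impulse response
        self.seg = segment_length
        self.n_segments = ir_length // segment_length
        self.next_segment = 0
        # history[k] holds the input sample from k steps ago
        self.history = deque([0.0] * ir_length, maxlen=ir_length)

    def update_segment(self, samples):
        """Overwrite the next IR segment with freshly captured samples."""
        assert len(samples) == self.seg
        start = self.next_segment * self.seg
        self.h[start:start + self.seg] = samples
        self.next_segment = (self.next_segment + 1) % self.n_segments

    def process(self, x):
        """Push one input sample and return y[n] = sum_k h[k] * x[n - k]."""
        self.history.appendleft(x)                     # oldest sample drops out
        return sum(hk * xk for hk, xk in zip(self.h, self.history))
```

Because the convolution never stops, an impulse fed in after a segment update is answered with the extended IR, while earlier input keeps being filtered by whatever coefficients are current.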
Modal Audio Effects: A Carillon Case Study
Modal representations, which decompose the resonances of objects into their vibrational modes, have historically been a powerful tool for studying and synthesizing the sounds of physical objects, but they also provide a flexible framework for abstract sound synthesis. In this paper, we demonstrate a variety of musically relevant ways to modify the model upon resynthesis, taking a carillon model as a case study. Using a set of audio recordings of the sixty bells of the Robert and Ann Lurie Carillon, recorded at the University of Michigan, we present a modal analysis of these recordings in which we decompose the sound of each bell into a sum of decaying sinusoids. Each sinusoid is characterized by a modal frequency, an exponential decay rate, and an initial complex amplitude. This analysis yields insight into the timbre of each individual bell as well as of the entire carillon as an ensemble. It also yields a powerful parametric synthesis model for reproducing bell sounds and creating bell-based audio effects.
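The modal model described above (each mode given by a frequency f, a decay rate d and an initial complex amplitude a) can be resynthesised as y(t) = Re Σ a·exp((−d + i2πf)t). The sketch below assumes that parametrisation; the two scaling knobs are hypothetical examples of "modifying the model upon resynthesis" and are not taken from the paper.

```python
import cmath
import math


def modal_synth(modes, fs, duration, freq_scale=1.0, decay_scale=1.0):
    """Sum of exponentially decaying complex sinusoids (real part).

    modes       : list of (frequency_hz, decay_rate_per_s, complex_amplitude)
    freq_scale  : hypothetical transposition factor applied to every mode
    decay_scale : hypothetical factor applied to every decay rate
    """
    n_samples = int(round(duration * fs))
    out = []
    for n in range(n_samples):
        t = n / fs
        y = 0.0
        for f, d, a in modes:
            s = complex(-d * decay_scale, 2 * math.pi * f * freq_scale)
            y += (a * cmath.exp(s * t)).real
        out.append(y)
    return out
```

At t = 0 the output equals the sum of the real parts of the initial amplitudes, and a single 0 Hz mode with decay rate 10 s⁻¹ falls to exp(−5) after half a second. Scaling all frequencies by one factor transposes a bell while keeping its partial ratios.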
LP-BLIT: Bandlimited Impulse Train Synthesis of Lowpass-filtered Waveforms
Using bandlimited impulse train (BLIT) synthesis, it is possible to generate waveforms with a configurable number of harmonics of equal amplitude. In contrast to the sinc pulse, which is typically used for bandlimiting in BLIT and only allows the cutoff frequency to be set, a Hammerich pulse can be tuned with two independent parameters: cutoff frequency and stopband roll-off. By replacing the ideal lowpass sinc pulse in BLIT with a Hammerich pulse, a multitude of signals with an adjustable lowpass spectrum can be synthesised directly.
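The sinc-based BLIT that serves as the starting point can be written in closed form with the Dirichlet kernel: a period of P samples containing harmonics 1 to K of equal amplitude (plus DC) gives (1/P)·sin((K + 1/2)θ)/sin(θ/2) with θ = 2πn/P. The sketch below checks this closed form against direct additive synthesis. It does not implement the Hammerich pulse, whose definition is not given here, and the function names are hypothetical. K must stay below P/2 so that no harmonic exceeds the Nyquist frequency.

```python
import math


def blit(n_samples, period, n_harmonics):
    """Closed-form bandlimited impulse train (Dirichlet kernel), DC included."""
    out = []
    for n in range(n_samples):
        theta = 2 * math.pi * n / period
        den = math.sin(theta / 2)
        if abs(den) < 1e-12:
            value = 2 * n_harmonics + 1      # limit of the kernel at theta = 2*pi*m
        else:
            value = math.sin((n_harmonics + 0.5) * theta) / den
        out.append(value / period)
    return out


def blit_additive(n_samples, period, n_harmonics):
    """Same train built by summing equal-amplitude cosines (reference)."""
    return [(1 + 2 * sum(math.cos(2 * math.pi * k * n / period)
                         for k in range(1, n_harmonics + 1))) / period
            for n in range(n_samples)]
```

The closed form costs one division per sample regardless of K, which is why BLIT-style synthesis avoids summing harmonics explicitly; the additive version is kept only as a reference.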
Redressing Warped Wavelets and Other Similar Warped Time-something Representations
Time and frequency warping provide effective methods for fitting signal representations to desired physical or psychoacoustic characteristics. However, warping in one of the variables, e.g. frequency, disrupts the organization of the representation with respect to the conjugate variable, e.g. time. In recent papers we have considered methods to eliminate or mitigate the dispersion introduced by warping in time-frequency representations and Gabor frames. For this purpose, we introduced redressing methods consisting of further warping with respect to the transformed variables. These methods proved useful not only for visualizing the transform but also for simplifying its computation in terms of shifted, precomputed warped elements, without the need for warping in the computation of the transform. In other linear representations, such as time-scale representations, warping generally modifies the transform operators, making visualization less informative and computation more difficult. Sound signal representations almost invariably need time as one of the coordinates, because we normally wish to follow the time evolution of features and characteristics. In this paper we devise methods for redressing the dispersion introduced by warping in wavelet transforms and in other expansions where time shift plays a role.
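To make "warping in frequency disrupts time" concrete, the sketch below uses one common frequency-warping map, the one induced by a first-order allpass with parameter λ. Which warping maps the paper uses, and how its redressing works, are not reproduced here; the function names are hypothetical. The derivative of the map equals the allpass group delay, which varies with frequency, and that frequency-dependent delay is the dispersion that redressing aims to undo.

```python
import math


def allpass_warp(theta, lam):
    """Warped frequency for a first-order allpass, theta in [0, pi], |lam| < 1."""
    return theta + 2.0 * math.atan2(lam * math.sin(theta),
                                    1.0 - lam * math.cos(theta))


def warp_derivative(theta, lam):
    """d(warped)/d(theta) = (1 - lam^2) / (1 - 2 lam cos(theta) + lam^2).

    This frequency-dependent slope is the group delay of the allpass, i.e.
    the time dispersion introduced by the warping.
    """
    return (1.0 - lam * lam) / (1.0 - 2.0 * lam * math.cos(theta) + lam * lam)
```

Warping with −λ inverts warping with λ, and with λ > 0 low frequencies are stretched (slope (1 + λ)/(1 − λ) at θ = 0) while high frequencies are compressed.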
REDS: A New Asymmetric Atom for Sparse Audio Decomposition and Sound Synthesis
In this paper, we introduce a function designed specifically for sparse audio representations. The selection of dictionary elements (atoms) for sparsely representing audio has progressed from symmetric atoms, to damped sinusoid and hybrid atoms, and finally to the re-appropriation of the gammatone (GT) and the formant wave function (FOF) as atoms. These asymmetric atoms have already shown promise in sparse decomposition applications, where they prove to be highly correlated with natural sounds and musical audio, but since neither was originally designed for this application, their utility remains limited. An in-depth comparison of the existing functions was conducted based on application-specific criteria. A directed design process was then used to create a new atom, the ramped exponentially damped sinusoid (REDS), that satisfies all desired properties: the REDS can adapt to a wide range of audio signal features and has good mathematical properties that enable efficient sparse decompositions and synthesis. Moreover, the REDS is proven to be approximately equal to the previous functions under some common conditions.
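The REDS formula itself is not given in this abstract, so the sketch below instead shows one of the asymmetric predecessors it is compared against, a gammatone atom t^(n−1)·exp(−2πbt)·cos(2πft + φ), normalised to unit energy as dictionary atoms usually are. The order 4 default and the function name are assumptions made for illustration.

```python
import math


def gammatone_atom(fs, length, freq, bandwidth, order=4, phase=0.0):
    """Unit-norm gammatone atom: t^(order-1) exp(-2 pi b t) cos(2 pi f t + phase).

    order=4 is an assumed, commonly used default. The polynomial factor gives
    a fast attack, and the exponential factor a slower decay.
    """
    atom = []
    for n in range(length):
        t = n / fs
        env = t ** (order - 1) * math.exp(-2 * math.pi * bandwidth * t)
        atom.append(env * math.cos(2 * math.pi * freq * t + phase))
    norm = math.sqrt(sum(v * v for v in atom))
    return [v / norm for v in atom]
```

The envelope peaks at t = (order − 1)/(2πb), so most of the atom's energy sits near its onset. This asymmetry is what makes such atoms correlate well with percussive and plucked onsets.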
Harmonic-percussive Sound Separation Using Rhythmic Information from Non-negative Matrix Factorization in Single-channel Music Recordings
This paper proposes a novel method for separating harmonic and percussive sounds in single-channel music recordings. Standard non-negative matrix factorization (NMF) is used to obtain the activations of the most representative patterns active in the mixture. The basic idea is to automatically classify these activations according to whether they exhibit rhythmic or non-rhythmic patterns. We assume that percussive sounds are modeled by activations that exhibit a rhythmic pattern, whereas harmonic and vocal sounds are modeled by activations that exhibit a less rhythmic pattern. The classification of the NMF activations as harmonic or percussive is performed using a recursive process based on successive correlations applied to the activations. Specifically, promising results are obtained when a sound is classified as percussive through the identification of a set of peaks in the output of the fourth correlation. This works because harmonic sounds tend to be represented by a single valley in a half-cycle waveform at the output of the fourth correlation. Evaluation shows that the proposed method provides competitive results compared with reference state-of-the-art methods. Audio examples are available to illustrate the separation performance of the proposed method.
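The successive-correlation test can be sketched by autocorrelating an activation four times and counting the peaks that remain. The exact correlation variant, normalisation and decision rule of the paper are not reproduced; this sketch assumes a mean-removed, circular autocorrelation normalised to 1 at lag 0, a simple local-maximum count with a small tolerance, and hypothetical function names.

```python
def circular_autocorr(x):
    """Mean-removed circular autocorrelation, normalised so lag 0 equals 1."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    r = [sum(xc[i] * xc[(i + lag) % n] for i in range(n)) for lag in range(n)]
    return [v / r[0] for v in r] if r[0] > 0 else r


def successive_correlation(x, times=4):
    """Apply the autocorrelation repeatedly (four times by default)."""
    for _ in range(times):
        x = circular_autocorr(x)
    return x


def count_peaks(x, tol=1e-9):
    """Count interior local maxima exceeding both neighbours by more than tol."""
    return sum(1 for i in range(1, len(x) - 1)
               if x[i] > x[i - 1] + tol and x[i] > x[i + 1] + tol)
```

A 64-frame activation with a pulse every 8 frames keeps its periodicity through all four correlations and yields 7 interior peaks. A single smooth half-sine hump, a stand-in for a sustained harmonic note, collapses to one cosine-like cycle with a single valley and no interior peaks. Because each correlation squares the magnitude spectrum, repeated correlation isolates the dominant periodicity.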