Download A Diffusion-Based Generative Equalizer for Music Restoration
This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a task that, to the best of our knowledge, has not been previously addressed for music restoration. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music. Historical music restoration examples are available at: research.spa.aalto.fi/publications/papers/dafx-babe2/.
Download Realistic Gramophone Noise Synthesis Using a Diffusion Model
This paper introduces a novel data-driven strategy for synthesizing gramophone noise audio textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples of length equal to one disk revolution, but a method to generate plausible periodic variations between revolutions is also proposed. A guided approach is also applied as a conditioning method, where an audio signal generated with manually-tuned signal processing is refined via reverse diffusion to improve realism. The method has been evaluated in a subjective listening test, in which the participants were often unable to recognize the synthesized signals from the real ones. The synthetic noises produced with the best proposed unconditional method are statistically indistinguishable from real noise recordings. This work shows the potential of diffusion models for highly realistic audio synthesis tasks.
Download Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing
In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
Download Virtual Bass System With Fuzzy Separation of Tones and Transients
A virtual bass system creates an impression of bass perception in sound systems with weak low-frequency reproduction, which is typical of small loudspeakers. Virtual bass systems extend the bandwidth of the low-frequency audio content using either a nonlinear function or a phase vocoder, and add the processed signal to the reproduced sound. Hybrid systems separate transients and steady-state sounds, which are processed separately. It is still challenging to reach a good sound quality using a virtual bass system. This paper proposes a novel method, which separates the tonal, transient, and noisy parts of the audio signal in a fuzzy way, and then processes only the transients and tones. Those upper harmonics, which can be detected above the cutoff frequency, are boosted using timbre-matched weights, but missing upper harmonics are generated to assist the missing fundamental phenomenon. Listening test results show that the proposed algorithm outperforms selected previous methods in terms of perceived bass sound quality. The proposed method can enhance the bass sound perception of small loudspeakers, such as those used in laptop computers and mobile devices.
Download A pickup model for the Clavinet
In this paper recent findings on magnetic transducers are applied to the analysis and modeling of Clavinet pickups. The Clavinet is a stringed instrument having similarities to the electric guitar, it has magnetic single coil pickups used to transduce the string vibration to an electrical quantity. Data gathered during physical inspection and electrical measurements are used to build a complete model which accounts for nonlinearities in the magnetic flux. The model is inserted in a Digital Waveguide (DWG) model for the Clavinet string for its evaluation.
Download Examining the Oscillator Waveform Animation Effect
An enhancing effect that can be applied to analogue oscillators in subtractive synthesizers is termed Animation, which is an efficient way to create a sound of many closely detuned oscillators playing in unison. This is often referred to as a supersaw oscillator. This paper first explains the operating principle of this effect using a combination of additive and frequency modulation synthesis. The Fourier series will be derived and results will be presented to demonstrate its accuracy. This will then provide new insights into how other more general waveform animation processors can be designed.
Download Sitrano: A Matlab App for Sines-Transients-Noise Decomposition of Audio Signals
Decomposition of sounds into their sinusoidal, transient, and noise components is an active research topic and a widely-used tool in audio processing. Multiple solutions have been proposed in recent years, using time–frequency representations to identify either horizontal and vertical structures or orientations and anisotropy in the spectrogram of the sound. In this paper, we present SiTraNo: an easy-to-use MATLAB application with a graphic user interface for audio decomposition that enables visualization and access to the sinusoidal, transient, and noise classes, individually. This application allows the user to choose between different well-known separation methods to analyze an input sound file, to instantaneously control and remix its spectral components, and to visually check the quality of the separation, before producing the desired output file. The visualization of common artifacts, such as birdies and dropouts, is demonstrated. This application promotes experimenting with the sound decomposition process by observing the effect of variations for each spectral component on the original sound and by comparing different methods against each other, evaluating the separation quality both audibly and visually. SiTraNo and its source code are available on a companion website and repository.
Download One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hihat, a snare, and a tomtom. The proposed method uses a short pseudo-random velvet-noise filter and a low-shelf filter to produce timbral variations targeted at appropriate spectral regions, yielding potentially an endless number of new realistic versions of a single percussive sampled sound. The realism of the resulting processed sounds is studied in a listening test. The results show that the sound quality obtained with the proposed algorithm is at least as good as that of a previous method while using 77% fewer computational operations. The algorithm is widely applicable to computer-generated music and game audio.
Download Granular analysis/synthesis of percussive drilling sounds
This paper deals with the automatic and robust analysis, and the realistic and low-cost synthesis of percussive drilling like sounds. The two contributions are: a non-supervised removal of quasistationary background noise based on the Non-negative Matrix Factorization, and a granular method for analysis/synthesis of this drilling sounds. These two points are appropriate to the acoustical properties of percussive drilling sounds, and can be extended to other sounds with similar characteristics. The context of this work is the training of operators of working machines using simulators. Additionally, an implementation is explained.
Download Differentiable Feedback Delay Network for Colorless Reverberation
Artificial reverberation algorithms often suffer from spectral coloration, usually in the form of metallic ringing, which impairs the perceived quality of sound. This paper proposes a method to reduce the coloration in the feedback delay network (FDN), a popular artificial reverberation algorithm. An optimization framework is employed entailing a differentiable FDN to learn a set of parameters decreasing coloration. The optimization objective is to minimize the spectral loss to obtain a flat magnitude response, with an additional temporal loss term to control the sparseness of the impulse response. The objective evaluation of the method shows a favorable narrower distribution of modal excitation while retaining the impulse response density. The subjective evaluation demonstrates that the proposed method lowers perceptual coloration of late reverberation, and also shows that the suggested optimization improves sound quality for small FDN sizes. The method proposed in this work constitutes an improvement in the design of accurate and high-quality artificial reverberation, simultaneously offering computational savings.