Emulating Rough and Growl Voice in Spectral Domain This paper presents a new approach to transforming a modal voice into a rough or growl voice. The goal of such transformations is to enhance voice expressiveness in singing voice productions. Both techniques work with spectral models and are based on adding sub-harmonics to the original input voice spectrum in the frequency domain.
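A minimal sketch of the general idea, adding sub-harmonics between the harmonics of a voiced frame in the frequency domain; the function name, the assumption of a known f0, the fixed sub-harmonic gain, and the harmonic count are illustrative choices, not the authors' algorithm.

```python
# Minimal sketch: growl-like roughness by inserting sub-harmonics halfway
# between the harmonics of a voiced frame (illustrative, not the paper's
# exact method). Assumes the fundamental frequency f0 is known.
import numpy as np

def add_subharmonics(frame, fs, f0, gain=0.3, n_harmonics=20):
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)
    bin_hz = fs / len(frame)
    out = spectrum.copy()
    for k in range(1, n_harmonics + 1):
        harm_bin = int(round(k * f0 / bin_hz))
        sub_bin = int(round((k - 0.5) * f0 / bin_hz))
        if harm_bin < len(spectrum) and sub_bin < len(spectrum):
            # copy a scaled version of the harmonic peak to the sub-harmonic bin
            out[sub_bin] += gain * spectrum[harm_bin]
    return np.fft.irfft(out, n=len(frame))
```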
CA: A System for Granular Processing of Sound Using Cellular Automata CA is a tool for the granular processing of sound using cellular automata, developed on the SGI Indy platform. It investigates the effects of change in the timbre of sound using a cellular automaton in real time. The cellular automaton generated by the chosen rule controls the parameters of a bank of filters. The system uses standard infinite impulse response filters and a general model of three-neighborhood cellular automata. The composer can configure the filter banks by adjusting bandwidths and center frequencies through the graphical interface. CA is well suited as a tool for computer music composition because it creates a new palette of sounds for the composer and is easy to use.
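A minimal sketch of the underlying pattern, assuming an elementary three-neighborhood binary automaton whose live cells gate the gains of a fixed band-pass filter bank; the rule number, band layout, and chunk-wise evolution are assumptions for illustration, not the original SGI Indy implementation.

```python
# Minimal sketch: a 3-neighborhood cellular automaton gating a band-pass
# filter bank (illustrative, not the original system). Requires numpy/scipy.
import numpy as np
from scipy.signal import butter, lfilter

def ca_step(cells, rule=110):
    # next state of each cell looked up from its 3-cell neighborhood
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    index = 4 * left + 2 * cells + right
    table = np.array([(rule >> i) & 1 for i in range(8)])
    return table[index]

def ca_filter_bank(signal, fs, n_bands=16, steps=8):
    centers = np.geomspace(100.0, fs / 3, n_bands)   # band center frequencies
    cells = np.random.randint(0, 2, n_bands)         # random initial CA row
    chunk = len(signal) // steps
    out = np.zeros_like(signal, dtype=float)
    for s in range(steps):
        seg = signal[s * chunk:(s + 1) * chunk]
        for b, fc in enumerate(centers):
            if cells[b]:                              # live cell -> band is audible
                ba = butter(2, [fc * 0.9, fc * 1.1], btype="band", fs=fs)
                out[s * chunk:(s + 1) * chunk] += lfilter(*ba, seg)
        cells = ca_step(cells)                        # evolve the automaton
    return out
```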
Multiresolution Sinusoidal/Stochastic Model For Voiced-Sounds The goal of this paper is to introduce a complete analysis/resynthesis method for the stationary part of voiced sounds. The method is based on a new class of wavelets, the Harmonic-Band Wavelets (HBWT). Wavelets have been widely employed in signal processing [1, 2]. In the context of sound processing they provided very interesting results in their first harmonic version, the Pitch-Synchronous Wavelet Transform (PSWT) [3]. We introduced the Harmonic-Band Wavelets in a previous edition of the DAFx conference [4]. The HBWT, with respect to the PSWT, allows one to manipulate the analysis coefficients of each harmonic independently. Furthermore, one is able to group the analysis coefficients according to a finer subdivision of the spectrum of each harmonic, due to the multiresolution analysis of the wavelets. This allows one to separate the deterministic components of voiced sounds, corresponding to the harmonic peaks, from the noisy/stochastic components. A first result was the development of a parametric representation of the HBWT analysis coefficients corresponding to the stochastic components [5, 7]. In this paper we present results concerning a parametric representation of the HBWT analysis coefficients of the deterministic components. The method recalls sinusoidal models, where one models time-varying amplitudes and time-varying phases [8, 9]. This method provides an interesting new technique for sound synthesis and sound processing, integrating a parametric representation of both the deterministic and the stochastic components of sounds. At the same time it can be seen as a tool for parametric sound representation and data compression.
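For readers unfamiliar with the sinusoidal-model side that the method recalls, here is a minimal sketch of resynthesizing one partial from frame-wise amplitude and frequency tracks by interpolating the amplitude and accumulating the phase; this illustrates only the generic sinusoidal-model idea, not the HBWT itself, and all names and the hop-size convention are assumptions.

```python
# Minimal sketch of the sinusoidal-model idea: resynthesize one partial from
# frame-wise amplitude/frequency tracks (time-varying amplitude and phase).
# This is NOT the Harmonic-Band Wavelet Transform itself.
import numpy as np

def resynth_partial(amps, freqs, hop, fs, phase0=0.0):
    """amps, freqs: one value per analysis frame; hop: samples per frame."""
    n = (len(amps) - 1) * hop
    frame_pos = np.arange(len(amps)) * hop
    t = np.arange(n)
    amp = np.interp(t, frame_pos, amps)                    # sample-wise amplitude
    freq = np.interp(t, frame_pos, freqs)                  # sample-wise frequency (Hz)
    phase = phase0 + 2 * np.pi * np.cumsum(freq) / fs      # accumulated phase
    return amp * np.cos(phase)
```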
Simplifying Antiderivative Antialiasing with Lookup Table Integration Antiderivative Antialiasing (ADAA) has become a pivotal method for reducing aliasing when dealing with nonlinear functions at audio rate. However, its implementation requires analytical computation of the antiderivative of the nonlinear function, which in practical cases can be challenging without a symbolic solver. Moreover, when the nonlinear function is given by measurements, it must first be approximated to obtain a symbolic description. In this paper, we propose a simple approach to ADAA for practical applications that employs numerical integration of lookup tables (LUTs) to approximate the antiderivative. This method eliminates the need for closed-form solutions, streamlining the ADAA implementation process in industrial applications. We analyze the trade-offs of this approach, highlighting its computational efficiency and ease of implementation while discussing the potential impact of numerical integration errors on aliasing performance. Experiments are conducted with static nonlinearities (tanh, a simple wavefolder, and the Buchla 259 wavefolding circuit) and a stateful nonlinear system (the diode clipper).
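A minimal sketch of first-order ADAA where the antiderivative table is obtained by numerically integrating a lookup table of the nonlinearity; tanh stands in for an arbitrary (possibly measured) curve, and the table range, table size, and fallback threshold are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of first-order ADAA with a numerically integrated LUT.
# tanh stands in for an arbitrary (possibly measured) nonlinearity.
import numpy as np

X = np.linspace(-4.0, 4.0, 4096)           # input grid of the LUT
F_TABLE = np.tanh(X)                       # sampled nonlinearity f(x)
DX = X[1] - X[0]
# cumulative trapezoidal integration gives the antiderivative table F1(x)
F1_TABLE = np.concatenate(([0.0], np.cumsum(0.5 * (F_TABLE[1:] + F_TABLE[:-1]) * DX)))

def f(x):  return np.interp(x, X, F_TABLE)
def F1(x): return np.interp(x, X, F1_TABLE)

def adaa_process(x, eps=1e-6):
    y = np.empty_like(x, dtype=float)
    x_prev = 0.0
    for n, xn in enumerate(x):
        if abs(xn - x_prev) < eps:
            # ill-conditioned difference: fall back to the midpoint value
            y[n] = f(0.5 * (xn + x_prev))
        else:
            # first-order ADAA: divided difference of the antiderivative
            y[n] = (F1(xn) - F1(x_prev)) / (xn - x_prev)
        x_prev = xn
    return y
```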
Generalizations of Velvet Noise and their Use in 1-Bit Music A family of spectrally flat noise sequences called "Velvet Noise" has found use in reverb modeling, decorrelation, speech synthesis, and abstract sound synthesis. These noise sequences are ternary: they consist of only the values −1, 0, and +1. They are also sparse in time, with pulse density being their main design parameter, and at typical audio sampling rates need only several thousand non-zero samples per second to sound "smooth." This paper proposes "Crushed Velvet Noise" (CVN) generalizations to the classic family of Velvet Noise sequences, including "Original Velvet Noise" (OVN), "Additive Random Noise" (ARN), and "Totally Random Noise" (TRN). In these generalizations, the probability of getting a positive or negative impulse is a free parameter. Manipulating this probability gives Crushed OVN and ARN low-shelf spectra rather than the flat spectra of standard Velvet Noise, while the spectrum of Crushed TRN remains flat. This new family of noise sequences is still ternary and sparse in time. However, pulse density now controls the shelf cutoff frequency, and the distribution of polarities controls the shelf depth. Crushed Velvet Noise sequences with pulses of only a single polarity are particularly useful in a niche style of music called "1-bit music": music with a binary waveform consisting of only 0s and 1s. We propose Crushed Velvet Noise as a valuable tool in 1-bit music composition, where its sparsity allows for good approximations to operations, such as addition, that are impossible for general signals in the 1-bit domain.
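A minimal sketch of a Crushed-OVN-style generator with the polarity probability exposed as a parameter; the grid-based pulse placement follows the usual OVN construction, while the density, function name, and default values are illustrative assumptions.

```python
# Minimal sketch of a Crushed-OVN-style generator: one pulse per grid cell at
# a random offset, with P(+1) = p and P(-1) = 1 - p. p = 0.5 gives ordinary
# zero-mean, spectrally flat OVN; p != 0.5 yields a low-shelf spectrum, and
# p = 1 gives the single-polarity sequences useful for 1-bit music.
import numpy as np

def crushed_ovn(duration_s, fs, density=2000, p=0.5, seed=None):
    rng = np.random.default_rng(seed)
    n = int(duration_s * fs)
    grid = fs / density                      # average pulse spacing in samples
    out = np.zeros(n)
    for k in range(int(n / grid)):
        offset = rng.integers(0, int(np.ceil(grid)))   # position inside the cell
        idx = int(k * grid) + offset
        if idx < n:
            out[idx] = 1.0 if rng.random() < p else -1.0
    return out
```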
Informed Source Separation for Stereo Unmixing: An Open-Source Implementation Active listening consists in interacting with the music as it plays and has numerous potential applications, from pedagogy to gaming and creation. In the context of the music industry, using existing musical recordings (e.g. studio stems), it could be possible for the listener to generate new versions of a given musical piece (i.e. an artistic mix). But imagine one could do this from the original mix itself. In a previous research project, we proposed a coder/decoder scheme for what we called informed source separation: the coder determines the information necessary to recover the tracks and embeds it inaudibly (using watermarking) in the mix; the decoder then enhances the source separation with this information. We proposed and patented several methods, using various types of embedded information and separation techniques, hoping that the music industry was ready to give the listener this freedom of active listening. Fortunately, there are numerous other possible applications, such as the manipulation of musical archives, for example in the context of ethnomusicology. But the patents remain in force for many years, which is problematic. In this article, we present an open-source implementation of a patent-free algorithm to address the audio mixing and unmixing problem for any type of music.
Exposure Bias and State Matching in Recurrent Neural Network Virtual Analog Models Virtual analog (VA) modeling using neural networks (NNs) has great potential for rapidly producing high-fidelity models. Recurrent neural networks (RNNs) are especially appealing for VA due to their connection with discrete nodal analysis. Furthermore, VA models based on NNs can be trained efficiently by directly exposing them to the circuit states in a gray-box fashion. However, exposure to ground-truth information during training can leave the models susceptible to error accumulation in a free-running mode, also known as "exposure bias" in the machine learning literature. This paper presents a unified framework for treating the previously proposed state trajectory network (STN) and gated recurrent unit (GRU) networks as special cases of discrete nodal analysis. We propose a novel circuit state-matching mechanism for the GRU and experimentally compare the aforementioned networks for their performance in state matching during training and in exposure bias during inference. Experimental results from modeling a diode clipper show that all the tested models exhibit some exposure bias, which can be mitigated by truncated backpropagation through time. Furthermore, the proposed state-matching mechanism improves GRU modeling performance on an overdrive pedal and a phaser pedal, especially in the presence of the external modulation apparent in a phaser circuit.
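A minimal PyTorch sketch of the mitigation the abstract mentions, truncated backpropagation through time for a GRU virtual-analog model: the sequence is split into chunks and the hidden state is carried forward but detached between chunks; the model size, chunk length, and loss are assumptions, and this is not the paper's state-matching mechanism.

```python
# Minimal sketch: training a GRU virtual-analog model with truncated
# backpropagation through time (TBPTT). The hidden state is carried across
# chunks but detached, limiting the gradient horizon. Sizes are illustrative.
import torch
import torch.nn as nn

class GRUModel(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x, h=None):
        y, h = self.gru(x, h)
        return self.out(y), h

def train_tbptt(model, x, y, chunk=2048, lr=1e-3, epochs=10):
    # x, y: (batch, time, 1) tensors of input and target audio
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        h = None
        for start in range(0, x.shape[1], chunk):
            xb, yb = x[:, start:start + chunk], y[:, start:start + chunk]
            pred, h = model(xb, h)
            loss = loss_fn(pred, yb)
            opt.zero_grad()
            loss.backward()
            opt.step()
            h = h.detach()            # truncate the gradient between chunks
    return model
```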
Score-Level Timbre Transformations of Violin Sounds The ability of a sound synthesizer to provide realistic sounds depends to a great extent on the availability of expressive controls. One of the most important expressive features a user of the synthesizer would desire to control is timbre. Timbre is a complex concept related to many musical indications in a score, such as dynamics, accents, hand position, string played, or even indications referring to timbre itself. Musical indications are in turn related to low-level performance controls such as bow velocity or bow force. With the help of a data acquisition system able to record sound synchronized to performance controls and aligned to the performed score, and by means of statistical analysis, we are able to model the interrelations among sound (timbre), controls, and musical score indications. In this paper we present a procedure for score-controlled timbre transformations of violin sounds within a sample-based synthesizer. Given a sound sample and its trajectory of performance controls: 1) the controls trajectory is transformed according to the score indications, 2) a new timbre corresponding to the transformed trajectory is predicted by means of a timbre model that relates timbre with performance controls, and 3) the timbre of the original sound is transformed by applying a time-varying filter, calculated frame by frame as the difference between the original and predicted envelopes.
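A minimal sketch of step 3 only: filtering each STFT frame by the ratio of the predicted to the original spectral envelope (a per-frame difference in dB becomes a ratio in the linear domain), then resynthesizing by overlap-add; the envelopes are assumed to be given per frame on the rFFT bin grid, and all names and windowing choices are illustrative.

```python
# Minimal sketch of step 3: per-frame timbre transfer by filtering each STFT
# frame with the ratio of predicted to original spectral envelope, followed
# by overlap-add resynthesis. Envelopes are assumed given per frame.
import numpy as np

def apply_envelope_difference(frames, orig_env, pred_env, eps=1e-9):
    """frames: complex STFT frames, shape (n_frames, n_bins);
    orig_env, pred_env: magnitude envelopes of the same shape."""
    gain = pred_env / np.maximum(orig_env, eps)   # frame-by-frame filter
    return frames * gain

def overlap_add(frames, hop, win_len):
    out = np.zeros((len(frames) - 1) * hop + win_len)
    window = np.hanning(win_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + win_len] += np.fft.irfft(frame, n=win_len) * window
    return out
```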
Modal Analysis of Room Impulse Responses Using Subband ESPRIT This paper describes a modification of the ESPRIT algorithm that can be used to determine the parameters (frequency, decay time, initial magnitude, and initial phase) of a modal reverberator that best match a provided room impulse response. By applying perceptual criteria we are able to match room impulse responses using a variable number of modes, with an emphasis on high quality at lower mode counts; this allows the synthesis algorithm to scale to different computational environments. A hybrid FIR/modal reverb architecture is also presented, which allows for the efficient modeling of room impulse responses that contain sparse early reflections and dense late reverb. MUSHRA tests comparing analysis/synthesis using various mode counts, for our algorithms and for another state-of-the-art algorithm, are also included.
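A minimal sketch of the synthesis side only: an impulse response built as a sum of exponentially decaying sinusoids from per-mode frequency, T60 decay time, initial magnitude, and initial phase, the parameter set the subband ESPRIT analysis estimates; the direct time-domain form and all names are illustrative, not the paper's reverberator implementation.

```python
# Minimal sketch of modal-reverberator synthesis: a sum of exponentially
# decaying sinusoids from per-mode frequency, T60 decay time, initial
# magnitude and initial phase. Vectorized over modes.
import numpy as np

def modal_ir(freqs, t60s, amps, phases, fs, duration_s):
    t = np.arange(int(duration_s * fs)) / fs           # time axis in seconds
    decay = np.log(1000.0) / np.asarray(t60s)          # rates giving -60 dB at T60
    modes = (np.asarray(amps)[:, None]
             * np.exp(-decay[:, None] * t[None, :])
             * np.cos(2 * np.pi * np.asarray(freqs)[:, None] * t[None, :]
                      + np.asarray(phases)[:, None]))
    return modes.sum(axis=0)                           # sum of all modes
```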
MorphDrive: Latent Conditioning for Cross-Circuit Effect Modeling and a Parametric Audio Dataset of Analog Overdrive Pedals In this paper, we present an approach to the neural modeling of overdrive guitar pedals with conditioning from a cross-circuit and cross-setting latent space. The resulting network models the behavior of multiple overdrive pedals across different settings, offering continuous morphing between real configurations and hybrid behaviors. Compact conditioning spaces are obtained through unsupervised training of a variational autoencoder with adversarial training, resulting in accurate reconstruction performance across different sets of pedals. We then compare three Hyper-Recurrent architectures for processing, including dynamic and static HyperRNNs, and a smaller model for real-time processing. Additionally, we present pOD-set, a new open dataset comprising recordings of 27 analog overdrive pedals, each with 36 gain and tone parameter combinations, totaling over 97 hours of recordings. Precise parameter setting was achieved through a custom-built recording robot.
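A minimal sketch of the general conditioning pattern, a recurrent effect model driven by a compact latent code: here the code is simply concatenated to every input sample, which is a common stand-in for latent conditioning and explicitly not the paper's HyperRNN weight-generation scheme; all class names and sizes are assumptions.

```python
# Minimal sketch of latent-conditioned recurrent effect modeling: a latent
# code describing pedal/setting is concatenated to every input sample.
# This is a common conditioning pattern, NOT the paper's HyperRNN approach.
import torch
import torch.nn as nn

class ConditionedGRU(nn.Module):
    def __init__(self, latent_dim=8, hidden=48):
        super().__init__()
        self.gru = nn.GRU(input_size=1 + latent_dim, hidden_size=hidden,
                          batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x, z, h=None):
        # x: (batch, time, 1) audio; z: (batch, latent_dim) code, e.g. from a VAE
        z_seq = z.unsqueeze(1).expand(-1, x.shape[1], -1)
        y, h = self.gru(torch.cat([x, z_seq], dim=-1), h)
        return self.out(y), h

# Morphing between two pedals/settings amounts to interpolating their codes:
# z = 0.5 * z_pedal_a + 0.5 * z_pedal_b
```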