Time-domain model of the singing voice
A combined physical model for the human vocal folds and vocal tract is presented. The vocal fold model is based on a symmetrical 16-mass model by Titze. Each vocal fold is modeled with 8 masses that represent the mucosal membrane, coupled by non-linear springs to another 8 masses that represent the vocalis muscle together with the ligament. Iteratively, the value of the glottal flow is calculated and taken as input for the calculation of the aerodynamic forces. Together with the spring forces and damping forces, they yield the new positions of the masses, which are then used to calculate a new glottal flow value. The vocal tract model consists of a number of uniform cylinders of fixed length. At each discontinuity, incident, reflected, and transmitted waves are calculated, including damping. Assuming a linear system, the pressure signal generated by the vocal fold model is either convolved with the Green’s function calculated by the vocal tract model or calculated interactively assuming variable reflection coefficients for the glottis and the vocal tract during phonation. The algorithms aim at real-time performance and are implemented in MATLAB.
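As a rough illustration of the wave scattering at the junctions between uniform cylinders, the sketch below computes pressure-wave reflection and transmission coefficients from the cross-sectional areas of adjacent sections. It is written in Python rather than the MATLAB used by the authors, it uses the textbook lossless junction formula, and it omits the damping mentioned in the abstract. The function name `junction_coefficients` and the example areas are assumptions made for illustration only.

```python
def junction_coefficients(areas):
    """Return (reflection, transmission) pairs for pressure waves at each
    junction between consecutive cylinders with the given cross-sections.

    Assumes lossless plane-wave propagation, where the acoustic impedance of
    a section is inversely proportional to its area.
    """
    coefficients = []
    for area_in, area_out in zip(areas, areas[1:]):
        # A wave travelling from area_in into area_out is partly reflected.
        reflection = (area_in - area_out) / (area_in + area_out)
        # Pressure continuity at the junction gives transmission = 1 + reflection.
        transmission = 1.0 + reflection
        coefficients.append((reflection, transmission))
    return coefficients


# Hypothetical three-section tract (areas in arbitrary units).
example = junction_coefficients([1.0, 3.0, 3.0])
```

A widening section yields a negative reflection coefficient, and two equal sections yield no reflection at all, which is a quick sanity check on the sign convention.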
Hard real-time onset detection of percussive instruments
To date, the most successful onset detectors are those based on a frequency representation of the signal. However, for such methods the time between the physical onset and the reported one is unpredictable and may vary widely according to the type of sound being analyzed. Such variability and unpredictability of spectrum-based onset detectors may not be convenient in some real-time applications. This paper proposes a real-time method to improve the temporal accuracy of state-of-the-art onset detectors. The method is grounded on the theory of hard real-time operating systems, where the result of a task must be reported at a certain deadline. It consists of the combination of a time-based technique (which has a high degree of accuracy in detecting the physical onset time but is more prone to false positives and false negatives) with a spectrum-based technique (which has a high detection accuracy but a low temporal accuracy). The developed hard real-time onset detector was tested on a dataset of single non-pitched percussive sounds using the high frequency content detector as the spectral technique. Experimental validation showed that the proposed approach was effective in better retrieving the physical onset time of about 50% of the hits detected by the spectral technique, with an average improvement of about 3 ms and a maximum of about 12 ms. The results also revealed that the use of a longer deadline may better capture the variability of the spectral technique, but at the cost of a higher latency.
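The idea of combining the two detectors can be sketched offline as follows. This is not the authors' algorithm: it only assumes that a spectral onset is trusted as a detection, and that a time-based candidate found at most `deadline` seconds before it is taken as the better estimate of the physical onset time. The function name `fuse_onsets` and all timestamps are hypothetical.

```python
def fuse_onsets(time_candidates, spectral_onsets, deadline):
    """Refine spectral onset times using time-based candidates.

    For each spectral onset, pick the earliest time-based candidate that lies
    within `deadline` seconds before it. If there is none, keep the spectral
    time. Time-based candidates that no spectral onset confirms are dropped,
    which discards their false positives.
    """
    fused = []
    for spectral_time in spectral_onsets:
        window = [t for t in time_candidates
                  if spectral_time - deadline <= t <= spectral_time]
        fused.append(min(window) if window else spectral_time)
    return fused


# Hypothetical timestamps in seconds; 0.250 is an unconfirmed false positive.
refined = fuse_onsets([0.100, 0.250, 0.515], [0.104, 0.520, 0.900], deadline=0.012)
```

A longer `deadline` widens the window in which a time-based candidate can be matched, at the price of waiting longer before an onset can be reported, which mirrors the latency trade-off described in the abstract.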
Towards Inverse Virtual Analog Modeling
Several digital signal processing approaches, generally referred to as Virtual Analog (VA) modeling, are currently under development for the software emulation of analog audio circuitry. The main purpose of VA modeling is to faithfully reproduce the behavior of real-world audio gear, e.g., distortion effects, synthesizers or amplifiers, using efficient algorithms. In this paper, however, we provide a preliminary discussion about how VA modeling can be exploited to infer the input signal of an analog audio system, given the output signal and the parameters of the circuit. In particular, we show how an inversion theorem known in circuit theory, and based on nullors, can be used for this purpose. As recent advances in Wave Digital Filter (WDF) theory allow us to implement circuits with nullors in a systematic fashion, WDFs prove to be useful tools for inverse VA modeling. WDF realizations of a nonlinear audio system and its inverse are presented as an example of application.
Non-Iterative Schemes for the Simulation of Nonlinear Audio Circuits
In this work, a number of numerical schemes are presented in the context of virtual-analog simulation. The schemes are linearly implicit in character, and hence directly solvable without iterative methods. Schemes of increasing order of accuracy are constructed, and convergence and stability conditions are proven formally. The schemes are able to handle stiff problems very efficiently, because of their fast update, and can be run at higher sample rates to reduce aliasing. The cases of the diode clipper and ring modulator are investigated in detail, including several numerical examples.
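To make "linearly implicit" concrete, the sketch below applies a generic linearly implicit (Rosenbrock-type) Euler step to a common textbook diode clipper model. It is not one of the schemes constructed in the paper. The circuit equation, the component values, and the names `f`, `df_dv`, and `clip` are all assumptions chosen for illustration. Each sample needs only one division, with no iterative solver.

```python
import math

# Illustrative component values (assumptions, not taken from the paper).
R = 1.0e3        # series resistance, ohms
C = 33.0e-9      # capacitance, farads
I_S = 2.52e-9    # diode saturation current, amperes
V_T = 25.85e-3   # thermal voltage, volts


def f(v, u):
    """Right-hand side of the clipper ODE dv/dt = f(v, u)."""
    return (u - v) / (R * C) - (2.0 * I_S / C) * math.sinh(v / V_T)


def df_dv(v):
    """Derivative of f with respect to the state v (always negative)."""
    return -1.0 / (R * C) - (2.0 * I_S / (C * V_T)) * math.cosh(v / V_T)


def clip(inputs, sample_rate=48000.0):
    """Process input voltages with a linearly implicit Euler update."""
    h = 1.0 / sample_rate
    v = 0.0
    outputs = []
    for u in inputs:
        # The nonlinearity is linearised about the current state, so the
        # implicit update reduces to a single scalar division per sample.
        v = v + h * f(v, u) / (1.0 - h * df_dv(v))
        outputs.append(v)
    return outputs


# Hypothetical input: a 1 V step held for 2000 samples.
step_response = clip([1.0] * 2000)
```

Because `df_dv` is negative everywhere, the denominator is always greater than one, so the update stays well defined even when the diode term makes the problem stiff.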
Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method
Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than using a single and potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced feedback delay network (FDN) is designed to generate perceptually plausible RIRs. A multiple stimuli with hidden reference and anchor (MUSHRA) test and a two-alternative-forced-choice (2AFC) discrimination task have been conducted to compare the proposed method against ground truth recordings and conventional RT-based approaches. The results show that the proposed system delivers robust performance in various scenarios, achieving highly plausible reverberation synthesis.
Zero-Phase Sound via Giant FFT
Given the speedy computation of the FFT in current computer hardware, there are new possibilities for examining transformations for very long sounds. A zero-phase version of any audio signal can be obtained by zeroing the phase angle of its complex spectrum and taking the inverse FFT. This paper recommends additional processing steps, including zero-padding, transient suppression at the signal’s start and end, and gain compensation, to enhance the resulting sound quality. As a result, a sound with the same spectral characteristics as the original one, but with different temporal events, is obtained. Repeating rhythm patterns are retained, however. Zero-phase sounds are palindromic in the sense that they are symmetric in time. A comparison of the zero-phase conversion to the autocorrelation function helps to understand its properties, such as why the rhythm of the original sound is emphasized. It is also argued that the zero-phase signal has the same autocorrelation function as the original sound. One exciting variation of the method is to apply the method separately to the real and imaginary parts of the spectrum to produce a stereo effect. A frame-based technique enables the use of the zero-phase conversion in real-time audio processing. The zero-phase conversion is another member of the giant FFT toolset, allowing the modification of sampled sounds, such as drum loops or entire songs.
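The core conversion described above (zero the phase of the spectrum, then invert the transform) can be sketched in a few lines. To stay dependency-free, this sketch uses a naive O(N²) DFT from the standard library instead of a fast FFT, so it is only suitable for very short test signals. It also leaves out the zero-padding, transient suppression, and gain compensation that the paper recommends. The names `dft` and `zero_phase` are assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    result = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        result.append(acc / n if inverse else acc)
    return result


def zero_phase(x):
    """Return the zero-phase version of a real signal x.

    Keeps the magnitude of every spectral bin and sets its phase to zero.
    """
    magnitudes = [abs(c) for c in dft(x)]
    # The magnitude spectrum of a real signal is symmetric, so the inverse
    # transform is real up to rounding; the imaginary residue is dropped.
    return [c.real for c in dft(magnitudes, inverse=True)]


# Hypothetical short test signal.
signal = [0.0, 1.0, 0.5, -0.25, 0.0, 0.3, 0.0, 0.0]
converted = zero_phase(signal)
```

The result is circularly symmetric in time and has the same magnitude spectrum as the input, which is consistent with the palindromic property and the shared autocorrelation function mentioned in the abstract.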
Sound Transformations Based on the SMS High Level Attributes
The basic Spectral Modeling Synthesis (SMS) technique models sounds as the sum of sinusoids plus a residual. Though this analysis/synthesis system has proved to be successful in transforming sounds, more powerful and intuitive musical transformations can be achieved by moving into the SMS high-level attribute plane. In this paper we describe how to extract high level sound attributes from the basic representation, modify them, and add them back before the synthesis stage. In this process new problems come up for which we propose some initial solutions.
The Sounds of the Avian Syrinx - are they Really Flute-Like?
This research presents a model of the avian vocal tract, implemented using classical waveguide synthesis and numerical methods. The vocal organ of the songbird, the syrinx, has a unique topography of acoustic tubes (a trachea with a bifurcation at its base) making it a rather unique subject for waveguide synthesis. In the upper region of the two bifid bronchi lies a nonlinear vibrating membrane – the primary resonator in sound production. Unlike most reed musical instruments, the more significant displacement of the membrane is perpendicular to the directions of airflow, due to the Bernoulli effect. The model of the membrane displacement, and the resulting pressure through the constriction created by the membrane motion, is therefore derived beginning with the Bernoulli equation.
Symbolic and audio processing to change the expressive intention of a recorded music performance
A framework for real-time expressive modification of audio musical performances is presented. An expressiveness model computes the deviations of the musical parameters that are relevant to the control of the expressive intention. The modifications are then realized by integrating the model with a sound processing engine.
Realtime Multiple-Pitch and Multiple-Instrument Recognition for Music Signals Using Sparse Non-Negative Constraints
In this paper we introduce a simple and fast method for realtime recognition of multiple pitches produced by multiple musical instruments. Our proposed method is based on two important facts: (1) that the timbral information of any instrument is pitch-dependent and (2) that the modulation spectrum of the same pitch seems to result in a persistent representation of the characteristics of the instrumental family. Using these basic facts, we construct a learning algorithm to obtain pitch templates of all possible notes on various instruments and then devise an online algorithm to decompose a realtime audio buffer using the learned templates. The learning and decomposition proposed here are inspired by non-negative matrix factorization methods but differ by the introduction of an explicit sparsity control. Our test results show promising recognition rates for a realtime system on real music recordings. We discuss further improvements that can be made to the proposed system.
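The decomposition step can be illustrated with a generic non-negative solver. The sketch below keeps a set of already learned templates fixed and estimates non-negative activations for one observed frame, using the standard multiplicative update for a Euclidean cost with an added L1 penalty as a simple stand-in for sparsity control. This is not the authors' algorithm, which uses its own explicit sparsity control and modulation-spectrum features. The function name `decompose`, the toy templates, and the parameter values are assumptions.

```python
def decompose(frame, templates, sparsity=0.05, iterations=500, eps=1e-12):
    """Estimate non-negative activations h so that sum_i h[i] * templates[i]
    approximates `frame`, with an L1 penalty encouraging few active templates.

    `frame` and every template must be non-negative and of equal length.
    """
    k = len(templates)
    # Correlation of each template with the observed frame (W^T v).
    correlation = [sum(w * v for w, v in zip(template, frame))
                   for template in templates]
    # Gram matrix of the templates (W^T W).
    gram = [[sum(a * b for a, b in zip(ti, tj)) for tj in templates]
            for ti in templates]
    h = [1.0] * k
    for _ in range(iterations):
        denominators = [sum(gram[i][j] * h[j] for j in range(k)) + sparsity + eps
                        for i in range(k)]
        # Multiplicative update: activations can shrink but never turn negative.
        h = [h[i] * correlation[i] / denominators[i] for i in range(k)]
    return h


# Hypothetical toy templates and a frame built from the first and third ones.
toy_templates = [[1.0, 0.0, 1.0, 0.0],
                 [0.0, 1.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0, 0.0]]
toy_frame = [2.5, 0.5, 2.0, 0.0]
activations = decompose(toy_frame, toy_templates)
```

Because the templates are fixed and only a small activation vector is updated, this kind of decomposition is cheap enough to run once per incoming audio buffer.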