The PluckSynth touch string
In this paper the problem of synthesizing plucked strings by means of physically inspired models is reconsidered in the context of the player's interaction with the virtual instrument. While solutions for the synthesis of guitar tones have been proposed that are excellent from the acoustic point of view, the problem of letting the player control the physical parameters directly has not received sufficient attention. In this paper we revive a simple model of the player's touch previously presented by Cuzzucoli and Lombardo. We show that the model is affected by an inconsistency that can be removed by introducing the finger/pick perturbation in a balanced form on the digital waveguide. The results, together with a more comprehensive model of the guitar, have been implemented in a VST plugin, which is the starting point for further research.
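As background for the waveguide discussion above, a generic digital-waveguide string in its simplest (Karplus-Strong) form can be sketched as follows. This is illustrative only: it uses a plain noise-burst pluck, not the paper's balanced finger/pick excitation model, and all constants are chosen for demonstration.

```python
import numpy as np

def pluck(freq, dur, fs=44100):
    """Minimal Karplus-Strong string: a delay line whose length sets the
    pitch, fed back through a lossy lowpass loop filter."""
    n = int(fs / freq)                       # delay-line length sets the pitch
    rng = np.random.default_rng(0)
    line = rng.uniform(-1, 1, n)             # initial "pluck" energy (noise burst)
    out = np.empty(int(dur * fs))
    for i in range(len(out)):
        out[i] = line[i % n]
        # lowpass loop filter: average two adjacent samples, with slight loss
        line[i % n] = 0.996 * 0.5 * (line[i % n] + line[(i + 1) % n])
    return out
```

The loop filter's averaging damps high partials faster than low ones, which is what gives the plucked-string character its natural decay.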
Delay-free audio coding based on ADPCM and error feedback
Real-time bidirectional audio applications, such as microphones and monitor speakers in live performances, typically require communication systems with minimum latency. When digital transmission with limited bit rate is desired, this imposes tight constraints on the algorithmic delay of the audio coding scheme. We present a delay-free approach employing adaptive differential pulse code modulation (ADPCM) and adaptive spectral shaping of the coding noise. To achieve zero-delay operation, both the prediction and the quantization logic of the ADPCM structure are realized in a backward-adaptive fashion. Noise shaping is accomplished via two feedback loops around the quantizer for efficient exploitation of the auditory selectivity and masking phenomena, respectively. Thanks to automatic optimization of the involved parameters, the performance of the proposed system is on par with that of prior low-delay approaches.
Download Modulation and demodulation of steerable ultrasound beams for audio transmission and rendering Nonlinear effects in ultrasound propagation can be used for generating highly directive audible sound. In order to do so, we can modulate the amplitude of the audio signal and send it to an ultrasound transducer. When played back at a sufficiently high sound pressure level, due to a nonlinear behavior of the medium, the ultrasonic signal gets self-demodulated. The resulting signal has two important characteristics: that of becoming audible; and that of having the same directivity properties of the ultrasonic carrier frequency. In this paper we describe the theoretical advantages of singlesideband (SSB) modulation versus a standard amplitude modulation (AM) scheme for the above-described application. We describe our near-field soundfield measuring experiments, and propose steering solutions for the array using two different types of transducers, piezoelectric or electrostatic, and the proper supporting hardware.
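The difference between AM and SSB can be illustrated with a short sketch (function and parameter names are ours, not the paper's). SSB shifts the analytic signal of the audio up to the carrier frequency, so only one sideband appears, whereas AM would produce mirrored sidebands on both sides of the carrier.

```python
import numpy as np

def ssb_modulate(audio, fc, fs):
    """Upper-sideband SSB: multiply the analytic signal of `audio` by a
    complex carrier at fc and take the real part."""
    n = len(audio)
    # Analytic signal via FFT (same result as a Hilbert-transform filter)
    spec = np.fft.fft(audio)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2      # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1          # Nyquist bin kept once for even n
    analytic = np.fft.ifft(spec * h)
    t = np.arange(n) / fs
    return np.real(analytic * np.exp(2j * np.pi * fc * t))
```

Modulating a 1 kHz tone onto a 40 kHz carrier this way yields energy at 41 kHz only; the 39 kHz lower sideband that AM would produce is absent.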
Asymmetric-spectra methods for adaptive FM synthesis
This article provides an overview of further methods for producing hybrid natural-synthetic spectra with adaptive frequency modulation (AdFM). It focuses on three techniques for the generation of asymmetric spectra, based on single-sideband FM, asymmetric FM, and split-sideband synthesis. The first two techniques are applied to the variable-delay-line implementation of AdFM, whereas the third is based on an extension of the heterodyne method. The article discusses the principles behind each synthesis technique in detail, providing one reference implementation for each, and presents a number of examples demonstrating the possibilities for a variety of digital audio effects applications.
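Of the three techniques, single-sideband FM is the easiest to sketch. The version below is a bare, non-adaptive illustration with invented parameter names, not the article's reference implementation: driving a complex carrier with a complex-valued (analytic) modulator places all partials at fc + k*fm for k >= 0, i.e. on one side of the carrier only, which is exactly the spectral asymmetry discussed above.

```python
import numpy as np

def ssb_fm(fc, fm, index, dur, fs):
    """Single-sideband FM: exp(j*index*exp(j*wm*t)) expands into a series
    sum_k (j*index)^k / k! * exp(j*k*wm*t), so every component of the
    product with the complex carrier lies at fc + k*fm, k >= 0."""
    t = np.arange(int(dur * fs)) / fs
    carrier = np.exp(2j * np.pi * fc * t)
    modulator = np.exp(1j * index * np.exp(2j * np.pi * fm * t))
    return np.real(carrier * modulator)
```

Note that the complex modulator also imposes an exponential amplitude envelope (its magnitude is exp(-index*sin(wm*t))), which is characteristic of this family of asymmetric-spectrum oscillators.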
On the window-disjoint orthogonality of speech sources in reverberant humanoid scenarios
Many speech source separation approaches rely on the assumption that speech sources are orthogonal in the time-frequency domain: the target source is demixed by applying the ideal binary mask to the mixture. So far, this time-frequency orthogonality has been investigated in detail only for anechoic and artificially mixed speech. This paper evaluates how the orthogonality of speech sources degrades in a realistic reverberant humanoid recording setup and indicates strategies to enhance the separation capabilities of ideal-binary-mask algorithms under these conditions. It is shown that the SIR of the target source demixed with the ideal binary mask decreases by approximately 3 dB for reverberation times of T60 = 0.6 s as opposed to the anechoic scenario. For humanoid setups, the spatial distribution of the sources and the choice of ear channel introduce a further SIR difference of 3 dB, which leads to specific strategies for choosing the best channel for demixing.
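The ideal binary mask itself is simple to state. A minimal sketch, assuming the target and interferer signals are known separately (which is what makes the mask "ideal"), with rectangular non-overlapping frames for brevity; real systems use windowed overlap-add:

```python
import numpy as np

def ideal_binary_mask(target, interf, frame=256):
    """Keep each time-frequency cell of the mixture wherever the target
    dominates the interferer; zero it otherwise."""
    mix = target + interf
    out = np.zeros_like(mix)
    for i in range(0, len(mix) - frame + 1, frame):
        T = np.fft.rfft(target[i:i + frame])
        I = np.fft.rfft(interf[i:i + frame])
        M = np.fft.rfft(mix[i:i + frame])
        mask = np.abs(T) > np.abs(I)          # the ideal binary mask
        out[i:i + frame] = np.fft.irfft(M * mask, frame)
    return out
```

When the two sources occupy disjoint time-frequency cells (the orthogonality assumption examined in the paper), this recovers the target almost perfectly; reverberation smears sources across cells and is precisely what erodes that assumption.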
Analysis-and-manipulation approach to pitch and duration of musical instrument sounds without distorting timbral characteristics
This paper presents an analysis-and-manipulation method that can generate musical instrument sounds with arbitrary pitches and durations from a given instrument sound (called the seed) without distorting its timbral characteristics. Based on psychoacoustic knowledge of the auditory effects of timbre, we define timbral features on the spectrogram of an instrument sound as (i) the relative amplitudes of the harmonic peaks, (ii) the distribution of the inharmonic component, and (iii) the temporal envelopes. First, to analyze the timbral features of a seed, it is separated into harmonic and inharmonic components using Itoyama's integrated model. For pitch manipulation, we take into account the pitch dependency of features (i) and (ii), predicting the value of each feature with a cubic polynomial that approximates its distribution over pitch. For duration manipulation, we focus on preserving feature (iii) in the attack and decay portions of the seed; therefore, only the steady-state portion is expanded or shrunk. In addition, we propose a method for reproducing the properties of vibrato. Experimental results demonstrate the quality of the synthesized sounds: the spectral and MFCC distances between the synthesized and actual sounds of 32 instruments were reduced by 64.70% and 32.31%, respectively.
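The cubic-polynomial prediction step can be sketched in a few lines. The pitch grid and feature values below are invented for illustration; the paper fits such a polynomial per timbral feature over the pitches available for each instrument.

```python
import numpy as np

# Fit a cubic polynomial to a feature observed at a few pitches, then
# predict its value at an unseen target pitch (all values are made up).
pitches = np.array([40, 52, 64, 76, 88])         # MIDI note numbers (assumed)
feature = np.array([0.9, 0.7, 0.55, 0.45, 0.4])  # e.g. a relative harmonic amplitude
coeffs = np.polyfit(pitches, feature, deg=3)     # cubic fit over pitch
predicted = np.polyval(coeffs, 70)               # predicted feature at pitch 70
```

This is the mechanism that lets the method move a seed to a pitch at which it was never recorded while keeping the pitch-dependent features plausible.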
An amplitude- and frequency-modulation vocoder for audio signal processing
The decomposition of audio signals into perceptually meaningful modulation components is highly desirable, both for the development of new audio effects and as a building block for future efficient audio compression algorithms. In the past there has always been a distinction between parametric coding methods and waveform coding: waveform coding methods scale easily up to transparency (provided the necessary bit rate is available), whereas parametric coding schemes are subject to the limitations of the underlying source models. On the other hand, parametric methods usually offer a wealth of manipulation possibilities that can be exploited for audio effects, while waveform coding is strictly limited to reproducing the original signal as faithfully as possible. The analysis/synthesis approach presented in this paper attempts to bridge this gap by enabling a seamless transition between the two approaches.
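For a single narrowband component, the AM and FM parts can be obtained from the analytic signal. The sketch below (our naming, not the paper's) shows only this core operation; a full vocoder of the kind described would first split the signal into subbands and apply it per band.

```python
import numpy as np

def am_fm_decompose(x, fs):
    """Split a narrowband signal into amplitude envelope (AM) and
    instantaneous frequency (FM) via the analytic signal."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)                      # analytic-signal weights
    h[0] = 1
    h[1:(n + 1) // 2] = 2
    if n % 2 == 0:
        h[n // 2] = 1
    analytic = np.fft.ifft(spec * h)
    am = np.abs(analytic)                # amplitude envelope
    phase = np.unwrap(np.angle(analytic))
    fm = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency in Hz
    return am, fm
```

Resynthesis from (am, fm) is what opens the door to the modulation-domain effects and the parametric/waveform trade-off discussed above.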
Wide-band harmonic sinusoidal modeling
In this paper we propose a method to estimate and transform harmonic components under wide-band conditions from a single period of the analyzed signal. The method estimates harmonic parameters with higher temporal resolution than typical Short-Time Fourier Transform (STFT) based methods. We also discuss transformation and synthesis strategies in this context, focusing on the human voice.
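The single-period idea can be illustrated directly: the DFT of exactly one period yields the harmonic amplitudes and phases without any multi-period analysis window, which is where the temporal-resolution advantage comes from. This is a bare sketch; the actual method must also locate the period boundaries, which is the hard part in practice.

```python
import numpy as np

def harmonics_from_period(period):
    """Given exactly one period of a harmonic signal, each DFT bin k >= 1
    is the k-th harmonic: amplitude 2*|X[k]|/N, phase angle(X[k])."""
    spec = np.fft.rfft(period) / len(period)
    amps = 2 * np.abs(spec[1:])     # harmonic amplitudes (skip DC)
    phases = np.angle(spec[1:])
    return amps, phases
```

With an N-sample period the estimate uses only N samples of signal, versus the several periods an STFT window would need for comparable frequency separation.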
Time mosaics: an image processing approach to audio visualization
This paper presents a new approach to the visualization of monophonic audio files that simultaneously illustrates general audio properties and the component sounds that make up a given input file. The approach represents sound clip sequences using archetypal images that are subjected to image processing filters driven by audio characteristics such as power, pitch, and signal-to-noise ratio. Where the audio consists of a single sound, it is represented by a single filtered image. Heterogeneous audio files are represented as a seamless image mosaic along a time axis, where each component image in the mosaic maps directly to a discovered component sound. To support this, the system separates the individual sounds in a given audio file and reveals the overlap between sound clips. Compared with existing visualization methods such as oscilloscopes and spectrograms, this approach yields more accessible illustrations of audio files, suitable for casual and non-expert users. We propose that the method could serve as an efficient means of scanning audio database query results and of navigating audio databases through browsing, since the user can visually scan file contents and audio properties simultaneously.
Generalization of the derivative analysis method to non-stationary sinusoidal modeling
In the context of non-stationary sinusoidal modeling, this paper generalizes the derivative method (presented at the first DAFx edition) for the analysis stage. The new method is compared to the reassignment method for the estimation of all the parameters of the model (phase, amplitude, frequency, amplitude modulation, and frequency modulation), and to the Cramér-Rao bounds. It turns out that the new method is less biased, and thus outperforms the reassignment method in most cases for signal-to-noise ratios greater than −10 dB.
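The principle underlying the derivative method can be shown in its simplest stationary form (our minimal sketch; the paper's contribution is the generalized, non-stationary estimator with amplitude and frequency modulation). For x(t) = a*sin(w*t), the ratio of the RMS of the derivative to the RMS of the signal equals w, so the frequency can be read off from the signal and a discrete approximation of its derivative.

```python
import numpy as np

def derivative_freq_estimate(x, fs):
    """Stationary-sinusoid frequency estimate from the signal/derivative
    RMS ratio: rms(x') / rms(x) = w for x(t) = a*sin(w*t)."""
    dx = np.gradient(x) * fs                        # discrete derivative
    w = np.sqrt(np.mean(dx ** 2) / np.mean(x ** 2))
    return w / (2 * np.pi)                          # frequency in Hz
```

The small bias visible here (the finite difference underestimates the true derivative by a sinc factor) is of the same nature as the biases the paper analyzes against the Cramér-Rao bounds.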