Reducing the Aliasing of Nonlinear Waveshaping Using Continuous-Time Convolution
Nonlinear waveshaping is a common technique in musical signal processing, both in a static memoryless context and within feedback systems. Such waveshaping is usually applied directly to a sampled signal, generating harmonics that exceed the Nyquist frequency and cause aliasing distortion. This problem is traditionally tackled by oversampling the system. In this paper, we present a novel method for reducing this aliasing by constructing a continuous-time approximation of the discrete-time signal, applying the nonlinearity to it, and filtering in continuous time using analytically applied convolution. The presented technique markedly reduces aliasing distortion, especially in combination with low-order oversampling. The approach is also extended to allow it to be used within a feedback system.
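As a concrete illustration of the core idea, the following minimal Python sketch applies a first-order version of the approach to a tanh waveshaper: the input is treated as a piecewise-linear continuous-time signal, the nonlinearity is applied, and the result is averaged over each sample interval, which reduces to a difference of antiderivatives. This is only the simplest case, not the kernel or implementation described in the paper.

```python
import numpy as np

def tanh_aa(x, eps=1e-6):
    """First-order illustration: the nonlinearity is applied to a piecewise-
    linear reconstruction of the input and averaged over each sample interval,
    which reduces to a difference of antiderivatives of tanh."""
    x = np.asarray(x, dtype=float)
    F = lambda v: np.log(np.cosh(v))              # antiderivative of tanh
    y = np.empty_like(x)
    y[0] = np.tanh(x[0])
    for n in range(1, len(x)):
        dx = x[n] - x[n - 1]
        if abs(dx) < eps:                         # avoid the 0/0 case
            y[n] = np.tanh(0.5 * (x[n] + x[n - 1]))
        else:
            y[n] = (F(x[n]) - F(x[n - 1])) / dx
    return y
```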
On Vibrato and Frequency (De)Modulation in Musical Sounds
Vibrato is an important characteristic of human musical performance and is often unique to a player and/or a particular instrument. This work is motivated by the assumption (often made in the source separation literature) that vibrato aids in the identification of multiple sound sources playing in unison. It follows that its removal, the focus herein, may contribute to a more blended combination. In signals, vibrato is often modeled as an oscillatory deviation from a center pitch/frequency that presents in the sound as phase/frequency modulation. While vibrato implementation using a time-varying delay line is well known, using a delay line for its removal is less so. In this work we focus on (de)modulation of vibrato in a signal, first showing the relationship between modulation and the corresponding demodulation delay functions and then suggesting a solution for increased vibrato removal in the latter by ensuring sideband attenuation below the threshold of audibility. Two known methods for estimating the instantaneous frequency/phase are used to construct delay functions from both contrived and musical examples so that vibrato removal may be evaluated.
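To first order, the delay-line demodulation described above can be sketched as follows. The function name, the linear-interpolation delay line, and the parameter choices are illustrative assumptions; the instantaneous-frequency track is assumed to come from one of the estimators mentioned in the abstract.

```python
import numpy as np

def remove_vibrato(x, inst_freq, f0, fs):
    """Hypothetical demodulation sketch: a time-varying fractional delay driven
    by the accumulated excess phase cancels the frequency modulation to first
    order. `inst_freq` is an instantaneous-frequency track in Hz, one value per
    sample, obtained from an external estimator."""
    x = np.asarray(x, dtype=float)
    inst_freq = np.maximum(np.asarray(inst_freq, dtype=float), 1e-6)
    excess_phase = 2.0 * np.pi * np.cumsum(inst_freq - f0) / fs   # radians
    delay = excess_phase / (2.0 * np.pi * inst_freq) * fs         # in samples
    y = np.zeros_like(x)
    for n in range(len(x)):
        t = n - delay[n]                       # read position in the delay line
        i = int(np.floor(t))
        frac = t - i
        if 0 <= i and i + 1 < len(x):
            y[n] = (1.0 - frac) * x[i] + frac * x[i + 1]   # linear interpolation
    return y
```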
Analysis/Synthesis Using Time-Varying Windows and Chirped Atoms
A common assumption that is often made regarding audio signals is that they are short-term stationary. In other words, it is typically assumed that the statistical properties of audio signals change slowly enough that they can be considered nearly constant over a short interval. However, using a fixed analysis window (which is typical in practice), we have no way to change the analysis parameters over time in order to track the slowly evolving properties of the audio signal. For example, while a long window may be appropriate for analyzing tonal phenomena, it will smear subsequent note onsets. Furthermore, the audio signal may not be completely stationary over the duration of the analysis window. This is often true of sounds containing glissando, vibrato, and other transient phenomena. In this paper we build upon previous work targeted at non-stationary analysis/synthesis. In particular, we discuss how to simultaneously adapt the window length and the chirp rate of the analysis frame in order to maximally concentrate the spectral energy. This is done by (a) finding the analysis window that leads to the minimum-entropy spectrum and (b) estimating the chirp rate using the distribution derivative method. We also discuss a fast method of analysis/synthesis using the fan-chirp transform and overlap-add. Finally, we analyze several real and synthetic signals and show a qualitative improvement in the spectral energy concentration.
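A minimal sketch of the window-adaptation step, assuming spectral entropy as the concentration measure (the chirp-rate estimation via the distribution derivative method and the fan-chirp resynthesis are not shown):

```python
import numpy as np

def pick_window_length(frame, candidate_lengths):
    """Return the candidate window length whose windowed, zero-padded magnitude
    spectrum has minimum entropy (maximum energy concentration). `frame` must be
    centred on the analysis instant and at least as long as the longest candidate."""
    centre = len(frame) // 2
    best_len, best_entropy = None, np.inf
    for N in candidate_lengths:
        start = centre - N // 2
        seg = frame[start:start + N] * np.hanning(N)
        mag2 = np.abs(np.fft.rfft(seg, n=4 * N)) ** 2
        p = mag2 / (mag2.sum() + 1e-12)                 # normalised spectrum
        entropy = -np.sum(p * np.log(p + 1e-12))        # Shannon entropy
        if entropy < best_entropy:
            best_len, best_entropy = N, entropy
    return best_len
```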
Parametric Coding of Stereo Audio Based on Principal Component Analysis
Low-bit-rate parametric coding of multichannel audio is mainly based on Binaural Cue Coding (BCC). Another multichannel audio processing method, upmix, can also be used to deliver multichannel audio, typically 5.1 signals, at low data rates. More precisely, we focus on an existing upmix method based on Principal Component Analysis (PCA). This PCA-based upmix method aims to blindly create a realistic multichannel output signal, whereas the BCC scheme aims to perceptually reconstruct the original multichannel audio signal. The PCA-based upmix method and the BCC scheme both use spatial parameters extracted from the stereo channels to generate auditory events with correct spatial attributes, i.e., sound source positions and spatial impression. In this paper, we present a multichannel audio model based on PCA which allows a parametric representation of multichannel audio. For stereo audio, the signals resulting from PCA can be represented as a principal component, corresponding to directional sources, and one remaining signal, corresponding to ambience, both of which are related to the original input through the PCA transformation parameters. We apply the analysis results to propose a new parametric coding method for stereo audio based on subband PCA processing. The quantization of the spatial and energetic parameters is presented and then combined with a state-of-the-art monophonic coder in order to derive subjective listening test results.
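A full-band sketch of the PCA decomposition underlying the method (the coder described above applies this per subband; the function name and the zero-mean assumption are illustrative):

```python
import numpy as np

def pca_split(left, right):
    """Split a (zero-mean) stereo pair into a principal, directional component
    and a remaining, ambience-like component via a 2x2 PCA. Full-band for
    brevity; the described coder works per subband."""
    X = np.vstack([left, right])                  # 2 x N signal matrix
    C = X @ X.T / X.shape[1]                      # 2x2 covariance estimate
    _, eigvecs = np.linalg.eigh(C)                # eigenvalues in ascending order
    v_main, v_rest = eigvecs[:, 1], eigvecs[:, 0]
    principal = v_main @ X                        # directional source estimate
    ambience = v_rest @ X                         # residual / ambience estimate
    theta = np.arctan2(v_main[1], v_main[0])      # rotation angle (spatial parameter)
    return principal, ambience, theta
```

The rotation angle, together with per-subband energy information, corresponds to the kind of spatial and energetic parameters whose quantization the abstract discusses.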
Fan Chirp Transformation for Music Representation
In this work the Fan Chirp Transform (FChT), which provides an acute representation of harmonically related linear chirp signals, is applied to the analysis of pitch content in polyphonic music. The implementation introduced was devised to be computationally manageable and enables the generalization of the FChT to the analysis of non-linear chirps. The combination with the Constant Q Transform is explored to build a multi-resolution FChT. An existing method to compute pitch salience from the FChT is improved and adapted to handle polyphonic music. In this way, a useful visualization tool for melodic content is obtained. The results of a frame-based melody detection evaluation indicate that the introduced technique is very promising as a front-end for music analysis.
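A simplified single-frame sketch of the transform itself, assuming the common warp-then-FFT implementation with linear-interpolation resampling and a moderate chirp rate (the multi-resolution combination and the pitch-salience stage are not shown, and normalisation conventions vary between formulations):

```python
import numpy as np

def fcht_frame(x, alpha, fs, n_fft=None):
    """Single-frame FChT sketch: resample the frame on a warped time axis so
    that chirps with relative chirp rate `alpha` become stationary, then take
    an ordinary FFT. The frame is assumed centred on t = 0."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n_fft = n_fft or N
    t = (np.arange(N) - N // 2) / fs                 # original time axis
    tau = np.linspace(t[0], t[-1], N)                # uniform warped axis
    # invert phi(t) = t + 0.5*alpha*t**2 to find the sampling instants
    t_warp = (np.sqrt(1.0 + 2.0 * alpha * tau) - 1.0) / alpha if alpha else tau
    # 1/sqrt(phi'(t)) weight from the change of variables
    x_warp = np.interp(t_warp, t, x) / np.sqrt(np.abs(1.0 + alpha * t_warp))
    return np.fft.rfft(x_warp * np.hanning(N), n=n_fft)
```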
One-to-Many Conversion for Percussive Samples
A filtering algorithm for generating subtle random variations in sampled sounds is proposed. Using only one recording for impact sound effects or drum machine sounds results in unrealistic repetitiveness during consecutive playback. This paper studies spectral variations in repeated knocking sounds and in three drum sounds: a hi-hat, a snare, and a tom-tom. The proposed method uses a short pseudo-random velvet-noise filter and a low-shelf filter to produce timbral variations targeted at appropriate spectral regions, potentially yielding an endless number of new, realistic versions of a single sampled percussive sound. The realism of the resulting processed sounds is studied in a listening test. The results show that the sound quality obtained with the proposed algorithm is at least as good as that of a previous method while using 77% fewer computational operations. The algorithm is widely applicable to computer-generated music and game audio.
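A rough sketch of this kind of processing is shown below. The filter length, impulse density, decay, and shelf settings are illustrative guesses, not the values used in the paper.

```python
import numpy as np
from scipy.signal import lfilter

def velvet_noise(length, density, fs, rng):
    """Sparse +/-1 sequence: one randomly placed impulse per grid period of
    roughly fs/density samples."""
    h = np.zeros(length)
    grid = max(1, int(fs / density))
    for start in range(0, length, grid):
        pos = start + rng.integers(0, min(grid, length - start))
        h[pos] = rng.choice([-1.0, 1.0])
    return h

def vary_sample(x, fs, rng=None):
    """One-to-many variation sketch: convolve the sample with a short, decaying
    velvet-noise FIR and apply a mild random low-shelf tilt."""
    rng = rng or np.random.default_rng()
    h = velvet_noise(length=int(0.01 * fs), density=2000, fs=fs, rng=rng)
    h *= np.exp(-np.arange(len(h)) / (0.003 * fs))    # short exponential decay
    h[0] = 1.0                                        # keep the direct sound
    y = np.convolve(x, h)[: len(x)]
    # first-order low shelf (random +/- 3 dB around 200 Hz) via allpass decomposition
    g = 10.0 ** (rng.uniform(-3, 3) / 20.0)
    a = (np.tan(np.pi * 200.0 / fs) - 1.0) / (np.tan(np.pi * 200.0 / fs) + 1.0)
    ap = lfilter([a, 1.0], [1.0, a], y)               # first-order allpass
    return 0.5 * ((1.0 + g) * y + (g - 1.0) * ap)
```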
Real-Time Time-Varying Frequency Warping via Short-Time Laguerre Transform
In this paper we address the problem of the real-time implementation of time-varying frequency warping. Frequency warping based on a one-parameter family of one-to-one warping maps can be realized by means of the Laguerre transform and implemented in a non-causal structure. This structure is not directly suited to real-time implementation, since each output sample is formed by combining all of the input samples. Similarly, the recently proposed time-varying Laguerre transform has the same drawback. Furthermore, long frequency-dependent delays destroy the time organization, or macrostructure, of the sound event. Recently, the author introduced the Short-Time Laguerre Transform for the approximate real-time implementation of frequency warping. In this transform the short-time spectrum, rather than the overall frequency spectrum, is frequency warped. The input is subdivided into frames that are tapered by a suitably selected window. By careful design, the output frames correspond to warped versions of the input frames modulated by a stretched version of the window. It is then possible to overlap-add these frames without introducing audible distortion. The overlap-add technique can be generalized to time-varying warping. However, several issues concerning the design of the window and the selection of the overlap parameters need to be addressed. In this paper we discuss solutions for the overlap of the frames when the Laguerre parameter is kept constant but distinct in each frame, and solutions for the computation of full time-varying frequency warping when the Laguerre parameter changes within each frame.
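For reference, frequency warping of a single frame can be sketched by evaluating the signal polynomial in a first-order allpass operator (Horner's scheme over a chain of identical allpass sections). The Laguerre transform proper adds a normalising pre-filter, omitted here, and the short-time scheme described above additionally windows and overlap-adds such frames.

```python
import numpy as np

def allpass1(x, lam):
    """First-order allpass A(z) = (-lam + z^-1) / (1 - lam*z^-1)."""
    y = np.zeros_like(x)
    x_prev = y_prev = 0.0
    for n, xn in enumerate(x):
        y[n] = -lam * xn + x_prev + lam * y_prev
        x_prev, y_prev = xn, y[n]
    return y

def warp_frame(frame, lam, out_len):
    """Frequency-warp one frame by evaluating sum_n x[n] * A(z)^n on a unit
    impulse, i.e. Horner's scheme over a chain of identical allpass sections."""
    acc = np.zeros(out_len)
    for xn in frame[::-1]:                  # Horner: highest-order coefficient first
        acc = allpass1(acc, lam)
        acc[0] += xn
    return acc
```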
Parametric Coding of Spatial Audio
Recently, there has been renewed interest in techniques for coding stereo and multi-channel audio signals. Stereo and multichannel audio signals evoke an auditory spatial image in a listener. Thus, in addition to pure redundancy reduction, a receiver model that considers properties of spatial hearing may be used to reduce the bitrate. Previous techniques have done this by considering the importance of interaural level difference cues at high frequencies and by considering the binaural masking level difference when computing the masked threshold for multiple audio channels. Recently, a number of more systematic, parameterized techniques of this kind were introduced. In this paper an overview of one such technique, denoted binaural cue coding (BCC), is given. BCC represents stereo or multichannel audio signals as one or more downmixed audio channels plus side information. The side information contains the inter-channel cues inherent in the original audio signal that are relevant for the perception of the properties of the auditory spatial image. The relation between the inter-channel cues and attributes of the auditory spatial image is discussed. Other applications of BCC are also discussed, such as the joint coding of independent audio signals, providing flexibility at the decoder to mix arbitrary stereo, multichannel, and binaural signals.
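A toy sketch of the analysis side, assuming an STFT front end: a mono downmix is produced and per-frame inter-channel level differences are recorded as side information. Real BCC groups FFT bins into critical-band-like partitions and also transmits time- and coherence-related cues, which are omitted here.

```python
import numpy as np

def bcc_analysis(L, R, n_fft=1024, hop=512):
    """Toy BCC-style analysis: return a mono downmix and, per frame, the
    inter-channel level difference (ICLD, in dB) of the two input channels."""
    win = np.hanning(n_fft)
    downmix = 0.5 * (L + R)
    icld = []
    for i in range(0, len(L) - n_fft, hop):
        Lf = np.fft.rfft(L[i:i + n_fft] * win)
        Rf = np.fft.rfft(R[i:i + n_fft] * win)
        icld.append(10.0 * np.log10((np.abs(Lf) ** 2 + 1e-12) /
                                    (np.abs(Rf) ** 2 + 1e-12)))
    return downmix, np.array(icld)          # downmix + side information
```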
Self-Authentication of Audio Signals by Chirp Coding
This paper discusses a new approach to ‘watermarking’ digital signals using linear frequency modulated, or ‘chirp’, coding. The principles underlying this approach are based on the use of a matched filter to provide a reconstruction of a chirped code that is uniquely robust for signals with very low signal-to-noise ratios. Chirp coding for authenticating data is generic in the sense that it can be used for a range of data types and applications (the authentication of speech and audio signals, for example). The theoretical and computational aspects of the matched filter and the properties of a chirp are revisited to provide the essential background to the method. Signal code generating schemes are then addressed, and details of the coding and decoding techniques are considered. Finally, the paper briefly describes an example application, available on-line for readers interested in using the approach for audio data authentication, which works with either WAV or MP3 files.
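A minimal embed/detect sketch of the matched-filter idea, with illustrative parameters (a real authentication scheme derives the chirp code from, and binds it to, the data being protected):

```python
import numpy as np
from scipy.signal import chirp, fftconvolve

def embed_chirp(audio, fs, f0=500.0, f1=4000.0, dur=1.0, alpha=0.005):
    """Add a low-level linear chirp to the host signal."""
    t = np.arange(int(dur * fs)) / fs
    code = chirp(t, f0=f0, t1=dur, f1=f1, method='linear')
    marked = np.array(audio, dtype=float, copy=True)
    n = min(len(marked), len(code))
    marked[:n] += alpha * code[:n]
    return marked, code

def detect_chirp(marked, code):
    """Matched-filter detection: correlate with the time-reversed chirp and
    report the location and prominence of the strongest peak."""
    mf = fftconvolve(marked, code[::-1], mode='valid')
    peak = int(np.argmax(np.abs(mf)))
    score = np.abs(mf[peak]) / (np.median(np.abs(mf)) + 1e-12)
    return peak, score
```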
Relative Auditory Distance Discrimination with Virtual Nearby Sound Sources
This paper describes a psychophysical experiment aimed at exploring relative distance discrimination thresholds with binaurally rendered virtual sound sources in the near field. Pairs of virtual sources are spatialized around 6 different spatial locations (2 directions × 3 reference distances) through a set of generic far-field Head-Related Transfer Functions (HRTFs) coupled with a near-field correction model proposed in the literature, known as DVF (Distance Variation Function). Individual discrimination thresholds for each spatial location and for each of the two orders of presentation of the stimuli (approaching or receding) are calculated for 20 subjects through an adaptive procedure. Results show that thresholds are higher than those reported in the literature for real sound sources, and that approaching and receding stimuli behave differently. In particular, when the virtual source is close (< 25 cm), thresholds for the approaching condition are significantly lower than those for the receding condition, while the opposite behaviour appears at greater distances (≈ 1 m). We hypothesize that this asymmetric bias is due to variations in the absolute stimulus level.