Variable Pre-Emphasis LPC for Modeling Vocal Effort in the Singing Voice
In speech and singing, the spectral envelope of the glottal source varies with voice quality, for example vocal effort, lax voice, and breathy voice. In contrast, linear predictive coding (LPC) models the glottal source inflexibly: the spectral envelope of the source estimated by LPC is fixed and determined by the pre-emphasis filter. In standard LPC, the formant filter therefore captures variation in the spectral envelope that should be attributed to the source. This paper presents variable pre-emphasis LPC (VPLPC), a technique that allows the estimated source to vary. This results in formant filters that remain more consistent across variations in vocal effort and breathiness. VPLPC also provides a way to change the envelope of the estimated source, thereby changing the perceived vocal effort. The VPLPC algorithm is used to manipulate several voice excerpts with promising but mixed results, and possible improvements are suggested.
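As a rough illustration of the idea, the sketch below applies a first-order pre-emphasis filter with a variable coefficient `mu` before an autocorrelation-method LPC fit. This is only a minimal sketch under assumed parameter values (frame length, LPC order, the placeholder `frame` signal), not the paper's exact VPLPC formulation.

```python
# Minimal sketch of variable pre-emphasis before LPC (hypothetical helper names).
# The pre-emphasis coefficient `mu` controls how much spectral tilt is
# attributed to the source before the formant filter is estimated.
import numpy as np
from scipy.linalg import solve_toeplitz

def pre_emphasis(x, mu):
    """First-order pre-emphasis filter: y[n] = x[n] - mu * x[n-1]."""
    return np.append(x[0], x[1:] - mu * x[:-1])

def lpc_coefficients(x, order):
    """LPC via the autocorrelation method (Toeplitz normal equations)."""
    x = x * np.hanning(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:-1], r[1:])          # predictor coefficients
    return np.concatenate(([1.0], -a))         # A(z) = 1 - sum a_k z^-k

frame = np.random.randn(1024)                  # placeholder voiced frame
for mu in (0.6, 0.9, 0.98):                    # varying the pre-emphasis
    a = lpc_coefficients(pre_emphasis(frame, mu), order=18)
    # With a stronger mu, more spectral tilt is removed before estimation,
    # so the formant filter A(z) has to model less of the source slope.
```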
Gesturally-Controlled Digital Audio Effects
This paper presents a detailed analysis of the acoustic effects of the movements of single-reed instrument performers under specific recording conditions. These effects are shown to result mostly from the difference between the arrival time of the direct sound and that of the first reflection, creating a sort of phasing or flanging effect. Unlike commercial flangers – where delay values are set by an LFO (low-frequency oscillator) waveform – the amount of delay in a recording of an acoustic instrument is a function of the position of the instrument with respect to the microphone. We show that for standard recordings of a clarinet, continuous delay variations from 2 to 5 ms are possible, producing a naturally controlled effect.
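The comb/flanging mechanism described above can be illustrated with a short sketch: the direct signal is summed with a delayed copy whose delay drifts between 2 and 5 ms. The sinusoidal position trajectory, the 0.7 reflection gain, and the placeholder input signal are assumptions for illustration, not data from the paper.

```python
# Direct sound plus one reflection with a slowly varying 2-5 ms delay.
import numpy as np

fs = 44100
t = np.arange(fs * 4) / fs                      # 4 s of audio
x = np.random.randn(len(t)) * 0.1               # placeholder clarinet signal

delay_s = 0.0035 + 0.0015 * np.sin(2 * np.pi * 0.3 * t)   # 2..5 ms sweep
delay_samples = delay_s * fs

y = np.copy(x)
for n in range(len(x)):
    d = delay_samples[n]
    i = int(np.floor(d))
    frac = d - i
    if n - i - 1 >= 0:
        # Linear-interpolated fractional delay of the "reflection" path.
        y[n] += 0.7 * ((1 - frac) * x[n - i] + frac * x[n - i - 1])
```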
Intermodulation Effects Analysis using Complex Bandpass Filterbanks
The objective of this paper is to show the ability of complex bandpass filterbanks to extract the intermodulation information that appears when two audio signals interact inside the same analysis band. To perform the analysis, a sinusoidal model of the signals is assumed. Three kinds of signals are analyzed: a sum of two cosines, a sum of two linear chirps, and a sum of two exponential chirps. The complex bandpass filtering of the signals is carried out using a new algorithm based on the Complex Continuous Wavelet Transform. The algorithm is validated by comparing the practical results with the theoretical instantaneous amplitude and instantaneous phase of the signal model. With the appropriate bandwidth, the complex bandpass filters show the same behaviour as our perceptual ability to discriminate interacting tones when they fall inside a critical band of the human ear.
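A toy example of the first test case (a sum of two cosines) is sketched below. It uses a single Gaussian-windowed complex exponential as a complex bandpass filter rather than the paper's wavelet-based algorithm; the centre frequency, bandwidth, and tone frequencies are assumed values. When both tones fall inside the band, the instantaneous amplitude of the complex output beats at the difference frequency, which is the intermodulation behaviour discussed above.

```python
# One complex bandpass channel applied to a two-tone signal.
import numpy as np

fs = 8000
t = np.arange(fs) / fs
f1, f2 = 440.0, 460.0
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

fc, bw = 450.0, 100.0                            # centre frequency, bandwidth (Hz)
tk = np.arange(-0.05, 0.05, 1 / fs)              # 100 ms filter kernel
kernel = np.exp(2j * np.pi * fc * tk) * np.exp(-(tk * bw) ** 2)
kernel /= np.sum(np.abs(kernel))

z = np.convolve(x, kernel, mode="same")          # complex subband signal
inst_amp = np.abs(z)                             # beats at |f2 - f1| = 20 Hz
inst_phase = np.unwrap(np.angle(z))
```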
CMOS Implementation of an Adaptive Noise Canceller into a Subband Filter
In recent years the demand for mobile communication has increased rapidly. While battery life was one of the main concerns for developers in the early years of mobile phones, speech quality is now becoming one of the most important factors in the development of the next generation of mobile phones. This paper describes the CMOS implementation of an adaptive noise canceller (ANC) within a subband filter. The ANC-subband filter is able to reduce the noise components of real speech without prior knowledge of the noise properties. It is intended for use in mobile devices and therefore uses a very low clock frequency, resulting in low power consumption. This low power consumption, combined with its small physical size, also enables the circuit to be used in hearing aids to efficiently reduce noise contained in the speech signal.
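For context, the cancellation principle behind such a circuit can be sketched in floating point as a basic full-band LMS adaptive noise canceller. This is only a conceptual sketch: the paper's contribution is a fixed-point CMOS subband implementation, and the filter length and step size below are assumed values.

```python
# Conceptual full-band LMS adaptive noise canceller.
import numpy as np

def lms_anc(primary, reference, taps=32, mu=0.01):
    """primary = speech + noise, reference = correlated noise only."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        r = reference[n - taps + 1:n + 1][::-1]   # most recent reference samples
        noise_est = np.dot(w, r)                  # adaptive noise estimate
        out[n] = primary[n] - noise_est           # error signal = cleaned speech
        w += 2 * mu * out[n] * r                  # LMS weight update
    return out
```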
Using Ideas from Natural Selection to Evolve Synthesized Sounds
This paper describes a system for the automatic creation of digital synthesizer circuits that can generate sounds similar to a sampled (target) sound. The circuits consist of very basic signal functions and generators that are arbitrarily interconnected. The system uses a “genetic algorithm” (GA) to evolve successively better circuits. It first creates a population of such synthesizers and computes the output and a fitness value of each individual circuit. The circuits that are best at imitating the target sound are kept and used for “breeding” to form a new generation in which, hopefully, at least some individuals perform better than their parents did. The end result is a circuit that can create a sound resembling the target sample. Because the result is a synthesizer, we can manipulate its parameters when generating the sound. We also obtain a very compact representation of the sound, which can be useful when distributing music over a limited-bandwidth communication channel such as the Internet. As we shall see, it also gives the user a very powerful tool for creating entirely new sounds.
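The GA loop can be illustrated with a deliberately simplified sketch: whereas the paper evolves whole synthesizer topologies, the toy example below only evolves the parameters of a fixed two-operator FM patch towards a target sound's magnitude spectrum. The population size, fitness measure, and mutation-only "breeding" are assumptions made for brevity.

```python
# Toy genetic algorithm matching a fixed FM patch to a target spectrum.
import numpy as np

fs, dur = 16000, 0.5
t = np.arange(int(fs * dur)) / fs
target = np.sin(2 * np.pi * 440 * t + 3.0 * np.sin(2 * np.pi * 220 * t))
target_mag = np.abs(np.fft.rfft(target))

def render(genes):
    carrier, mod, index = genes
    return np.sin(2 * np.pi * carrier * t + index * np.sin(2 * np.pi * mod * t))

def fitness(genes):
    mag = np.abs(np.fft.rfft(render(genes)))
    return -np.mean((mag - target_mag) ** 2)       # higher is better

rng = np.random.default_rng(0)
pop = rng.uniform([100, 50, 0], [2000, 1000, 10], size=(40, 3))
for generation in range(100):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-10:]]        # keep the fittest circuits
    # Mutation-only "breeding" for brevity; real GAs would also use crossover.
    children = parents[rng.integers(0, 10, 30)] + rng.normal(0, 5, (30, 3))
    pop = np.vstack([parents, children])           # next generation
```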
A Single-Azimuth Pinna-Related Transfer Function Database
Pinna-Related Transfer Functions (PRTFs) reflect the modifications undergone by an acoustic signal as it interacts with the listener’s outer ear. They can be seen as the pinna contribution to the Head-Related Transfer Function (HRTF). This paper describes a database of PRTFs collected from measurements performed at the Department of Signal Processing and Acoustics, Aalto University. Median-plane PRTFs at 61 different elevation angles from 25 subjects are included. This data collection is part of a broader project investigating the correspondence between PRTF features and anthropometry.
On the Challenges of Embedded Real-Time Music Information Retrieval
Real-time applications of Music Information Retrieval (MIR) have been gaining interest recently. However, as deep learning becomes increasingly ubiquitous for music analysis tasks, several challenges and limitations must be overcome to deliver accurate and fast real-time MIR systems. In addition, modern embedded computers offer great potential for compact systems that use MIR algorithms, such as digital musical instruments. However, embedded computing hardware is generally resource-constrained, posing additional limitations. In this paper, we identify and discuss the challenges and limitations of embedded real-time MIR. Furthermore, we discuss potential solutions to these challenges and demonstrate their validity by presenting an embedded real-time classifier of expressive acoustic guitar techniques. The classifier achieved 99.2% accuracy in distinguishing pitched and percussive techniques, and a 99.1% average accuracy in distinguishing four distinct percussive techniques with a fifth class for pitched sounds. The full classification task is a considerably more complex learning problem, with our preliminary results reaching only 56.5% accuracy. The results were produced with an average latency of 30.7 ms.
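The kind of low-latency, frame-based pipeline discussed above can be sketched generically as short-frame feature extraction followed by a small classifier. This is not the paper's actual feature set or model; the frame length, the two spectral features, and the nearest-centroid decision rule are illustrative assumptions.

```python
# Generic sketch of a low-latency frame-based classification step.
import numpy as np

fs, frame_len = 44100, 1024                     # ~23 ms analysis frame

def features(frame):
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1 / fs)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (np.mean(mag) + 1e-12)
    return np.array([centroid, flatness])

def classify(frame, centroids, labels):
    """Nearest-centroid decision; `centroids` would come from offline training."""
    f = features(frame)
    return labels[np.argmin(np.linalg.norm(centroids - f, axis=1))]
```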
Vocal melody detection in the presence of pitched accompaniment using harmonic matching methods
Vocal music is characterized by a melodically salient singing voice accompanied by one or more instruments. With a pitched instrument background, multiple periodicities are simultaneously present and the task becomes one of identifying and tracking the vocal pitch based on pitch strength and smoothness constraints. Frequency domain harmonic matching methods can be applied to detect pitch via the harmonically related frequencies that fit the signal’s measured spectral peaks. The specific spectral fitness measure is expected to influence the performance of vocal pitch detection depending on the nature of the polyphonic mixture. In this work, we consider Indian classical music which provides important examples of singing voice accompanied by strongly pitched instruments. It is shown that the spectral fitness measure of the two-way mismatch method is well suited to track vocal pitch in the presence of the pitched percussion with its strong but sparse harmonic structure. The detected pitch is further used to obtain a measure of voicing that reliably discriminates vocal segments from purely instrumental regions.
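For reference, a simplified version of the two-way mismatch (TWM) fitness measure (after Maher and Beauchamp) is sketched below: each predicted harmonic of a candidate f0 is matched against the nearest measured spectral peak, and each measured peak against the nearest predicted harmonic. The weighting constants and the number of harmonics are textbook defaults, not necessarily the tuned values used in the paper.

```python
# Simplified two-way mismatch (TWM) error for one f0 candidate.
import numpy as np

def twm_error(f0, peak_freqs, peak_mags, n_harm=10, p=0.5, q=1.4, r=0.5, rho=0.33):
    harmonics = f0 * np.arange(1, n_harm + 1)
    a_max = np.max(peak_mags)

    # Predicted-to-measured: each predicted harmonic seeks the closest measured peak.
    err_pm = 0.0
    for h in harmonics:
        k = np.argmin(np.abs(peak_freqs - h))
        df = np.abs(peak_freqs[k] - h)
        err_pm += df * h ** -p + (peak_mags[k] / a_max) * (q * df * h ** -p - r)

    # Measured-to-predicted: each measured peak seeks the closest predicted harmonic.
    err_mp = 0.0
    for f, a in zip(peak_freqs, peak_mags):
        df = np.min(np.abs(harmonics - f))
        err_mp += df * f ** -p + (a / a_max) * (q * df * f ** -p - r)

    return err_pm / n_harm + rho * err_mp / len(peak_freqs)

# The f0 candidate with the lowest TWM error over a frame's spectral peaks
# is taken as the pitch estimate for that frame.
```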
Energy-Preserving Time-Varying Schroeder Allpass Filters
In artificial reverb algorithms, gains are commonly varied over time to break up temporal patterns, improving quality. We propose a family of novel Schroeder-style allpass filters that are energy-preserving under arbitrary, continuous changes of their gains over time. All of them are canonic in delays, and some are also canonic in multiplies. This yields several structures that are novel even in the time-invariant case. Special cases for cascading and nesting these structures with a reduced number of multipliers are shown as well. The proposed structures should be useful in artificial reverb applications and other time-varying audio effects based on allpass filters, especially where allpass filters are embedded in feedback loops and stability may be an issue.
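As a baseline for comparison, the classic time-invariant Schroeder allpass comb is sketched below. This standard form is not one of the paper's proposed structures: it is only energy-preserving for a fixed gain g, and the paper's contribution is precisely to modify the topology so that g may vary continuously without violating energy preservation.

```python
# Classic Schroeder allpass comb: y[n] = -g*x[n] + x[n-M] + g*y[n-M].
import numpy as np

def schroeder_allpass(x, delay, g):
    y = np.zeros(len(x))
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_d + g * y_d
    return y
```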
GABOR, multi-representation real-time analysis/synthesis
This article describes a set of modules for Max/MSP for real-time sound analysis and synthesis combining various models, representations, and timing paradigms. Gabor provides a unified framework for granular synthesis, PSOLA, the phase vocoder, additive synthesis, and other STFT techniques. Gabor’s processing scheme allows atomic sound particles to be treated at arbitrary rates and instants. Gabor is based on FTM, an extension of Max/MSP that introduces complex data structures such as matrices and sequences to the Max data-flow programming paradigm. Most of the signal processing operators of the Gabor modules handle vector and matrix representations closely related to the SDIF sound description format.
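The underlying granular principle (windowed sound atoms scheduled at arbitrary instants and overlap-added into an output stream) can be illustrated with a plain-Python sketch; it does not use or reproduce the Gabor/FTM Max/MSP objects themselves, and the grain length, window, and random scheduling are illustrative assumptions.

```python
# Plain granular overlap-add: read windowed grains from a source buffer and
# write them into the output at arbitrary instants.
import numpy as np

fs = 44100
source = np.random.randn(fs * 2)                # placeholder source sound
out = np.zeros(fs * 2)

grain_len = 2048
window = np.hanning(grain_len)
rng = np.random.default_rng(1)

for _ in range(400):
    read_pos = rng.integers(0, len(source) - grain_len)    # where to read a grain
    write_pos = rng.integers(0, len(out) - grain_len)      # when to play it back
    out[write_pos:write_pos + grain_len] += window * source[read_pos:read_pos + grain_len]
```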