Implementing Digital Audio Effects Using a Hardware/Software Co-Design Approach
Today, digital real-time audio effects are realized in software in almost all cases. The hardware platforms used for this purpose range from general-purpose processors such as the Intel Pentium class, through embedded processors (e.g. the ARM family), to specialized DSPs. The emerging technology of complete systems on a single programmable chip contrasts with such a software-centric solution: it combines software and hardware through a co-design methodology and is a promising alternative for the future of real-time audio. Such systems can combine the vast computing power of dedicated hardware with the flexibility of software, in a way the designer is free to shape. A decade ago the main vehicles for these systems, FPGAs, were already promising but offered limited possibilities [1]; since then they have progressed rapidly and are now one of the product classes driving tomorrow's silicon technology. We describe an example of such a real-time digital effects system, developed using a hardware/software co-design method. Real-time digital audio processing takes place in low-latency dedicated hardware units, while the control and routing of audio streams is handled by software running on a 32-bit NIOS II softcore processor. The hardware units are implemented using a DSP-centric methodology that raises the abstraction level of VHDL descriptions while still using standard off-the-shelf FPGA synthesis tools. The complete system is physically implemented on a rapid-prototyping board for communications and audio applications, based on an Altera Cyclone II FPGA.
Binaural Source Separation in Non-Ideal Reverberant Environments
This paper proposes a framework for separating several speech sources in non-ideal, reverberant environments. A movable human dummy head placed in a normal office room is used to model the conditions humans experience when listening to complex auditory scenes. Before source separation takes place, the dummy head explores the auditory scene and extracts its characteristics in the same way humans would when entering a new auditory scene. These extracted features support several source separation algorithms that run in parallel. Each of these algorithms estimates a binary time-frequency mask to separate the sources. A combination stage then infers a final estimate of the binary mask to demix the source of interest. The presented results show good separation performance in auditory scenes containing several speech sources.
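The abstract does not say how the combination stage merges the individual masks. As a minimal sketch, assuming a simple majority vote (a hypothetical rule, not necessarily the paper's inference method), combining binary time-frequency masks and applying the result to an STFT could look like this; `combine_masks` and `apply_mask` are illustrative names.

```python
from typing import List

Mask = List[List[int]]  # binary time-frequency mask: mask[t][f] in {0, 1}


def combine_masks(masks: List[Mask]) -> Mask:
    """Majority vote over binary masks produced by parallel separation algorithms.

    A time-frequency cell is assigned to the target source when more than half
    of the estimators assign it (ties go to 0). Hypothetical voting rule.
    """
    n = len(masks)
    frames, bins = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[t][f] for m in masks) > n else 0 for f in range(bins)]
        for t in range(frames)
    ]


def apply_mask(stft: List[List[complex]], mask: Mask) -> List[List[complex]]:
    """Zero every STFT cell that the mask does not assign to the target source."""
    return [[x * m for x, m in zip(row, mrow)] for row, mrow in zip(stft, mask)]
```

A soft (weighted) vote would be a natural variant if the individual algorithms report confidences; the hard vote keeps the final mask binary, as the abstract describes.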
Spatial Track Transition Effects for Headphone Listening
In this paper we study the use of different spatial processing techniques to create audio effects for forced transitions between music tracks in headphone listening. The effect moves the currently playing track to the side of the listener while the next track simultaneously moves into a central position. We compare seven methods for creating this effect in a listening test in which the user's task is to characterize the span of the spatial movement of playlist items around the listener's head. The methods range from amplitude panning to full head-related transfer function (HRTF) rendering. A computationally efficient method using time-varying interaural time differences is found to be as effective as full HRTF rendering in creating a large spatial span.
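To illustrate the time-varying ITD idea, here is a minimal sketch, assuming a linear ITD ramp, a maximum ITD of 0.7 ms (a typical value for a human head, chosen here as an assumption), linear-interpolation fractional delays, and no level crossfade. The outgoing track drifts left and the incoming one arrives from the right; `itd_transition` and `read_delayed` are hypothetical names, not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple


def read_delayed(x: Sequence[float], n: int, delay: float) -> float:
    """Return x at the fractional index n - delay (linear interpolation, zero outside the signal)."""
    pos = n - delay
    i = math.floor(pos)
    frac = pos - i
    a = x[i] if 0 <= i < len(x) else 0.0
    b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
    return (1.0 - frac) * a + frac * b


def itd_transition(out_track: Sequence[float], in_track: Sequence[float],
                   fs: int = 44100, max_itd_s: float = 0.0007) -> Tuple[List[float], List[float]]:
    """Stereo transition using only a time-varying interaural time difference (ITD).

    The outgoing track moves left (its right-ear copy is increasingly delayed)
    while the incoming track arrives from the right (its left-ear delay shrinks
    to zero). No level crossfade or HRTF filtering is applied.
    """
    n_total = min(len(out_track), len(in_track))
    max_delay = max_itd_s * fs                 # maximum ITD in samples
    left: List[float] = []
    right: List[float] = []
    for n in range(n_total):
        r = n / max(n_total - 1, 1)            # transition progress, 0 -> 1
        d_out = r * max_delay                  # outgoing: centre -> left
        d_in = (1.0 - r) * max_delay           # incoming: right -> centre
        left.append(out_track[n] + read_delayed(in_track, n, d_in))
        right.append(read_delayed(out_track, n, d_out) + in_track[n])
    return left, right
```

Only two fractional delay lines are needed per sample, which is why an ITD-only effect is far cheaper than convolving each track with a moving pair of HRTFs.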
Monophonic Source Localization for a Distributed Audience in a Small Concert Hall
The transfer of multichannel spatialization schemes from the studio to the concert hall presents numerous challenges to the contemporary composer or engineer of spatial music. A reverberant listening environment combined with a distributed audience are significant factors in the presentation of multichannel spatial music. This paper reviews existing research on the localization performance of various spatialization techniques and their ability to cater for a distributed audience. As the first step in a major comparative study of such techniques, the results of listening tests on monophonic source localization for a distributed audience in a reverberant space are presented. These results provide a measure of the best possible performance that can be expected from any spatialization technique under similar conditions.
Keywords: sound localization, distributed audience, spatial music.
Synthesis of a Macro Sound Structure within a Self Organizing System
This paper focuses on synthesizing macro sound structures with certain ecological attributes to obtain perceptually interesting and compositionally useful results. The system that delivers the sonic result is designed as a self-organizing system. Certain principles of cybernetics are critically assessed in terms of the interdependencies among system components, the system dynamics, and the coupling between system and environment. The system aims at a self-evolution of an ecological kind through interactive exchange with its external conditions. The macro-organization of the sonic material results from interactions of events at the meso and micro levels, but also from this exchange with the environment. The goal is to formulate some new principles and to sketch them here as a network of concepts that suggests new ideas in sound synthesis.
Characteristics of Broken-Line Approximation and Its Use in Distortion Audio Effects
This paper presents an analytic solution for the spectral changes produced by scalar nonlinear discrete-time systems without memory whose transfer characteristics can be approximated by a broken-line function. It also examines the relations between the harmonic ratio and the approximation parameters. Furthermore, the dependence of the harmonic ratio on the amplitude of a harmonic input signal is presented for the most common characteristics approximated by broken-line functions. These characteristics are evaluated from the point of view of dissonance.
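The paper derives the spectra analytically; the sketch below only checks the idea numerically. It assumes an odd-symmetric two-segment characteristic (unit slope up to a threshold, a smaller slope beyond it), uses the third-to-first harmonic amplitude ratio as a stand-in for the paper's harmonic ratio (whose exact definition is not given here), and measures harmonics with a plain DFT over one period. All names and parameter values are hypothetical.

```python
import math
from typing import List


def broken_line(x: float, threshold: float = 0.5, slope: float = 0.2) -> float:
    """Odd-symmetric broken-line characteristic: slope 1 for |x| <= threshold, `slope` beyond it."""
    if abs(x) <= threshold:
        return x
    return math.copysign(threshold + slope * (abs(x) - threshold), x)


def harmonic_amplitudes(amplitude: float, n_harm: int = 5, n: int = 1024, **kw) -> List[float]:
    """Amplitudes of harmonics 1..n_harm of broken_line(A*sin(wt)), via a DFT over one period."""
    y = [broken_line(amplitude * math.sin(2 * math.pi * k / n), **kw) for k in range(n)]
    amps = []
    for h in range(1, n_harm + 1):
        re = sum(v * math.cos(2 * math.pi * h * k / n) for k, v in enumerate(y))
        im = sum(v * math.sin(2 * math.pi * h * k / n) for k, v in enumerate(y))
        amps.append(2 * math.hypot(re, im) / n)
    return amps


def third_harmonic_ratio(amplitude: float, **kw) -> float:
    """Ratio of third to first harmonic amplitude (assumed definition of 'harmonic ratio')."""
    a = harmonic_amplitudes(amplitude, n_harm=3, **kw)
    return a[2] / a[0]
```

Below the threshold the system is linear and the ratio is zero; once the input amplitude exceeds the breakpoint, odd harmonics appear. Because the characteristic is odd, even harmonics stay at zero for any amplitude.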
Effective Singing Voice Detection in Popular Music Using ARMA Filtering
Locating singing voice segments is essential for convenient indexing, browsing, and retrieval of large music archives and catalogues. It is also beneficial for automatic music transcription and annotation. The approach described in this paper uses Mel-Frequency Cepstral Coefficients (MFCCs) with Gaussian Mixture Models (GMMs) to discriminate between two classes of data: instrumental music, and singing voice with musical background. Because the classification is imperfect, the frame-wise decisions without additional post-processing tend to alternate within very short time spans, whereas singing voice tends to be continuous over several frames. Therefore, various tests were performed to identify a suitable decision function and corresponding smoothing methods. Results are reported comparing the performance of straightforward likelihood-based classification against post-processing with an autoregressive moving average (ARMA) filtering method.
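The smoothing step can be sketched as follows, assuming the input is a per-frame log-likelihood ratio (log-likelihood under the singing-voice GMM minus that under the instrumental GMM) and a causal ARMA filter with hypothetical coefficients: a 4-tap moving average plus one pole at 0.5, giving unity DC gain. The paper's actual filter order and coefficients are not given in the abstract.

```python
from typing import List, Sequence


def arma_filter(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Causal ARMA filter: a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def smooth_decisions(llr: Sequence[float],
                     b: Sequence[float] = (0.125, 0.125, 0.125, 0.125),
                     a: Sequence[float] = (1.0, -0.5)) -> List[int]:
    """Label a frame as singing voice (1) when the smoothed log-likelihood ratio is positive.

    Hypothetical coefficients: moving-average part b, one autoregressive pole at 0.5.
    """
    return [1 if v > 0 else 0 for v in arma_filter(llr, b, a)]
```

A single misclassified frame inside a long vocal passage no longer flips the decision, because the moving-average and recursive terms both carry evidence from neighbouring frames.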
Non-Linear Digital Implementation of a Parametric Analog Tube Ground Cathode Amplifier
In this paper we propose a digital simulation of an analog amplifier circuit based on a grounded-cathode amplifier with a parametric tube model. The time-domain solution enables online substitution of the valve model and zero-latency changes of the polarization (bias) parameters. The implementation also allows the user to match the processing characteristics of various tube types.
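The following is a minimal sketch of a per-sample time-domain solution, not the paper's model. It assumes a simple Child-Langmuir 3/2-power-law triode, Ip = k * (Vg + Vp/mu)^1.5, a grounded-cathode stage with plate resistor `r_plate` fed from `b_plus`, no grid current (Vg <= 0), and hypothetical parameter values. Each sample solves Vp = B+ - Rp * Ip(Vg, Vp) by bisection, so a new tube object or bias value takes effect on the very next sample.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TriodeModel:
    mu: float = 100.0   # amplification factor (hypothetical value)
    k: float = 1e-3     # perveance in A/V^1.5 (hypothetical value)

    def plate_current(self, vg: float, vp: float) -> float:
        """3/2-power law: Ip = k * (vg + vp/mu)^1.5, zero when the tube is cut off."""
        e = vg + vp / self.mu
        return self.k * e ** 1.5 if e > 0 else 0.0


def plate_voltage(tube: TriodeModel, vg: float, b_plus: float = 250.0,
                  r_plate: float = 100e3, iters: int = 60) -> float:
    """Solve vp = b_plus - r_plate * Ip(vg, vp) by bisection on [0, b_plus].

    With vg clamped to <= 0 the residual is negative at vp = 0, non-negative at
    vp = b_plus, and increasing in vp, so bisection always converges.
    """
    vg = min(vg, 0.0)
    lo, hi = 0.0, b_plus
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - b_plus + r_plate * tube.plate_current(vg, mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def process(samples: Sequence[float], tube: TriodeModel, bias: float = -1.5, **stage) -> List[float]:
    """Memoryless per-sample simulation; the output is the (inverted) plate voltage."""
    return [plate_voltage(tube, bias + x, **stage) for x in samples]
```

Because the stage is solved independently at each sample with no internal state, swapping `tube` or changing `bias` introduces no latency. A full model would add the coupling capacitors and cathode network, which introduce memory.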
A Similarity Measure for Audio Query by Example Based on Perceptual Coding and Compression
Query by example for multimedia signals aims at automatically retrieving samples from a media database that are similar to a user-provided example. This paper proposes a similarity measure for query by example of audio signals. The method first represents the audio signals using perceptual audio coding, and then estimates the similarity of two signals from the advantage gained by compressing the files together compared with compressing them individually. Signals that benefit most from being compressed together are considered similar. The low-bit-rate perceptual audio coding preprocessing retains perceptually important features while quantizing the signals so that identical codewords appear, which enables further inter-signal compression. The advantage of the proposed similarity measure is that it is parameter-free, and thus easy to apply to a wide range of tasks. Furthermore, users' expectations do not affect the results as they can in parameter-laden algorithms. The method was compared against other query-by-example methods, and simulation results show that it gives competitive results.
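The "compress together versus individually" idea can be expressed with the normalized compression distance (NCD), a standard formulation that may differ in detail from the paper's measure. The sketch below skips the perceptual-coding stage and assumes the inputs are already encoded byte strings; `zlib` stands in for whatever compressor the paper uses.

```python
import zlib
from typing import Dict, List


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: close to 0 when compressing x and y together saves a lot."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_by_similarity(query: bytes, database: Dict[str, bytes]) -> List[str]:
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: ncd(query, database[name]))
```

The measure has no tunable parameters beyond the choice of compressor, which matches the parameter-free property claimed above. Note that zlib only finds matches within a 32 KiB window, so very long encoded files would call for a compressor with a larger window.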
The REACTION System: Automatic Sound Segmentation and Word Spotting for Verbal Reaction Time Tests
Reaction tests are typical tests in psychological research and communication science in which a test person is presented with a stimulus such as a photo, a sound, or written words. The individual has to evaluate the stimulus as fast as possible in a predefined manner and react by presenting the result of the evaluation, either by pushing a button in simple reaction tests or by saying an answer in verbal reaction tests. The reaction time between the onset of the stimulus and the onset of the response can be used as a measure of the difficulty of the given evaluation. Compared with simple reaction tests, verbal reaction tests are very powerful, since the individual can simply say the answer, which is the most natural way of answering. The drawback of verbal reaction tests is that today the reaction times still have to be determined manually: a person has to listen through all audio recordings taken during test sessions and mark stimulus times and word beginnings one by one, which is very time-consuming and labor-intensive. To replace the manual evaluation of reaction tests, this article presents the REACTION (Reaction Time Determination) system, which automatically determines the reaction times of a test session by analyzing the audio recording of the session. The system detects the onsets of stimuli as well as the onsets of answers. The recording is furthermore segmented into parts, each containing one stimulus and the following reaction, which further facilitates the transcription of the spoken words for semantic evaluation.
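The core measurement, reaction time as the difference between response onset and stimulus onset, can be sketched with a simple frame-energy onset detector. This assumes the stimulus is audible in the same recording, the background is quiet, and no other sounds occur between stimulus and answer; the frame length, threshold, and function names are hypothetical and much simpler than the word spotting the REACTION system performs.

```python
import math
from typing import List, Optional, Sequence


def onsets(x: Sequence[float], fs: int, frame_s: float = 0.01, threshold: float = 0.05) -> List[float]:
    """Times (s) at which the frame RMS rises above threshold after a quiet frame."""
    hop = max(1, int(round(frame_s * fs)))
    times: List[float] = []
    active = False
    for start in range(0, len(x) - hop + 1, hop):
        rms = math.sqrt(sum(v * v for v in x[start:start + hop]) / hop)
        loud = rms > threshold
        if loud and not active:      # quiet -> loud transition marks an onset
            times.append(start / fs)
        active = loud
    return times


def reaction_time(x: Sequence[float], fs: int, **kw) -> Optional[float]:
    """Reaction time = second onset (answer) minus first onset (stimulus), if both exist."""
    t = onsets(x, fs, **kw)
    return t[1] - t[0] if len(t) >= 2 else None
```

For example, a recording with 0.5 s of silence, a 0.2 s stimulus tone, another 0.5 s of silence, and then a spoken answer yields onsets at 0.5 s and 1.2 s, i.e. a reaction time of 0.7 s measured from stimulus onset.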