Audio Time-Scaling for Slow Motion Sports Videos
Slow motion videos are frequently featured during broadcasts of sports events. However, these videos have no dedicated audio track; the only available sound is the live ambience and the commentary of the sports presenters. Standard audio time-scaling methods were not developed with such noisy signals in mind, and they do not always achieve acceptable acoustic quality. In this work, we present a new approach that creates high-quality time-stretched versions of sports audio recordings while preserving all of their transient events.
Error Robust Delay-Free Lossy Audio Coding Based on ADPCM
We consider the problem of transmission errors in the well-known adaptive differential pulse code modulation (ADPCM) system. In the ADPCM coding scheme, a single transmission error destabilizes the reconstruction process at the decoder if a non-leaky algorithm is used. We propose a delay-free audio source coding scheme with a fixed rate of 3 bit/sample, based on robust prediction. The prediction in the backward-adaptive ADPCM coding scheme is realized as an FIR filter in lattice structure. The prediction filter is derived either as a reconstructed-signal-driven (RSD) or as a prediction-error-driven (PED) lattice filter. A technique for error-robust RSD prediction is presented and employed in a robust audio coding scheme without any additional overhead. The proposed modified RSD-ADPCM is compared to the PED-ADPCM coding scheme in terms of objective audio quality. The proposed system yields good objective audio quality over noise-free channels and remains robust in the presence of transmission errors.
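To illustrate the structure described above, here is a minimal sketch of a fixed-rate 3 bit/sample ADPCM loop whose predictor is an FIR lattice filter driven by the reconstructed signal. It is an assumption-laden toy: the reflection coefficients `k` are fixed (no backward adaptation), the quantizer is a plain 8-level uniform midrise quantizer with a hypothetical `step`, and it does not implement the paper's error-robust RSD technique. It does show why the decoder can track the encoder exactly: both run the same lattice on the same reconstructed samples.

```python
import math

LEVELS = 8  # 3 bit/sample


def lattice_predict(k, b_prev):
    """Predict x[n] from backward errors b_0..b_{M-1} at time n-1.

    In the lattice, f_m[n] = f_{m-1}[n] + k_m * b_{m-1}[n-1], and x[n]
    enters f_M only through f_0[n] with gain 1. Setting f_0 = 0 therefore
    leaves f_M = -x_hat[n].
    """
    return -sum(km * b_prev[m] for m, km in enumerate(k))


def lattice_update(k, b_prev, x_rec):
    """Advance the lattice state with the reconstructed sample x_rec."""
    b_new = [0.0] * len(k)
    f = x_rec
    b_new[0] = x_rec
    for m in range(1, len(k)):  # stage m uses reflection coefficient k[m-1]
        km = k[m - 1]
        b_new[m] = km * f + b_prev[m - 1]   # b_m[n] = k_m f_{m-1}[n] + b_{m-1}[n-1]
        f = f + km * b_prev[m - 1]          # f_m[n]
    return b_new


def quantize(e, step):
    q = math.floor(e / step) + LEVELS // 2
    return min(max(q, 0), LEVELS - 1)


def dequantize(q, step):
    return (q - LEVELS // 2 + 0.5) * step  # midpoint of the quantizer cell


def encode(x, k, step):
    b = [0.0] * len(k)
    codes, recon = [], []
    for sample in x:
        pred = lattice_predict(k, b)
        q = quantize(sample - pred, step)
        x_rec = pred + dequantize(q, step)
        b = lattice_update(k, b, x_rec)   # predictor sees only reconstructed data
        codes.append(q)
        recon.append(x_rec)
    return codes, recon


def decode(codes, k, step):
    b = [0.0] * len(k)
    out = []
    for q in codes:
        x_rec = lattice_predict(k, b) + dequantize(q, step)
        b = lattice_update(k, b, x_rec)
        out.append(x_rec)
    return out


x = [0.5 * math.sin(2 * math.pi * n / 64) for n in range(256)]
codes, recon = encode(x, k=[-0.9], step=0.05)
```

With `k=[-0.9]` the lattice reduces to the first-order predictor `0.9 * x_rec[n-1]`. Because the predictor is not leaky, a corrupted code word would leave encoder and decoder states permanently out of step, which is the problem the paper addresses.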
Selection And Interpolation of Head-Related Transfer Functions for Rendering Moving Virtual Sound Sources
A variety of approaches have been proposed previously to interpolate head-related transfer functions (HRTFs). However, relatively little attention has been given to how a suitable set of HRTFs is chosen for interpolation and how the interpolation weights are calculated. This paper presents an efficient and robust way to select a minimal set of HRTFs and to calculate appropriate interpolation weights. The proposed method groups the HRTF measurement points into non-overlapping triangles on the surface of a sphere by computing their convex hull. The resulting Delaunay triangulation maximises the minimum angle of the triangles. For interpolation, the HRTF triangle intersected by the desired sound source vector is selected. The selection is based on a point-in-triangle test that can be performed with just 9 multiplications and 6 additions per triangle. The selection process is further improved by sorting the HRTF triangles according to their distance from the sound source vector before performing the point-in-triangle tests. The HRTFs of the selected triangle are interpolated using weights derived from vector-base amplitude panning (VBAP), with appropriate normalisation. The proposed method is compared to state-of-the-art methods. It is shown to be robust to irregularities in the HRTF measurement grid and well suited to rendering moving virtual sources.
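The point-in-triangle test and the VBAP weights can be obtained in one step: if the columns of a matrix are the three vertex directions of a triangle, then the gains g = inverse(matrix) · p are all non-negative exactly when p points into that triangle, and evaluating this product with a precomputed inverse costs 9 multiplications and 6 additions. The sketch below assumes unit direction vectors, uses the angular closeness of each triangle's centroid to p as a stand-in for the paper's distance-based sorting, and normalises the weights to sum to one; the paper's exact normalisation may differ. All function names are hypothetical.

```python
import math


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def inverse3(m):
    """Inverse of a 3x3 matrix (list of rows) via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("degenerate triangle")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def precompute_inverses(triangles):
    """Invert, once per triangle, the matrix whose columns are its vertices."""
    return [inverse3([[v1[r], v2[r], v3[r]] for r in range(3)])
            for v1, v2, v3 in triangles]


def vbap_gains(inv, p):
    """g = inv @ p: 9 multiplications and 6 additions."""
    return [inv[r][0] * p[0] + inv[r][1] * p[1] + inv[r][2] * p[2] for r in range(3)]


def select_and_weight(triangles, inverses, p, eps=1e-9):
    def closeness(t):  # assumed proxy for "distance from the source vector"
        c = [sum(v[r] for v in triangles[t]) / 3.0 for r in range(3)]
        return dot(c, p) / math.sqrt(dot(c, c))

    for t in sorted(range(len(triangles)), key=closeness, reverse=True):
        g = vbap_gains(inverses[t], p)
        if min(g) >= -eps:                 # p lies inside this triangle's cone
            s = sum(g)
            return t, [gi / s for gi in g]  # assumed normalisation: sum to one
    return None, None


# Toy grid: the 8 faces of an octahedron (not a real HRTF measurement grid).
X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
NX, NY, NZ = (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0)
tris = [(a, b, c) for a in (X, NX) for b in (Y, NY) for c in (Z, NZ)]
invs = precompute_inverses(tris)
s = 1.0 / math.sqrt(3.0)
t, w = select_and_weight(tris, invs, (s, s, s))
```

For a source direction midway between +x, +y and +z, the first triangle is selected and each of its three HRTFs receives a weight of one third. In a real renderer, the selected weights would then mix the three measured HRTFs (or their time-aligned magnitude responses).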
Room Acoustics Modelling using GPU-Accelerated Finite Difference and Finite Volume Methods On a Face-Centered Cubic Grid
In this paper, a room acoustics simulation using a finite difference approximation on a face-centered cubic (FCC) grid with finite volume impedance boundary conditions is presented. The finite difference scheme is accelerated on an Nvidia Tesla K20 graphics processing unit (GPU) using the CUDA programming language. A performance comparison is made between 27-point finite difference schemes on a cubic grid and the 13-point scheme on the FCC grid. It is shown that the FCC scheme runs faster on the Tesla K20 GPU and has less numerical dispersion than the best 27-point schemes on the cubic grid. Implementation details are discussed.
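The 13-point FCC scheme updates each grid point from itself and its 12 nearest neighbours. One way to see this (an illustrative derivation, not taken from the paper) is to store the FCC lattice as the points (i, j, k) of a cubic grid with i + j + k even, whose 12 neighbours lie at the offsets (±1, ±1, 0) and permutations. A Taylor expansion gives a sum of (neighbour - centre) of about 2h²∇²p, with h the nearest-neighbour distance, so a leapfrog update with Courant number λ = cT/h is

p[n+1] = 2p[n] - p[n-1] + (λ²/2) · Σ (p_neighbour[n] - p[n]).

The plain-Python sketch below applies this update on a small periodic grid. Periodic walls replace the paper's finite volume impedance boundaries, λ = 0.5 is a conservative hypothetical choice, and a GPU version would assign one thread per grid point instead of nesting loops.

```python
# 12 nearest-neighbour offsets of the FCC lattice embedded in a cubic grid.
OFFSETS = ([(dx, dy, 0) for dx in (-1, 1) for dy in (-1, 1)]
           + [(dx, 0, dz) for dx in (-1, 1) for dz in (-1, 1)]
           + [(0, dy, dz) for dy in (-1, 1) for dz in (-1, 1)])


def zeros(n):
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]


def fcc_step(p_now, p_prev, lam):
    """One leapfrog step of the 13-point FCC scheme with periodic walls."""
    n = len(p_now)
    coef = 0.5 * lam * lam
    p_next = zeros(n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if (i + j + k) % 2:       # not an FCC point: stays zero
                    continue
                c = p_now[i][j][k]
                lap = sum(p_now[(i + dx) % n][(j + dy) % n][(k + dz) % n] - c
                          for dx, dy, dz in OFFSETS)
                p_next[i][j][k] = 2.0 * c - p_prev[i][j][k] + coef * lap
    return p_next


def simulate_impulse(n=8, steps=30, lam=0.5):
    if n % 2:
        raise ValueError("n must be even so periodic wrapping preserves parity")
    p_prev, p_now = zeros(n), zeros(n)
    p_prev[0][0][0] = p_now[0][0][0] = 1.0  # impulse with zero initial velocity
    for _ in range(steps):
        p_prev, p_now = p_now, fcc_step(p_now, p_prev, lam)
    return p_now
```

Only half of the cubic array is ever touched. A dedicated FCC memory layout avoids storing the unused points.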
A New Reverberator based on Variable Sparsity Convolution
An efficient algorithm approximating the late part of room reverberation is proposed. The algorithm partitions the impulse response tail into variable-length segments and replaces them with a set of sparse FIR filters and lowpass filters, cascaded with several Schroeder allpass filters. The sparse FIR filter coefficients are selected from a velvet noise sequence, which consists only of ones, minus ones, and zeros. In this application, it is perceptually sufficient to use very sparse velvet noise sequences with only about 0.1 to 0.2% non-zero elements, with sparsity increasing along the impulse response. The algorithm yields a parametric approximation of the late part of the impulse response that is more than 100 times more computationally efficient than direct convolution. Its computational load is comparable to that of FFT-based partitioned convolution techniques, but it requires only about half the memory. The main advantage of the new reverberator is its flexible parameterization.
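The efficiency claim rests on two properties of velvet noise: it is very sparse, and its non-zero taps are ±1, so convolution needs only additions and subtractions at the pulse positions. The sketch below uses the common velvet noise construction, with one pulse of random sign at a random position inside each interval of Td = fs / density samples. Density is given in pulses per second, so at fs = 48 kHz a density of 48 pulses/s gives the 0.1% non-zero fraction mentioned above. The segment partitioning, the lowpass filters and the allpass cascade of the proposed reverberator are not modelled, and the function names are hypothetical.

```python
import random


def velvet_noise(length, density_hz, fs, seed=0):
    """One +/-1 pulse per interval of Td = fs / density_hz samples."""
    rng = random.Random(seed)
    td = fs / density_hz
    seq = [0] * length
    m = 0
    while True:
        # Flooring keeps each pulse inside its own interval [m*Td, (m+1)*Td).
        pos = int(m * td + rng.random() * (td - 1))
        if pos >= length:
            break
        seq[pos] = 1 if rng.random() < 0.5 else -1
        m += 1
    return seq


def sparse_convolve(x, seq):
    """Convolve x with a velvet noise sequence using only adds/subtracts."""
    taps = [(i, s) for i, s in enumerate(seq) if s != 0]
    y = [0.0] * (len(x) + len(seq) - 1)
    for i, s in taps:
        for n, xn in enumerate(x):
            y[n + i] += xn if s > 0 else -xn
    return y


seq = velvet_noise(48000, 48.0, 48000, seed=1)
density = sum(1 for v in seq if v) / len(seq)   # fraction of non-zero taps
```

The cost per output sample is proportional to the number of pulses rather than the filter length, which is where the large saving over direct convolution comes from.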
B-Format Acoustic Impulse Response Measurement and Analysis In the Forest at Koli National Park, Finland
Acoustic impulse responses are used in convolution-based auralisation and reverberation techniques for a range of applications, such as music production, sound design and virtual reality systems. These impulse responses can be measured in real-world environments to provide realistic and natural-sounding reverberation effects. Analysis of these data can also provide useful information about the acoustic characteristics of a particular space. Currently, impulse responses recorded outdoors are not widely available for surround sound auralisation and research purposes. This work presents results from a recent acoustic survey at three locations in the snow-covered forest of Koli National Park, Finland, during early spring. Acoustic impulse responses were measured using a B-format Soundfield microphone and a single loudspeaker. The results are analysed in terms of reverberation and spatial characteristics. The work is part of a larger study to collect and investigate acoustic impulse responses from a variety of outdoor locations under different climatic conditions.
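The abstract does not say how the reverberation analysis is performed. A common approach, assumed here rather than taken from the paper, is Schroeder backward integration of the squared impulse response, followed by a straight-line fit to the resulting energy decay curve (EDC) between -5 dB and -25 dB (a T20-style range) and extrapolation to 60 dB of decay. For B-format data, this would typically be applied to the omnidirectional W channel, per octave band. The test signal below is synthetic exponentially decaying noise, not measured data.

```python
import math
import random


def schroeder_edc_db(h):
    """Backward-integrated energy of h, in dB relative to the total energy."""
    acc = 0.0
    edc = [0.0] * len(h)
    for n in range(len(h) - 1, -1, -1):
        acc += h[n] * h[n]
        edc[n] = acc
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else -math.inf for e in edc]


def rt60_from_edc(edc_db, fs, upper_db=-5.0, lower_db=-25.0):
    """Least-squares line through the EDC between upper_db and lower_db,
    extrapolated to a 60 dB decay."""
    pts = [(n / fs, v) for n, v in enumerate(edc_db) if lower_db <= v <= upper_db]
    if len(pts) < 2:
        raise ValueError("decay range not reached")
    mt = sum(t for t, _ in pts) / len(pts)
    mv = sum(v for _, v in pts) / len(pts)
    slope = (sum((t - mt) * (v - mv) for t, v in pts)
             / sum((t - mt) ** 2 for t, _ in pts))      # dB per second
    return -60.0 / slope


# Synthetic impulse response: Gaussian noise whose energy falls 60 dB in 0.5 s.
fs, t60 = 8000, 0.5
rng = random.Random(0)
h = [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * (n / fs) / t60) for n in range(int(1.5 * fs))]
estimate = rt60_from_edc(schroeder_edc_db(h), fs)
```

Outdoor responses often lack a clean exponential decay, so the fitted range and the background noise level deserve particular care with measurements like those from Koli.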
A Scalable Architecture for General Real-Time Array-Based DSP on FPGAs with Application to the Wave Equation
This paper describes a scheme for the parallel execution on FPGAs of DSP tasks that rely heavily on multiply-accumulate (MAC) operations. Multiple operations are assigned to a single ‘processing node’ such that each node just meets its real-time deadline. Where the number of MACs required exceeds the capability of a single processing node, additional nodes are added until the capacity of the FPGA is exhausted. Requirements beyond the capability of a single FPGA are accommodated by extending the design across multiple devices, offering significant scalability. Resource usage and performance results are presented for an example acoustic modelling application on a modest single FPGA and development system.
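The sizing logic described above reduces to simple arithmetic. A node that completes `macs_per_cycle` MACs per clock has clock_hz / sample_rate_hz cycles available in each sample period, and the workload then determines how many nodes and devices are needed. All numbers in the sketch are hypothetical: a 100 MHz clock, 44.1 kHz output, a 50,000-point grid costing 7 MACs per point, and 64 nodes per FPGA. Real designs also lose cycles to control and memory access.

```python
def node_capacity(clock_hz, sample_rate_hz, macs_per_cycle=1):
    """MACs one node can finish within a single sample period."""
    return (clock_hz // sample_rate_hz) * macs_per_cycle


def plan(total_macs_per_sample, clock_hz, sample_rate_hz, nodes_per_fpga,
         macs_per_cycle=1):
    cap = node_capacity(clock_hz, sample_rate_hz, macs_per_cycle)
    nodes = -(-total_macs_per_sample // cap)      # integer ceiling division
    devices = -(-nodes // nodes_per_fpga)         # spill onto further FPGAs
    return cap, nodes, devices


capacity, nodes, devices = plan(50_000 * 7, 100_000_000, 44_100, nodes_per_fpga=64)
```

Under these assumptions, each node handles 2,267 MACs per sample, so 155 nodes are needed, which spills over onto a third device.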
Center Signal Scaling using Signal-to-Downmix Ratios
A novel method for scaling the level of the virtual center in audio signals is proposed. The input signals are processed in the time-frequency domain such that direct sound components with approximately equal energy in all channels are amplified or attenuated. The real-valued spectral weights are obtained from the ratio of the sum of the power spectral densities of all input channel signals to the power spectral density of the sum signal. Applications of the method include upmixing of two-channel stereophonic recordings for reproduction over surround sound set-ups, stereophonic enhancement, dialogue enhancement, and preprocessing for semantic audio analysis.
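The ratio above separates center from non-center content. For a source panned equally and coherently into N channels, the downmix power is N² times the single-channel power, so the ratio is 1/N (0.5 for stereo). For content present in only one channel, or uncorrelated across channels, it is 1. The sketch computes this ratio per frequency bin. PSDs are estimated by averaging |X|² over all given STFT frames (an assumption; a real-time system would use recursive smoothing). The `center_weight` mapping from ratio to spectral weight is purely hypothetical and is not the paper's formula.

```python
import math


def signal_to_downmix_ratio(frames):
    """frames[t][c][k]: complex STFT value of channel c, bin k, frame t.
    Returns, per bin, (sum of channel PSDs) / (PSD of the channel sum)."""
    n_bins = len(frames[0][0])
    num = [0.0] * n_bins   # sum over channels of |X_c|^2
    den = [0.0] * n_bins   # |sum over channels of X_c|^2
    for frame in frames:
        for k in range(n_bins):
            num[k] += sum(abs(ch[k]) ** 2 for ch in frame)
            den[k] += abs(sum(ch[k] for ch in frame)) ** 2
    return [a / b if b > 0.0 else math.inf for a, b in zip(num, den)]


def center_weight(ratio, n_channels, center_gain):
    """Hypothetical mapping: full center_gain at ratio 1/n_channels
    (equal-energy, coherent content), unity gain at ratio >= 1."""
    c = (1.0 - ratio) / (1.0 - 1.0 / n_channels)
    c = min(max(c, 0.0), 1.0)
    return 1.0 + (center_gain - 1.0) * c


center = [[[0.5 + 0.2j, 1.0 - 0.3j], [0.5 + 0.2j, 1.0 - 0.3j]] for _ in range(4)]
left_only = [[[0.5 + 0.2j, 1.0 - 0.3j], [0j, 0j]] for _ in range(4)]
r_center = signal_to_downmix_ratio(center)
r_left = signal_to_downmix_ratio(left_only)
```

Multiplying every channel's STFT by the resulting weight and resynthesizing would boost (gain > 1) or attenuate (gain < 1) only the bins that the ratio identifies as center-panned.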
Real-Time Dynamic Image-Source Implementation For Auralisation
This paper describes a software package for auralisation in interactive virtual reality environments. Its purpose is to reproduce, in real time, the 3D soundfield within a virtual room where the listener and sound sources can be moved freely. Output sound is presented binaurally over headphones. Auralisation is based on geometric acoustic models combined with head-related transfer functions (HRTFs): the direct sound and reflections from each source are computed dynamically by the image-source method. Directional cues are obtained by filtering these incoming sounds with the HRTFs corresponding to their propagation directions relative to the listener, which are computed from the information provided by a head-tracking device. Two interactive real-time applications were developed to demonstrate the operation of this software package. Both provide a visual representation of the listener (position and head orientation) and of the sources (including image sources). One focusses on the auralisation-visualisation synchrony and the other on the dynamic calculation of reflection paths. Computational performance results of the auralisation system are presented.
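The image-source step can be illustrated for the simplest case, first-order reflections in a rectangular (shoebox) room. Each reflection is modelled by mirroring the source in one wall, and the delay, a 1/r distance gain and the arrival direction then follow from the image position and the listener position. The azimuth relative to the head uses a yaw angle such as a head tracker would supply. The sketch assumes a shoebox room with rigid walls and yaw-only head rotation, and uses c = 343 m/s. Higher reflection orders, visibility checks for non-convex rooms and the actual HRTF filtering are omitted. All names are hypothetical.

```python
import math


def first_order_images(src, room):
    """Mirror src in the six walls of the room [0,Lx] x [0,Ly] x [0,Lz]."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]
            images.append(tuple(img))
    return images


def arrivals(src, listener, room, c=343.0):
    """(delay in s, 1/r gain, unit direction from listener) for the direct
    sound followed by the six first-order reflections."""
    out = []
    for pos in [tuple(src)] + first_order_images(src, room):
        d = [p - q for p, q in zip(pos, listener)]
        r = math.sqrt(sum(v * v for v in d))
        out.append((r / c, 1.0 / r, tuple(v / r for v in d)))
    return out


def azimuth_deg(direction, head_yaw_deg=0.0):
    """Horizontal arrival angle relative to the head, wrapped to [-180, 180)."""
    az = math.degrees(math.atan2(direction[1], direction[0])) - head_yaw_deg
    return (az + 180.0) % 360.0 - 180.0


paths = arrivals((1.0, 1.0, 1.0), (4.0, 3.0, 1.5), (5.0, 4.0, 3.0))
```

Each entry in `paths` would select (or interpolate) an HRTF pair from its head-relative direction. Recomputing the list whenever the source, listener or head orientation changes gives the dynamic behaviour described above.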
A Preliminary Analysis of the Continuous Axis Value of the Three-Dimensional PAD Speech Emotional State Model
The traditional approach to emotion classification uses Thayer's two-dimensional (2D) emotional model, which identifies emotion by arousal and valence. The 2D model is not fine-grained enough to classify the rich vocabulary of emotions, for example to distinguish between disgust and fear. Another problem of the traditional methods is that they lack a formal definition of the axis values of the emotional model: they either assign the axis values manually or obtain them through listening tests. We propose to use the PAD (Pleasure, Arousal, Dominance) emotional state model to describe speech emotion on a continuous three-dimensional scale. We suggest an initial definition of the continuous axis values based on the fluctuation patterns of Log Frequency Power Coefficients (LFPC). We verify the result using a database of German emotional speech. Experiments show an average classification rate of 81% for a set of six basic ("big six") emotions.
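LFPC features summarise a speech frame by its log energy in frequency bands whose widths grow logarithmically. The sketch below computes one LFPC vector per frame using a Hann window, a direct DFT (adequate for short frames), and energies summed over bands spaced logarithmically between `f_min` and fs/2. The band count (12), `f_min` (100 Hz) and the frame length are assumptions, not values from the paper. Tracking how these coefficients fluctuate across frames would be the next step toward the axis values described above.

```python
import cmath
import math


def lfpc(frame, fs, n_bands=12, f_min=100.0):
    """Log (dB) energies of one frame in log-spaced bands from f_min to fs/2."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [a * w for a, w in zip(frame, window)]
    power = []
    for k in range(n // 2 + 1):               # one-sided power spectrum
        s = sum(xw[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        power.append(abs(s) ** 2)
    f_max = fs / 2.0
    edges = [f_min * (f_max / f_min) ** (b / n_bands) for b in range(n_bands + 1)]
    coeffs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        energy = sum(p for k, p in enumerate(power) if lo <= k * fs / n < hi)
        coeffs.append(10.0 * math.log10(energy + 1e-12))  # floor avoids log(0)
    return coeffs


fs = 16000
tone = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(256)]
c = lfpc(tone, fs)
```

For a 1 kHz tone, the largest coefficient falls in the band spanning roughly 894 Hz to 1289 Hz, the seventh of the twelve bands.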