Center Signal Scaling using Signal-to-Downmix Ratios
A novel method for scaling the level of the virtual center in audio signals is proposed. The input signals are processed in the time-frequency domain such that direct sound components having approximately equal energy in all channels are amplified or attenuated. The real-valued spectral weights are obtained from the ratio of the sum of the power spectral densities of all input channel signals to the power spectral density of the sum signal. Applications of the presented method are upmixing two-channel stereophonic recordings for their reproduction using surround sound set-ups, stereophonic enhancement, dialogue enhancement, and preprocessing for semantic audio analysis.
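The ratio described in this abstract can be illustrated for a single time-frequency tile. This is a minimal sketch, not the authors' implementation: the function name `signal_to_downmix_ratio` and the `eps` guard are assumptions, and it uses instantaneous powers where a real system would presumably use smoothed power spectral density estimates. How the ratio is mapped to spectral weights is not stated in the abstract and is not shown here.

```python
def signal_to_downmix_ratio(bins, eps=1e-12):
    """Ratio of summed channel powers to the power of the channel sum.

    bins: complex spectral values of ONE time-frequency tile, one per channel.
    eps:  hypothetical guard against division by zero (not from the abstract).
    """
    sum_of_powers = sum(abs(x) ** 2 for x in bins)
    power_of_sum = abs(sum(bins)) ** 2
    return sum_of_powers / (power_of_sum + eps)


# Identical content in both channels (a "center" component) gives about 0.5.
center = signal_to_downmix_ratio([1 + 1j, 1 + 1j])
# Content present in one channel only (hard-panned) gives about 1.0.
panned = signal_to_downmix_ratio([1 + 1j, 0j])
```

For two channels the ratio is smallest for in-phase components of equal level, which is why it can serve to pick out the virtual center.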
Real-Time Dynamic Image-Source Implementation For Auralisation
This paper describes a software package for auralisation in interactive virtual reality environments. Its purpose is to reproduce, in real time, the 3D soundfield within a virtual room where listener and sound sources can be moved freely. Output sound is presented binaurally using headphones. Auralisation is based on geometric acoustic models combined with head-related transfer functions (HRTFs): the direct sound and reflections from each source are computed dynamically by the image-source method. Directional cues are obtained by filtering these incoming sounds by the HRTFs corresponding to their propagation directions relative to the listener, computed on the basis of the information provided by a head-tracking device. Two interactive real-time applications were developed to demonstrate the operation of this software package. Both provide a visual representation of listener (position and head orientation) and sources (including image sources). One focusses on the auralisation-visualisation synchrony and the other on the dynamic calculation of reflection paths. Computational performance results of the auralisation system are presented.
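The image-source idea can be sketched for the simplest case. The example below assumes a rectangular room with walls on the planes 0 and L along each axis and computes first-order images only. The room shape, the function names and the speed of sound of 343 m/s are assumptions for illustration, not details taken from the paper.

```python
import math


def first_order_images(source, room):
    """Mirror a source across the six walls of a rectangular room.

    source: (x, y, z) position in metres.
    room:   (Lx, Ly, Lz) dimensions; walls are assumed at 0 and L on each axis.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # reflection across the wall plane
            images.append(tuple(image))
    return images


def delay_seconds(a, b, speed_of_sound=343.0):
    """Propagation delay between two points (speed of sound is an assumed value)."""
    return math.dist(a, b) / speed_of_sound


listener = (2.0, 2.5, 1.5)
images = first_order_images((1.0, 2.0, 1.5), (4.0, 5.0, 3.0))
delays = [delay_seconds(image, listener) for image in images]
```

In a dynamic setting these positions and delays would be recomputed whenever the source or listener moves, and each image direction would select the HRTF pair used for filtering.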
A Preliminary Analysis of the Continuous Axis Value of the Three-dimensional PAD Speech Emotional State Model
The traditional way of emotional classification involves using the two-dimensional (2D) emotional model by Thayer, which identifies emotion by arousal and valence. The 2D model is not fine-grained enough to separate the rich vocabulary of emotions, for example to distinguish disgust from fear. Another problem of the traditional methods is that they do not have a formal definition of the axis values of the emotional model. They either assign the axis values manually or rate them through listening tests. We propose to use the PAD (Pleasure, Arousal, Dominance) emotional state model to describe speech emotion on a continuous three-dimensional scale. We suggest an initial definition of the continuous axis values by observing the pattern of Log Frequency Power Coefficients (LFPC) fluctuation. We verify the result using a database of German emotional speech. Experiments show that the average classification rate for a set of big-6 emotions is 81%.
Perceptual Investigation of Image Placement with Ambisonics for Non-Centred Listeners
Ambisonics is a scalable spatial audio technique that attempts to present a sound scene to listeners over as large an area as possible. A localisation experiment was carried out to investigate the performance of a first-order and a third-order system at three listening positions: one in the centre and two off-centre. The test used a reverse target-pointer adjustment method to determine the error, both signed and absolute, for each combination of listening position and system. The signed error was used to indicate the direction and magnitude of the shifts in panning angle introduced for the off-centre listening positions. The absolute error was used as a measure of the overall performance of each combination of listening position and system. The two systems were compared in terms of the degree of image shifting and the robustness of their off-centre performance.
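The distinction between signed and absolute error can be made concrete with a small sketch. It assumes angles in degrees and wraps differences into the range [-180, 180). The function names and the sample values are hypothetical and are not data from the experiment.

```python
def signed_error(target_deg, response_deg):
    """Angular difference response - target, wrapped into [-180, 180) degrees."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0


def mean_errors(pairs):
    """Return (mean signed error, mean absolute error) for (target, response) pairs."""
    errors = [signed_error(t, r) for t, r in pairs]
    mean_signed = sum(errors) / len(errors)
    mean_absolute = sum(abs(e) for e in errors) / len(errors)
    return mean_signed, mean_absolute


# Made-up responses: shifts of +10 and -10 degrees cancel in the signed mean
# but not in the absolute mean.
signed, absolute = mean_errors([(0.0, 10.0), (0.0, -10.0)])
```

The signed mean reveals a systematic shift in one direction, while the absolute mean measures overall accuracy regardless of direction.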
Finite Difference Schemes on Hexagonal Grids for Thin Linear Plates with Finite Volume Boundaries
The thin plate is a key structure in various musical instruments, including many percussion instruments and the soundboard of the piano, and is also the mechanism underlying electromechanical plate reverberation. As such, it is a suitable candidate for physical modelling approaches to audio effects and sound synthesis, such as finite difference methods, though great attention must be paid to the problem of numerical dispersion, in the interest of reducing perceptual artefacts. In this paper, we present two finite difference schemes on hexagonal grids for such a thin plate system. Numerical dispersion and computational costs are analysed and compared to the standard 13-point Cartesian scheme. An equivalent finite volume scheme can be related to the 13-point Cartesian scheme and a 19-point hexagonal scheme, allowing for fitted boundary conditions of the clamped type. Theoretical modes for a clamped circular plate are compared to simulations. It is shown that better agreement is obtained for the hexagonal scheme than the Cartesian scheme.
Prioritized Computation for Numerical Sound Propagation
The finite difference time domain (FDTD) method is commonly used as a numerically accurate way of propagating sound. However, it requires extensive computation. We present a simple method for accelerating FDTD. Specifically, we modify the FDTD update loop to prioritize computation where it is needed most in order to faithfully propagate waves through the simulated space. We estimate for each potential cell update its importance to the simulation output and only update the N most important cells, where N is dependent on the time available for computation. In this paper, we explain the algorithm and discuss how it can bring enhanced accuracy and dynamism to real-time audio propagation.
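The idea of updating only the N most important cells can be sketched on a 1D wave equation. Everything specific here is an assumption made for illustration: the importance measure (the size of the pending change), the choice to hold skipped cells at their current value, and the name `prioritized_step`. The abstract does not say how the paper estimates importance, and a practical system would need an estimate cheaper than computing every candidate update.

```python
import heapq


def prioritized_step(u_prev, u, courant2, budget):
    """One leapfrog step of a 1D wave equation, updating at most `budget` cells.

    u_prev, u: field at the previous and current time step (end cells stay fixed).
    courant2:  squared Courant number.
    budget:    N, the number of cell updates allowed in this step.
    """
    interior = range(1, len(u) - 1)
    # Standard FDTD candidate value for every interior cell.
    candidate = {
        i: 2.0 * u[i] - u_prev[i] + courant2 * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        for i in interior
    }
    # Hypothetical importance: how much the cell would change if updated.
    chosen = heapq.nlargest(budget, interior, key=lambda i: abs(candidate[i] - u[i]))
    u_next = list(u)  # assumption: cells that are not chosen keep their value
    for i in chosen:
        u_next[i] = candidate[i]
    return u_next


u_prev = [0.0, 0.0, 0.0, 0.0, 0.0]
u = [0.0, 0.0, 1.0, 0.0, 0.0]
full = prioritized_step(u_prev, u, 1.0, budget=3)     # same as an ordinary FDTD step
partial = prioritized_step(u_prev, u, 1.0, budget=1)  # only one cell is updated
```

With a budget covering every interior cell the step reduces to the ordinary update, so the budget acts as a knob between cost and fidelity.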
Sinusoidal Synthesis Method using a Force-based Algorithm
In this paper we propose a synthesis method using a force-based algorithm to control frequencies of multiple sine waves. In order to implement this synthesis method, we analyze an existing sound source using a fast Fourier transform (FFT). Spectral peaks which have large magnitudes are regarded as heavy partials and assigned large attractive forces. A few hundred sine waves with stationary amplitudes are placed in a frequency space on which forces generated in the analysis phase are applied. The frequencies of the partials gravitate to the nearest peak of the reference spectrum from the source sound. As more sine waves are combined at the large peaks, the sound synthesized by the partials gradually transforms into the reference spectrum. In order to prevent the frequencies of the partials from gravitating onto localized peaks, each partial is assigned a repulsive force against all others. Through successful control of these attractive and repulsive forces, roughness and speed variation of the synthesis can be achieved. Moreover, by increasing or decreasing the number of partials according to the total amplitude of the source sound, amplitude envelope following is achieved.
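One update of such a force model might look like the sketch below. The force laws are assumptions chosen for simplicity (attraction proportional to peak magnitude and distance, repulsion inversely proportional to the frequency gap), as are the names `step_partials`, `attract`, `repel` and `dt`. The abstract does not specify the actual force equations.

```python
def step_partials(freqs, peaks, attract=0.1, repel=0.0, dt=1.0):
    """Move each partial one step under attractive and repulsive forces.

    freqs: current partial frequencies in Hz.
    peaks: list of (frequency, magnitude) spectral peaks from the analysis stage.
    """
    updated = []
    for i, f in enumerate(freqs):
        # Attraction toward the nearest reference peak, scaled by its magnitude.
        peak_freq, peak_mag = min(peaks, key=lambda p: abs(p[0] - f))
        force = attract * peak_mag * (peak_freq - f)
        # Repulsion from every other partial, pointing away from it.
        for j, g in enumerate(freqs):
            if j != i and g != f:
                force += repel / (f - g)
        updated.append(f + dt * force)
    return updated


# A lone partial drifts toward the peak at 440 Hz.
drift = step_partials([400.0], [(440.0, 1.0)])
# With attraction switched off, two neighbouring partials push each other apart.
spread = step_partials([439.0, 441.0], [(440.0, 1.0)], attract=0.0, repel=1.0)
```

Balancing the two coefficients controls how quickly partials collect on strong peaks and how tightly they are allowed to cluster there.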
A Method of Morphing Spectral Envelopes of the Singing Voice for Use with Backing Vocals
The voice morphing process presented in this paper is based on the observation that, in many styles of music, it is often desirable for a backing vocalist to blend his or her timbre with that of the lead vocalist when the two voices are singing the same phonetic material concurrently. This paper proposes a novel application of recent morphing research for use with a source backing vocal and a target lead vocal. The function of the process is to alter the timbre of the backing vocal using spectral envelope information extracted from both vocal signals to achieve varying degrees of blending. Several original features are proposed for the unique usage context, including the use of LSFs as voice morphing parameters, and an original control algorithm that performs crossfades between synthesized and unsynthesized audio on the basis of voiced/unvoiced decisions.
Short-Time Time-Reversal on Audio Signals
We present an analysis of short-time time-reversal on audio signals. Based on our analysis, we define parameters that can be used to control the digital effect and explain the effect each parameter has on the output. We further study the case of 50% overlap-add, then use this for a real-time implementation. Depending on the window length, the effect on the output sound ranges from added overtones to added reverse echoes. We suggest example use cases and digital effects setups for use in sound design and recording.
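The 50% overlap-add case can be sketched as follows. This is an offline illustration under assumptions the abstract does not state: a periodic Hann window applied before each frame is reversed, and the name `short_time_reverse`. The paper's actual window and parameterisation may differ.

```python
import math


def short_time_reverse(x, frame_len):
    """Reverse each windowed frame in time, then overlap-add at 50% overlap."""
    hop = frame_len // 2
    # Periodic Hann window (assumed); shifted copies sum to one at 50% overlap.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    y = [0.0] * len(x)
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        frame.reverse()  # the short-time time-reversal itself
        for n in range(frame_len):
            y[start + n] += frame[n]
    return y


# An impulse at index 5 is moved to mirrored positions inside the two frames
# that contain it, one before and one after its original location.
impulse = [0.0] * 16
impulse[5] = 1.0
echoes = short_time_reverse(impulse, frame_len=8)
```

With long windows the mirrored copies are heard as separate reverse echoes, while short windows place them so close together that the result is perceived as a change in timbre.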
A Statistical Approach to Automated Offline Dynamic Processing in the Audio Mastering Process
Mastering audio is a complicated yet important step in music production. It serves many purposes; an important one is to ensure a typical loudness for a piece of music within its genre. In order to automate this step we use a statistical model of the dynamic section. To allow a statistical approach we need to introduce some modifications to the compressor’s side-chain or, more precisely, to its ballistics. We then develop an offline framework to determine compressor parameters for the music at hand such that the signal’s statistical properties meet certain target properties, namely statistical central moments, which can for example be chosen per genre. Finally, the overall system is tested with songs which are available to us as unmastered, professionally mastered, and only compressed versions.
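The target properties mentioned here, statistical central moments, can be computed as in the sketch below. Which signal the moments are taken over is an assumption: the example uses a made-up list of short-term levels in dB, and the function name `central_moment` is hypothetical.

```python
def central_moment(values, order):
    """Central moment of the given order: the mean of (v - mean) ** order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


# Made-up short-term levels in dB, for illustration only.
levels_db = [-20.0, -18.0, -16.0, -14.0]
variance = central_moment(levels_db, 2)  # spread of the level distribution
third = central_moment(levels_db, 3)     # zero for a symmetric distribution
```

An offline optimisation could then adjust compressor parameters until moments measured on the processed signal approach the chosen targets.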