Download Low-delay vector-quantized subband ADPCM coding
Several modern applications require audio encoders featuring low data rate and lowest delays. In terms of delay, Adaptive Differential Pulse Code Modulation (ADPCM) encoders are advantageous compared to block-based codecs due to their instantaneous output and therefore preferred in time-critical applications. If the the audio signal transport is done block-wise anyways, as in Audio over IP (AoIP) scenarios, additional advantages can be expected from block-wise coding. In this study, a generalized subband ADPCM concept using vector quantization with multiple realizations and configurations is shown. Additionally, a way of optimizing the codec parameters is derived. The results show that for the cost of small algorithmic delays the data rate of ADPCM can be significantly reduced while obtaining a similar or slightly increased perceptual quality. The largest algorithmic delay of about 1 ms at 44.1 kHz is still smaller than the ones of well-known low-delay codecs.
Download GstPEAQ – an Open Source Implementation of the PEAQ Algorithm
In 1998, the ITU published a recommendation for an algorithm for objective measurement of audio quality, aiming to predict the outcome of listening tests. Despite the age, today only one implementation of that algorithm meeting the conformance requirements exists. Additionally, two open source implementations of the basic version of the algorithm are available which, however, do not meet the conformance requirements. In this paper, yet another non-conforming open source implementation, GstPEAQ, is presented. However, it improves upon the previous ones by coming closer to conformance and being computationally more efficient. Furthermore, it implements not only the basic, but also the advanced version of the algorithm. As is also shown, despite the nonconformance, the results obtained computationally still closely resemble those of listening tests.
Download Cascaded prediction in ADPCM codec structures
The aim of this study is to demonstrate how ADPCM-based codec structures can be improved using cascaded prediction. The advantage of predictor cascades is to allow the adaption to several signal conditions, as it is done in block-based perceptual codecs like MP3, AAC, etc. In other words, additional predictors with a small order are supposed to enhance the prediction of non-stationary signals. The predictor cascade is complemented with a simple adaptive quantizer to yield a simple exemplary codec which is used to demonstrate the influence of the predictor cascade. Several cascade configurations are considered and optimized using a genetic algorithm. A measurement of the prediction gain and the ODG score utilizing the PEAQ algorithm applied to the SQAM dataset shall reveal the potential improvements.
Download Vowel Conversion by Phonetic Segmentation
In this paper a system for vowel conversion between different speakers using short-time speech segments is presented. The input speech signal is segmented into period-length speech segments whose fundamental frequency and first two formants are used to find the perceivable vowel-quality. These segments are used to represent a voiced phoneme, i.e. a vowel. The approach relies on pitchsynchronous analysis and uses a modified PSOLA technique for concatenation of the vowel segments. Vowel conversion between speakers is achieved by exchanging the phonetic constituents of a source speaker’s speech waveform in voiced regions of speech whilst preserving prosodic features of the source speaker, thus introducing a method for phonetic segmentation, mapping, and reconstruction of vowels.
Download Downmix compatible conversion from mono to stereo in time- and frequency-domain
Even in a time of surround and 3D sound, many tracks and recordings are still only available in mono or it is not feasible to record a source with multiple microphones for several reasons. In these cases, a pseudo stereo conversion of mono signals can be a useful preprocessing step and/or an enhancing audio effect. The conversion proposed in this paper is designed to deliver a neutral sounding stereo image by avoiding timbral coloration or reverberation. Additionally, the resulting stereo signal is downmix-compatible and allows to revert to the original mono signal by a simple summation of the left and right channels. Several configuration parameters are shown to control the stereo panorama. The algorithm can be implemented in time-domain or also in the frequency-domain with additional features, like center focusing.