Low-Delay Error Concealment with Low Computational Overhead for Audio over IP Applications
A major problem in low-latency Audio over IP transmission is the unpredictable impact of the underlying network, leading to jitter and packet loss. Typically, error concealment strategies are employed at the receiver to counteract audible artifacts produced by missing audio data resulting from these network characteristics. Known concealment methods tend to achieve only unsatisfactory audio quality or cause high computational costs. Hence, this study aims at finding a new low-cost concealment strategy using simple algorithms. The proposed system essentially consists of a period extraction and alignment module that synthesizes concealment signals from previous data. The audio quality is evaluated in the form of automated measurements using PEAQ. Furthermore, the system’s complexity is analyzed by determining the computational costs of all required modules in all operating modes and comparing its computational load with that of another concealment method based on auto-regressive modeling.
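The idea of synthesizing a concealment signal from previous data can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a plain squared-difference search for the period and a phase-continuous repetition of the last period, and the names `estimate_period` and `conceal_gap` as well as the lag bounds are hypothetical.

```python
def estimate_period(history, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest squared difference.

    Assumption: len(history) > max_lag, so every lag is compared over the
    same range of recent samples.
    """
    best_lag, best_cost = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        cost = sum((history[n] - history[n - lag]) ** 2
                   for n in range(max_lag, len(history)))
        if cost < best_cost:  # strict comparison keeps the shortest period
            best_lag, best_cost = lag, cost
    return best_lag


def conceal_gap(history, gap_len, min_lag, max_lag):
    """Fill a gap of gap_len samples by repeating the last extracted period."""
    period = estimate_period(history, min_lag, max_lag)
    last_period = history[-period:]
    # Starting at the first sample of the last period continues the waveform
    # in phase with the end of the history (a simple form of alignment).
    return [last_period[n % period] for n in range(gap_len)]
```

For a strictly periodic input the sketch reproduces the missing samples exactly; real audio would additionally need smoothing at the transitions, which is omitted here.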
Low-delay vector-quantized subband ADPCM coding
Several modern applications require audio encoders featuring a low data rate and very low delay. In terms of delay, Adaptive Differential Pulse Code Modulation (ADPCM) encoders are advantageous compared to block-based codecs due to their instantaneous output and are therefore preferred in time-critical applications. If the audio signal is transported block-wise anyway, as in Audio over IP (AoIP) scenarios, additional advantages can be expected from block-wise coding. In this study, a generalized subband ADPCM concept using vector quantization with multiple realizations and configurations is shown. Additionally, a way of optimizing the codec parameters is derived. The results show that, at the cost of small algorithmic delays, the data rate of ADPCM can be significantly reduced while obtaining a similar or slightly increased perceptual quality. The largest algorithmic delay of about 1 ms at 44.1 kHz is still smaller than that of well-known low-delay codecs.
GstPEAQ – an Open Source Implementation of the PEAQ Algorithm
In 1998, the ITU published a recommendation for an algorithm for objective measurement of audio quality, aiming to predict the outcome of listening tests. Despite its age, only one implementation of that algorithm meeting the conformance requirements exists today. Additionally, two open source implementations of the basic version of the algorithm are available which, however, do not meet the conformance requirements. In this paper, yet another non-conforming open source implementation, GstPEAQ, is presented. However, it improves upon the previous ones by coming closer to conformance and being computationally more efficient. Furthermore, it implements not only the basic, but also the advanced version of the algorithm. As is also shown, despite the non-conformance, the computed results still closely resemble those of listening tests.
Cascaded prediction in ADPCM codec structures
The aim of this study is to demonstrate how ADPCM-based codec structures can be improved using cascaded prediction. The advantage of predictor cascades is that they allow adaptation to several signal conditions, as is done in block-based perceptual codecs like MP3, AAC, etc. In other words, additional predictors with a small order are supposed to enhance the prediction of non-stationary signals. The predictor cascade is complemented with a simple adaptive quantizer to yield a simple exemplary codec which is used to demonstrate the influence of the predictor cascade. Several cascade configurations are considered and optimized using a genetic algorithm. Measurements of the prediction gain and of the ODG score, obtained by applying the PEAQ algorithm to the SQAM dataset, reveal the potential improvements.
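The structure of a predictor cascade, in which each stage predicts the residual left by the previous one, can be sketched as follows. This is only an illustration under stated assumptions: it uses normalized LMS (NLMS) predictors on the unquantized signal, whereas an ADPCM codec would adapt on the reconstructed signal and include a quantizer. The class `NlmsPredictor`, the function `cascade_residual`, and the values of `mu`, `eps`, and the stage orders are hypothetical.

```python
import math


class NlmsPredictor:
    """One adaptive linear predictor stage (NLMS, assumed for illustration)."""

    def __init__(self, order, mu=0.5, eps=1e-6):
        self.w = [0.0] * order   # predictor coefficients
        self.u = [0.0] * order   # past inputs, newest first
        self.mu, self.eps = mu, eps

    def step(self, x):
        """Predict x from past inputs, adapt, and return the prediction error."""
        pred = sum(w * u for w, u in zip(self.w, self.u))
        err = x - pred
        norm = self.eps + sum(u * u for u in self.u)
        self.w = [w + self.mu * err * u / norm for w, u in zip(self.w, self.u)]
        self.u = [x] + self.u[:-1]
        return err


def cascade_residual(signal, orders=(2, 2)):
    """Pass the signal through a chain of predictors; return the final residual."""
    stages = [NlmsPredictor(order) for order in orders]
    residual = []
    for x in signal:
        e = x
        for stage in stages:  # each stage works on the previous stage's error
            e = stage.step(e)
        residual.append(e)
    return residual
```

The ratio of input energy to residual energy gives a rough prediction gain for comparing cascade configurations.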
Vowel Conversion by Phonetic Segmentation
In this paper, a system for vowel conversion between different speakers using short-time speech segments is presented. The input speech signal is segmented into period-length speech segments whose fundamental frequency and first two formants are used to find the perceivable vowel quality. These segments are used to represent a voiced phoneme, i.e. a vowel. The approach relies on pitch-synchronous analysis and uses a modified PSOLA technique for concatenation of the vowel segments. Vowel conversion between speakers is achieved by exchanging the phonetic constituents of a source speaker’s speech waveform in voiced regions of speech whilst preserving prosodic features of the source speaker, thus introducing a method for phonetic segmentation, mapping, and reconstruction of vowels.
Downmix compatible conversion from mono to stereo in time- and frequency-domain
Even in a time of surround and 3D sound, many tracks and recordings are still only available in mono, or recording a source with multiple microphones is not feasible for various reasons. In these cases, a pseudo-stereo conversion of mono signals can be a useful preprocessing step and/or an enhancing audio effect. The conversion proposed in this paper is designed to deliver a neutral-sounding stereo image by avoiding timbral coloration or reverberation. Additionally, the resulting stereo signal is downmix-compatible and allows reverting to the original mono signal by a simple summation of the left and right channels. Several configuration parameters are shown to control the stereo panorama. The algorithm can be implemented in the time domain or in the frequency domain, the latter with additional features such as center focusing.
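The downmix-compatibility property can be illustrated with a small sketch: if a side signal is added to one channel and subtracted from the other, the channel sum returns the mono input regardless of how the side signal is built. The side signal used here (a scaled, delayed copy of the input) and the parameters `delay` and `width` are assumptions for illustration only and are not the conversion proposed in the paper.

```python
def pseudo_stereo(mono, delay=3, width=0.5):
    """Split mono into (left, right) such that left[n] + right[n] == mono[n]."""
    left, right = [], []
    for n, m in enumerate(mono):
        # Hypothetical side signal: scaled copy of the input delayed by `delay`.
        delayed = mono[n - delay] if n >= delay else 0.0
        side = 0.5 * width * delayed
        left.append(0.5 * m + side)    # side signal added on the left
        right.append(0.5 * m - side)   # and subtracted on the right
    return left, right
```

Setting `width` to zero yields two identical channels, i.e. the mono signal at half amplitude on each side.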
Black-box Modeling of Distortion Circuits with Block-Oriented Models
This paper describes black-box modeling of distortion circuits. The analyzed distortion circuits all originate from guitar effect pedals, which are widely used to enrich the sound of an electric guitar with harmonics. The proposed method employs a block-oriented model which consists of a linear block (filter) and a nonlinear block. In this study, the nonlinear block is represented by an extended parametric input/output mapping function. Three distortion circuits with different nonlinear elements are analyzed and modeled. The linear and nonlinear parts of the circuit are analyzed and modeled separately. The Levenberg–Marquardt algorithm is used for iterative optimization of the nonlinear parts of the circuits. Some circuits could not be modeled with high accuracy, but the proposed model has proven to be a versatile and flexible tool for modeling distortion circuits.
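A block-oriented model of this kind can be sketched as a linear filter followed by a static nonlinearity. The sketch below makes several assumptions that are not taken from the paper: the block order (filter first), a one-pole low-pass as the linear block, and `tanh` as the mapping function. The function name `wiener_distortion` and the parameters `a` and `gain` are hypothetical.

```python
import math


def wiener_distortion(x, a=0.5, gain=5.0):
    """Linear block (one-pole low-pass) followed by a static nonlinear block."""
    y, state = [], 0.0
    for sample in x:
        # Linear block: y_lin[n] = (1 - a) * x[n] + a * y_lin[n - 1]
        state = (1.0 - a) * sample + a * state
        # Nonlinear block: memoryless input/output mapping (assumed tanh)
        y.append(math.tanh(gain * state))
    return y
```

In a black-box setting, parameters such as `a` and `gain` would be fitted to measured input/output data of the circuit, for example with an iterative least-squares optimizer.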
Circuit Simulation with Inductors and Transformers Based on the Jiles-Atherton Model of Magnetization
The sound of a vacuum tube guitar amplifier may be significantly influenced by the non-linear behavior of its output transformer, which therefore should also be considered in digital simulations. In this work, we develop a model for inductors and transformers with the magnetization following the model of Jiles and Atherton. For this purpose, the original magnetization model is rewritten as a differential equation with respect to time, which can then easily be integrated into a previously developed circuit simulation framework. The model thus derived is then exercised in the simulation of three simple circuits, where it shows the expected behavior.
Signal-Matched Power-Complementary Cross-Fading and Dry-Wet Mixing
The blending of audio signals, called cross-fading, is a very common task in audio signal processing. Therefore, digital audio workstations offer several fading curves to select from. The choice of the fading curve typically depends on the signal characteristics and is supposed to result in a mixed signal featuring power and loudness close to those of the input signals. This work derives a correlation-based design of the fading curves that achieves exact power consistency and thereby avoids audible fluctuations of the signal’s loudness. This principle is extended to the problem of mixing original signals with effect-processed signals using the dry-wet balance. Weighting coefficients for dry and wet signals are derived which realize the desired dry-wet balance but maintain the signal power.
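To illustrate why the correlation matters, consider two signals of equal power with correlation coefficient r, mixed with gains g1 and g2. The power of the mix is proportional to g1² + g2² + 2·r·g1·g2, so keeping this term equal to one preserves the power. The sketch below normalizes a linear fade accordingly; this normalization is one possible choice made for illustration and is not necessarily the design derived in the paper. The function name `fade_gains` is hypothetical.

```python
import math


def fade_gains(t, r):
    """Gains (g1, g2) at fade position t in [0, 1] for correlation r.

    Assumption: both inputs have equal power. The gains satisfy
    g1**2 + g2**2 + 2 * r * g1 * g2 == 1 (up to rounding).
    """
    a, b = 1.0 - t, t                       # plain linear fade
    power = a * a + b * b + 2.0 * r * a * b  # relative power of the linear mix
    if power <= 0.0:
        raise ValueError("signals cancel completely at this fade position")
    scale = 1.0 / math.sqrt(power)
    return a * scale, b * scale
```

For fully correlated signals (r = 1) this reduces to the linear fade, while for uncorrelated signals (r = 0) the midpoint gains become 1/√2, the familiar equal-power fade.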
Time-Domain Implementation of a Stereo to Surround Sound Upmix Algorithm
This paper describes a time-domain algorithm to upmix stereo recordings for enhanced playback on a surround sound loudspeaker setup. It is essentially a simplified version of a previously published frequency-domain algorithm, in which the standard short-time Fourier transform is replaced by an IIR filter bank. The design of complementary filter blocks and their arrangement in a tree structure to form a filter bank are derived. The arithmetic complexity of the filter bank itself and of the complete upmix algorithm is analysed and compared to the frequency-domain approach. The time-domain upmix is less flexible in its configuration but achieves an audio quality comparable to the frequency-domain implementation at a fraction of its computational cost.