VisualAudio-Design – Towards a Graphical Sounddesign
VisualAudio-Design (VAD) is a spectral-node-based approach to visually designing audio collages and sounds. The spectrogram, as a visualization of the frequency domain, can be manipulated intuitively with tools known from image processing. This makes for a more comprehensible sound-design workflow than the common abstract interfaces for DSP algorithms, which still rely on direct value inputs, sliders, or knobs. Beyond interaction in the time domain of audio and conventional analysis and restoration tasks, many new possibilities for spectral manipulation of audio material open up. Here, affine transformations and two-dimensional convolution filters are proposed.
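As an illustration of the image-processing view, a two-dimensional convolution filter can be applied directly to a magnitude spectrogram, smearing energy across both time and frequency. The sketch below (Python/NumPy, not taken from VAD itself) applies a 3×3 box blur to a toy spectrogram:

```python
import numpy as np

def conv2d(spec, kernel):
    """Naive 2D convolution with zero padding, as used in image processing."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(spec, ((ph, ph), (pw, pw)))
    out = np.zeros_like(spec, dtype=float)
    for i in range(spec.shape[0]):
        for j in range(spec.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel[::-1, ::-1])
    return out

# A toy magnitude "spectrogram": frequency bins (rows) x time frames (columns)
spec = np.zeros((8, 8))
spec[4, :] = 1.0               # a steady sinusoidal partial at bin 4

blur = np.ones((3, 3)) / 9.0   # box blur: smears energy in time and frequency
smoothed = conv2d(spec, blur)
```

A Gaussian or edge-detection kernel could be substituted for the box blur, exactly as in image editing.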
An Open Source Stereo Widening Plugin
Stereo widening algorithms aim to extend the stereo image width and thereby increase the perceived spaciousness of a mix. Here, we present the design and implementation of a stereo widening plugin that is computationally efficient. First, a stereo signal is decorrelated by convolving it with a velvet noise sequence or, alternatively, by passing it through a cascade of allpass filters with randomised phase. Both the original and decorrelated signals are passed through perfect reconstruction filterbanks to obtain a set of lowpassed and highpassed signals. The original and decorrelated filtered signals are then combined through a mixer and summed to produce the final stereo output. Two separate parameters control the perceived width of the lower and higher frequencies, respectively. A transient detection block prevents the smearing of percussive signals caused by the decorrelation filters. The stereo widener has been released as an open-source plugin.
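A velvet-noise decorrelator of the kind mentioned above can be sketched in a few lines: velvet noise is a sparse sequence of randomly placed ±1 impulses, so convolving with it is cheap. The following Python/NumPy sketch uses assumed parameter values (a 10 ms sequence at 2000 impulses per second) and is an illustration, not the plugin's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def velvet_noise(length, density, fs):
    """Sparse sequence of random +/-1 impulses, one per grid cell."""
    seq = np.zeros(length)
    grid = fs / density                      # average impulse spacing in samples
    n_pulses = int(length / grid)
    for m in range(n_pulses):
        pos = int(m * grid + rng.uniform(0, grid))
        if pos < length:
            seq[pos] = rng.choice([-1.0, 1.0])
    return seq

fs = 48000
vn = velvet_noise(length=fs // 100, density=2000, fs=fs)  # 10 ms sequence
x = rng.standard_normal(fs // 10)
decorrelated = np.convolve(x, vn)[:len(x)]   # cheap: only 20 taps are nonzero
```

Because only a handful of taps are nonzero, the convolution can be implemented as a short list of delayed adds/subtracts, which is where the computational efficiency comes from.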
Breaking the Bounds: Introducing Informed Spectral Analysis
Sound applications based on sinusoidal modeling depend heavily on the efficiency and precision of the estimators in their analysis stage. In previous work, theoretical bounds on the best achievable precision were derived, and it was shown that these bounds are reached by efficient estimators such as the reassignment or derivative methods. We show that it is possible to break these theoretical bounds with just a few additional bits of information about the original content, introducing the concept of “informed analysis”. This paper shows that existing estimators, combined with some additional information, can reach any expected level of precision, even in very low signal-to-noise-ratio conditions, thus enabling high-quality sound effects without the typical but unwanted musical noise.
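The idea of spending a few side-information bits to refine a blind estimate can be illustrated with a toy scheme (not the paper's actual method): the encoder, which knows the true value, transmits only the index of the quantisation cell around the blind estimate that contains it, and the decoder snaps the estimate to that cell's centre. All names and parameter values below are hypothetical:

```python
import numpy as np

def encode(true_val, est_val, n_bits, max_err):
    """Encoder side: index of the cell (around the blind estimate) that
    contains the true value -- only n_bits of side information."""
    cells = 2 ** n_bits
    err = np.clip(true_val - est_val, -max_err, max_err)
    return min(int((err + max_err) / (2 * max_err) * cells), cells - 1)

def decode(est_val, idx, n_bits, max_err):
    """Decoder side: shift the blind estimate to the cell centre."""
    cells = 2 ** n_bits
    width = 2 * max_err / cells
    return est_val - max_err + (idx + 0.5) * width

true_f, est_f = 440.37, 440.90      # blind estimator is off by 0.53 Hz
idx = encode(true_f, est_f, n_bits=4, max_err=1.0)
refined = decode(est_f, idx, n_bits=4, max_err=1.0)
```

Each extra bit halves the residual error, which is the sense in which a few bits of side information can push precision past what any blind estimator achieves.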
Modeling the Frequency-Dependent Sound Energy Decay of Acoustic Environments with Differentiable Feedback Delay Networks
Differentiable machine learning techniques have recently proved effective for finding the parameters of Feedback Delay Networks (FDNs) so that their output matches desired perceptual qualities of target room impulse responses. However, we show that, unless properly trained, existing methods tend to fail at modeling the frequency-dependent behavior of sound energy decay that characterizes real-world environments. In this paper, we introduce a novel perceptual loss function based on the mel-scale energy decay relief, which generalizes the well-known time-domain energy decay curve to multiple frequency bands. We also augment the prototype FDN by incorporating differentiable wideband attenuation and output filters, and train them via backpropagation along with the other model parameters. The proposed approach improves upon existing strategies for designing and training differentiable FDNs, making it more suitable for audio processing applications where realistic and controllable artificial reverberation is desirable, such as gaming, music production, and virtual reality.
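The time-domain energy decay curve that the proposed loss generalizes is obtained by Schroeder backward integration of the squared impulse response; the mel-scale energy decay relief repeats this per frequency band. A minimal single-band sketch in Python/NumPy, using a toy exponentially decaying response:

```python
import numpy as np

def energy_decay_curve(ir):
    """Schroeder backward integration: EDC(n) = sum over m >= n of ir[m]^2,
    normalised and expressed in dB."""
    edc = np.cumsum(ir[::-1] ** 2)[::-1]
    return 10 * np.log10(edc / edc[0] + 1e-12)

fs = 1000
t = np.arange(fs) / fs
ir = np.exp(-3.0 * t)          # toy exponentially decaying "impulse response"
edc_db = energy_decay_curve(ir)
```

Applying this to the output of a mel filterbank gives one decay curve per band; a loss comparing those curves to a target's penalises frequency-dependent decay mismatch, which a single broadband EDC cannot.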
Polyphonic Pitch Detection by Iterative Analysis of the Autocorrelation Function
In this paper, a polyphonic pitch detection approach is presented that is based on the iterative analysis of the autocorrelation function. The idea of a two-channel front-end with periodicity estimation using the autocorrelation is inspired by an algorithm by Tolonen and Karjalainen. However, the analysis of the periodicity in the summary autocorrelation function is enhanced with a more advanced iterative peak-picking and pruning procedure. The proposed algorithm is compared to other systems in an evaluation on common data sets and yields good results, on par with state-of-the-art systems.
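For a single voice, the core autocorrelation idea reduces to picking the strongest ACF peak in a plausible lag range; the paper's iterative procedure extends this with repeated picking and pruning on the summary ACF. A minimal monophonic sketch in Python/NumPy (illustrative only):

```python
import numpy as np

def acf_pitch(x, fs, fmin=60.0, fmax=500.0):
    """Pick the highest autocorrelation peak within the plausible lag range."""
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(acf[lo:hi])
    return fs / lag

fs = 8000
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 200 * t)   # 200 Hz test tone -> period of 40 samples
f0 = acf_pitch(x, fs)
```

In the polyphonic case one would pick a peak, prune the peaks belonging to its multiples from the (summary) ACF, and iterate until no significant periodicity remains.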
Comparison of Various Predictors for Audio Extrapolation
In this study, receiver-based audio error concealment in the context of low-latency Audio over IP transmission is analyzed. To this end, the well-known technique of audio extrapolation is investigated with regard to its usability in real-time scenarios, the applied prediction techniques, and various transmission parameters. A large-scale automated evaluation with PEAQ and a MUSHRA listening test reveal the performance of the various extrapolation setups. The results show that extrapolation is suitable for real-time audio error concealment and that block-based methods are qualitatively superior to sample-based methods.
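A common prediction technique for audio extrapolation is to fit an autoregressive (linear prediction) model to the last received block and run it forward past the signal's end to conceal the lost packet. The following Python/NumPy sketch is illustrative and not tied to any specific setup from the study:

```python
import numpy as np

def ar_fit(x, order):
    """Least-squares AR fit: predict x[n] from the previous `order` samples."""
    X = np.column_stack([x[order - k - 1:len(x) - k - 1] for k in range(order)])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a

def extrapolate(x, order, n):
    """Run the fitted AR model forward to predict n samples past the end."""
    a = ar_fit(x, order)
    buf = list(x)
    for _ in range(n):
        buf.append(float(np.dot(a, buf[-1:-order - 1:-1])))
    return np.array(buf[len(x):])

fs = 8000
t = np.arange(512) / fs
x = np.sin(2 * np.pi * 440 * t)
pred = extrapolate(x, order=2, n=64)   # a 2nd-order model fits one sinusoid
```

Real audio needs a higher model order and regular refitting, but the principle is the same: the predictor bridges the gap until fresh packets arrive.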
Extraction of Metrical Structure from Music Recordings
Rhythm is a fundamental aspect of music and metrical structure is an important rhythm-related element. Several mid-level features encoding metrical structure information have been proposed in the literature, although the explicit extraction of this information is rarely considered. In this paper, we present a method to extract the full metrical structure from music recordings without the need for any prior knowledge. The algorithm is evaluated against expert annotations of metrical structure for the GTZAN dataset, each track being annotated multiple times. Inter-annotator agreement and the resulting upper bound on algorithm performance are evaluated. The proposed system reaches 93% of this upper limit and largely outperforms the baseline method.
A Complex Wavelet Based Fundamental Frequency Estimator in Single-Channel Polyphonic Signals
In this work, a new estimator of the fundamental frequencies (F0) present in a polyphonic single-channel signal is developed. The signal is modeled in terms of a set of discrete partials obtained by the Complex Continuous Wavelet Transform (CCWT). The fundamental frequency estimation is based on the energy distribution of the detected partials of the input signal, followed by a spectral smoothness technique. The proposed algorithm is designed to cope with suppressed fundamentals, inharmonic partials, and harmonically related sounds. The technique has been tested on a set of input signals with polyphony ranging from 2 to 6, with high-precision results that demonstrate the strength of the algorithm. The results are very promising with regard to using the developed algorithm as the basis of blind sound source separation or automatic score transcription techniques.
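The partial-energy idea can be sketched independently of the CCWT front-end: given a list of detected partials, each F0 candidate is scored by how much partial energy lies close to its harmonic series, with higher harmonics discounted so that sub-octave candidates do not win. The partial list and all parameter values below are hypothetical, and the paper's spectral smoothness stage is omitted:

```python
import numpy as np

def f0_from_partials(partials, candidates, tol=0.005, decay=0.9):
    """Score each F0 candidate by the harmonic-index-weighted amplitude of
    detected partials near its harmonic series; return the best candidate."""
    scores = []
    for f0 in candidates:
        s = 0.0
        for freq, amp in partials:
            h = int(round(freq / f0))
            if h >= 1:
                mismatch = (freq - h * f0) / (tol * freq)
                s += amp * decay ** h * np.exp(-mismatch ** 2)
        scores.append(s)
    return float(candidates[int(np.argmax(scores))])

# Hypothetical detected partials (frequency in Hz, amplitude) of a 220 Hz tone
partials = [(220.0, 1.0), (440.0, 0.8), (660.0, 0.6), (880.0, 0.4)]
candidates = np.arange(100.0, 500.0, 1.0)
f0 = f0_from_partials(partials, candidates)
```

Note that the decay factor is what breaks the tie with the 110 Hz sub-octave, which also matches every partial exactly but only at even harmonic indices.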
Monophonic Pitch Detection by Evaluation of Individually Parameterized Phase Locked Loops
This paper describes a new, efficient, sample-based monophonic pitch-tracking approach using multiple phase-locked loops (PLLs). Distinct subband signals traverse pairs of individually parameterized PLLs. Based on the relation of the instantaneous pitch samples of the respective PLLs to one another, relevant features per pitch candidate are derived. These features are combined into pitch-candidate scores. Pitch candidates that exhibit the maximum score per sampling instance and exceed a voicing threshold contribute to the overall pitch track. Evaluations with up-to-date datasets show that the tracking performance improves significantly over implementations that use only one PLL and nearly approaches the scores of a state-of-the-art monophonic pitch tracker.
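A single software PLL, the building block the paper instantiates many times with different parameterizations, can be sketched as a numerically controlled oscillator steered by a phase detector and a PI loop filter. The gains and test signal below are assumed values for illustration only:

```python
import numpy as np

def pll_track(x, fs, f_init, kp=0.02, ki=0.0005):
    """Minimal software PLL: the NCO frequency is steered sample by sample
    by a multiplier phase detector and a PI loop filter."""
    phase = 0.0
    freq = 2 * np.pi * f_init / fs       # NCO centre frequency (rad/sample)
    integ = 0.0
    f_track = np.zeros(len(x))
    for n, s in enumerate(x):
        err = s * -np.sin(phase)         # phase detector: mix input with NCO
        integ += ki * err                # integral path absorbs freq offsets
        freq_adj = freq + kp * err + integ
        phase += freq_adj
        f_track[n] = freq_adj * fs / (2 * np.pi)   # instantaneous pitch in Hz
    return f_track

fs = 8000
t = np.arange(8000) / fs
x = np.cos(2 * np.pi * 210 * t)          # input tone 10 Hz above the NCO start
track = pll_track(x, fs, f_init=200.0)
```

Because the loop updates once per sample, the instantaneous pitch estimate is available at every sampling instant, which is what makes PLL-based tracking attractive for sample-based feature extraction.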
Streaming Spectral Processing with Consumer-Level Graphics Processing Units
This paper describes the implementation of a streaming spectral processing system for real-time audio on a consumer-level onboard GPU (Graphics Processing Unit) in an off-the-shelf laptop computer. It explores the implementation of four processes: standard phase vocoder analysis and synthesis, additive synthesis, and the sliding phase vocoder. These were developed under the CUDA development environment as plugins for the Csound 6 audio programming language. Following a detailed exposition of the GPU code, results of performance tests are discussed for each algorithm. They demonstrate that such a system is capable of real-time audio, even under the restrictions imposed by a limited GPU capability.
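As a CPU-side reference for the first of these processes, phase vocoder analysis and resynthesis amounts to a windowed STFT followed by weighted overlap-add; the GPU versions in the paper parallelize exactly these per-frame and per-bin loops. A minimal Python/NumPy sketch (not the CUDA/Csound code):

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Analysis: windowed frames transformed to the frequency domain."""
    win = np.hanning(n_fft)
    frames = [np.fft.rfft(win * x[i:i + n_fft])
              for i in range(0, len(x) - n_fft, hop)]
    return np.array(frames)

def istft(frames, n_fft=512, hop=128):
    """Synthesis: inverse FFT per frame, windowed overlap-add, normalised
    by the summed squared window."""
    win = np.hanning(n_fft)
    out = np.zeros(hop * len(frames) + n_fft)
    norm = np.zeros_like(out)
    for k, F in enumerate(frames):
        out[k * hop:k * hop + n_fft] += win * np.fft.irfft(F)
        norm[k * hop:k * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

x = np.sin(2 * np.pi * np.arange(4096) * 440 / 8000)
y = istft(stft(x))   # identity round trip, exact away from the edges
```

Each frame's FFT and each bin's phase arithmetic are independent, which is why the whole pipeline maps naturally onto the many lightweight threads of even a modest onboard GPU.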