System analysis and performance tuning for broadcast audio fingerprinting An audio fingerprint is a compact, content-based signature that summarizes an audio recording. Audio fingerprinting technologies have recently attracted attention because they allow audio to be monitored independently of its format and without the need for metadata or watermark embedding. To succeed in real audio broadcasting environments, these technologies must be robust to channel distortions while remaining accurate and scalable. This paper presents a complete audio fingerprinting system for broadcast audio monitoring that satisfies these requirements. The system's performance is enhanced with four proposals that required detailed analysis of the system blocks as well as extensive tuning experiments.
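To make "compact, content-based signature" concrete, here is a minimal sketch of one classic family of fingerprints (sign-of-band-energy-difference bits, in the style of Haitsma and Kalker); it is illustrative only and not the system described in the paper, and all parameter values are assumptions.

```python
import numpy as np

def fingerprint(signal, frame_len=2048, hop=1024, n_bands=33):
    """Toy spectral-band fingerprint (illustrative, not the paper's scheme):
    each frame yields a bit vector from the signs of band-energy differences
    between adjacent log-spaced frequency bands and consecutive frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    edges = np.geomspace(1, frame_len // 2, n_bands + 1).astype(int)
    bits, prev_energy = [], None
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * np.hanning(frame_len)
        mag = np.abs(np.fft.rfft(frame)) ** 2
        energy = np.array([mag[edges[b]:edges[b + 1] + 1].sum()
                           for b in range(n_bands)])
        if prev_energy is not None:
            # Bit b is 1 when the adjacent-band energy difference grows
            # relative to the previous frame (robust to overall level).
            diff = np.diff(energy) - np.diff(prev_energy)
            bits.append((diff > 0).astype(np.uint8))
        prev_energy = energy
    return np.array(bits)
```

Such binary fingerprints are compared by Hamming distance, which is what makes large-scale broadcast matching tractable.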
Modulation and Delay Line Based Digital Audio Effects Among musicians and recording engineers, audio effects are mainly described and identified by their acoustical effect. Audio effects can also be categorized from a technical point of view, the main criterion being the type of modulation technique used to achieve the effect. After a short introduction to the different modulation types, three more sophisticated audio effect applications are presented: a single-sideband-domain vibrato (mechanical vibrato bar simulation), a rotary speaker simulation, and an enhanced pitch transposing scheme.
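The simplest member of this modulated-delay-line family is a vibrato. The sketch below (parameter names and values are illustrative assumptions, not taken from the paper) reads from a delay line whose length is swept by a low-frequency sine, with linear interpolation between samples:

```python
import numpy as np

def vibrato(x, sr, rate_hz=5.0, depth_ms=2.0, base_delay_ms=5.0):
    """Minimal delay-line vibrato sketch: the fractional read position
    lags the write position by a sinusoidally modulated delay."""
    n = np.arange(len(x))
    delay = (base_delay_ms
             + depth_ms * np.sin(2 * np.pi * rate_hz * n / sr)) * sr / 1000.0
    read = n - delay                      # fractional read positions
    i0 = np.floor(read).astype(int)
    frac = read - i0
    i0 = np.clip(i0, 0, len(x) - 1)       # clamp the start-up transient
    i1 = np.clip(i0 + 1, 0, len(x) - 1)
    return (1 - frac) * x[i0] + frac * x[i1]
```

Chorus and flanging use the same structure with different depth/rate ranges and a dry/wet mix, which is why the modulation type is a natural classification criterion.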
A Real-Time DSP-based Reverberation System with Computer This paper describes a highly versatile, low-cost reverberation system comprising two main elements: a computer for building and editing the impulse response of the desired reverberation effect, and a commercial DSP-based board that runs the algorithm in real time, allowing the results to be evaluated. The main parameters of the reverberation algorithm can be modified through a dedicated graphical interface on the host computer.
Room Acoustics Modelling using GPU-Accelerated Finite Difference and Finite Volume Methods on a Face-Centered Cubic Grid In this paper, a room acoustics simulation using a finite difference approximation on a face-centered cubic (FCC) grid with finite volume impedance boundary conditions is presented. The finite difference scheme is accelerated on an Nvidia Tesla K20 graphics processing unit (GPU) using the CUDA programming language. A performance comparison is made between 27-point finite difference schemes on a cubic grid and the 13-point scheme on the FCC grid. It is shown that the FCC scheme runs faster on the Tesla K20 GPU and has less numerical dispersion than the best 27-point schemes on the cubic grid. Implementation details are discussed.
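For readers unfamiliar with such schemes, the structure being accelerated is a leapfrog update of the 3-D wave equation. The sketch below uses the plain 7-point stencil on a cubic grid with fixed boundaries; the paper's 27-point cubic and 13-point FCC schemes refine exactly this kind of update (the grid size and Courant number here are illustrative assumptions):

```python
import numpy as np

def fdtd_step(p, p_prev, courant2):
    """One leapfrog update of the 3-D wave equation on a cubic grid with
    the standard 7-point Laplacian. courant2 = (c*dt/dx)**2, which must
    be <= 1/3 for stability of this stencil. Boundaries stay fixed
    (Dirichlet), standing in for the paper's finite volume boundaries."""
    lap = (-6.0 * p[1:-1, 1:-1, 1:-1]
           + p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1]
           + p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1]
           + p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2])
    p_next = p.copy()
    p_next[1:-1, 1:-1, 1:-1] = (2.0 * p[1:-1, 1:-1, 1:-1]
                                - p_prev[1:-1, 1:-1, 1:-1]
                                + courant2 * lap)
    return p_next
```

Each grid point only reads its immediate neighbors, which is why these schemes map so well onto GPU thread blocks; the FCC grid's advantage is fewer points per wavelength for the same dispersion error.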
Audio analysis in PWGLSynth In this paper, we present an incremental improvement of a known fundamental frequency estimation algorithm for monophonic signals. This is viewed as a case study of using our signal-graph-based synthesis language, PWGLSynth, for audio analysis. The roles of audio and control signals are discussed in both analysis and synthesis contexts. The suitability of the PWGLSynth system for this field of applications is examined, and some problems and directions for future work are identified.
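The abstract does not name the base algorithm, so as a generic stand-in, here is the simplest autocorrelation-style monophonic f0 estimator (the search range and frame length are assumptions):

```python
import numpy as np

def estimate_f0(frame, sr, fmin=60.0, fmax=1000.0):
    """Generic autocorrelation pitch estimator for one monophonic frame:
    pick the lag with the largest autocorrelation within the plausible
    period range. A stand-in, not the paper's (unnamed) algorithm."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag bounds from f range
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag
```

Real estimators refine this with normalization and thresholding (as in YIN-type methods) to suppress octave errors; the paper's contribution is an incremental improvement of one such algorithm inside a synthesis signal graph.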
Adaptive Pitch-Shifting with Applications to Intonation Adjustment in a Cappella Recordings A central challenge for a cappella singers is to adjust their intonation and to stay in tune relative to their fellow singers. When editing a cappella recordings, one may want to adjust the local intonation of individual singers or compensate for global intonation drift over time. This requires applying a time-varying pitch-shift to the audio recording, which we refer to as adaptive pitch-shifting. In this context, existing (semi-)automatic approaches are either labor-intensive or face technical and musical limitations. In this work, we present automatic methods and tools for adaptive pitch-shifting with applications to intonation adjustment in a cappella recordings. To this end, we show how to incorporate time-varying information into existing pitch-shifting algorithms that are based on resampling and time-scale modification (TSM). Furthermore, we release an open-source Python toolbox, which includes a variety of TSM algorithms and an implementation of our method. Finally, we show the potential of our tools through two case studies on global and local intonation adjustment in a cappella recordings, using a publicly available multitrack dataset of amateur choral singing.
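The resampling half of a resampling-plus-TSM pitch-shifter can be sketched directly with a time-varying ratio; this is a simplified illustration under my own assumptions, not the toolbox's implementation. A ratio above 1 raises the local pitch but also shortens the signal, which is exactly what the subsequent TSM stage (not shown) undoes so that only pitch, not tempo, changes:

```python
import numpy as np

def varying_resample(x, ratio_fn, sr):
    """Time-varying resampling stage of adaptive pitch-shifting (sketch):
    the read head advances by ratio_fn(t) samples per output sample,
    with linear interpolation at fractional positions."""
    pos_list, p = [], 0.0
    while p < len(x) - 1:
        pos_list.append(p)
        p += ratio_fn(p / sr)             # local ratio at time p/sr
    pos = np.array(pos_list)
    i0 = pos.astype(int)
    frac = pos - i0
    return (1 - frac) * x[i0] + frac * x[i0 + 1]
```

With a constant ratio this reduces to ordinary resampling; the adaptive case simply lets the ratio follow an estimated intonation-drift curve over time.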
Analysis and Trans-synthesis of Acoustic Bowed-String Instrument Recordings: A Case Study Using Bach Cello Suites In this paper, analysis and trans-synthesis of acoustic bowed-string instrument recordings with a new non-negative matrix factorization (NMF) procedure are presented. This work shows that more than one template may be required to represent a note, owing to the time-varying behavior of timbre, especially for notes played by bowed-string instruments. The proposed method improves on the original NMF without requiring tone models or the number of templates to be known in advance. The resulting NMF information is then converted into the synthesis parameters of a sinusoidal synthesizer. Bach cello suites recorded by Fournier and Starker are used in the experiments. Analysis and trans-synthesis examples of the recordings are also provided. Index Terms: trans-synthesis, non-negative matrix factorization, bowed string instrument
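The NMF baseline the paper builds on factors a magnitude spectrogram into spectral templates and time-varying activations; a plain Lee–Seung multiplicative-update sketch (the paper's template-adaptation procedure is not reproduced here, and the fixed template count k is an assumption) looks like this:

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Plain multiplicative-update NMF (Euclidean cost): V ~= W @ H with
    W holding k spectral templates and H their activations over time.
    The paper extends this baseline so the number of templates per note
    need not be fixed in advance."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + eps          # spectral templates
    H = rng.random((k, T)) + eps          # activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The multiplicative form keeps both factors non-negative by construction, which is what lets templates be read as spectra and activations as envelopes feeding the sinusoidal synthesis stage.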
Differentiable Time–frequency Scattering on GPU Joint time–frequency scattering (JTFS) is a convolutional operator in the time–frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time–frequency scattering in Python. Unlike prior implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends and is thus portable to both CPU and GPU. We demonstrate the usefulness of JTFS via three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds.
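To build intuition for "spectrotemporal modulations at various rates and scales": a crude cousin of JTFS is the 2-D Fourier transform of a log-magnitude spectrogram, whose axes are temporal rate and spectral scale. This is only an intuition-building sketch (JTFS proper uses two-dimensional wavelet filter banks and modulus nonlinearities, not one global 2-D FFT), and all parameters are assumptions:

```python
import numpy as np

def modulation_spectrum(x, frame_len=256, hop=64):
    """Crude spectrotemporal-modulation estimate: 2-D FFT magnitude of a
    mean-removed log-magnitude STFT. Axis 0 of the output indexes
    spectral scale, axis 1 temporal rate. Not JTFS itself."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    S = np.array([np.abs(np.fft.rfft(x[i * hop:i * hop + frame_len] * win))
                  for i in range(n_frames)]).T     # (freq, time)
    logS = np.log1p(S)
    return np.abs(np.fft.fft2(logS - logS.mean()))
```

JTFS replaces the global 2-D FFT with localized, multiscale filters and stays differentiable, which is what makes it usable as a training loss for audio generation.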
Hyper Recurrent Neural Network: Condition Mechanisms for Black-Box Audio Effect Modeling Recurrent neural networks (RNNs) have demonstrated impressive results for virtual analog modeling of audio effects. These networks process time-domain audio signals using a series of matrix multiplications and nonlinear activation functions to emulate the behavior of the target device accurately. To additionally model the effect of the knobs for an RNN-based model, existing approaches integrate control parameters by concatenating them channel-wise with some intermediate representation of the input signal. While this method is parameter-efficient, there is room to further improve the quality of the generated audio, because the concatenation-based conditioning method has limited capacity for modulating signals. In this paper, we propose three novel conditioning mechanisms for RNNs, tailored for black-box virtual analog modeling. These advanced conditioning mechanisms modulate the model based on control parameters, yielding superior results to existing RNN- and CNN-based architectures across various evaluation metrics.
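The contrast between concatenation and modulation-style conditioning can be sketched on a single vanilla RNN step. The FiLM-style scale-and-shift below is one plausible modulation mechanism from the literature, shown only to illustrate the distinction; it is not one of the paper's three proposed mechanisms, and all dimensions and parameter names are assumptions:

```python
import numpy as np

def film_rnn_step(x, h, c, params):
    """One vanilla RNN step whose hidden pre-activation is modulated
    FiLM-style by the control vector c (knob settings): the controls
    scale and shift the pre-activation instead of merely being
    concatenated with the input, giving them multiplicative leverage."""
    Wx, Wh, b, Wg, Wb = params
    gamma = Wg @ c                        # per-unit scale from controls
    beta = Wb @ c                         # per-unit shift from controls
    pre = Wx @ x + Wh @ h + b
    return np.tanh(gamma * pre + beta)
```

With concatenation, the controls enter only additively through the input weights; a multiplicative path like gamma lets the same control value rescale the whole recurrent dynamics, which is the extra modulation capacity the abstract alludes to.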
Identification of Time-Frequency Maps for Sound Timbre Discrimination Gabor multipliers are signal operators that are diagonal in a time-frequency representation of signals and can be viewed as time-frequency transfer functions. If we estimate a Gabor mask between the same note played by two instruments, we obtain a time-frequency representation of the difference in timbre between the two notes. By averaging the energy contained in the Gabor mask, we obtain a measure of this difference. In this context, our goal is to automatically localize the time-frequency regions responsible for such timbre dissimilarity. This problem is addressed as a feature selection problem over the time-frequency coefficients of a labelled data set of sounds.
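A minimal sketch of the mask-energy idea, under my own simplifications (pointwise magnitude ratio of two STFTs as the diagonal mask, and mean squared log-mask as the scalar measure; the paper's actual estimation of the Gabor mask is more careful):

```python
import numpy as np

def gabor_mask_dissimilarity(x, y, frame_len=512, hop=256, eps=1e-8):
    """Estimate a diagonal time-frequency mask mapping note x onto note y
    as a pointwise STFT magnitude ratio, then average its squared
    log-deviation from unity: 0 for identical notes, larger for
    bigger timbre differences. Illustrative sketch only."""
    def stft(s):
        n = 1 + (len(s) - frame_len) // hop
        win = np.hanning(frame_len)
        return np.array([np.fft.rfft(s[i * hop:i * hop + frame_len] * win)
                         for i in range(n)]).T
    X, Y = stft(x), stft(y)
    mask = (np.abs(Y) + eps) / (np.abs(X) + eps)  # pointwise transfer fn
    return np.mean(np.log(mask) ** 2)
```

The feature-selection step in the paper then asks which individual time-frequency coefficients of such masks carry the dissimilarity, rather than averaging over all of them.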