Adaptive Network-Based Fuzzy Inference System for Automatic Speech/Music Discrimination Automatic discrimination of speech and music is an important tool in many multimedia applications. This paper presents an effective approach based on an Adaptive Network-Based Fuzzy Inference System (ANFIS) for the classification stage required in a speech/music discrimination system. A new simple feature, called Warped LPC-based Spectral Centroid (WLPC-SC), is also proposed. A comparison between WLPC-SC and some of the classical features proposed in [11] is performed to assess the discriminatory power of the proposed feature. The vector describing the proposed psychoacoustically based feature is reduced to a few statistical values (mean, variance, and skewness). To evaluate the performance of the ANFIS system for speech/music discrimination, a comparison with other commonly used classifiers is reported. The classification results for different types of music and speech confirm the strong discriminating power of the proposed approach.
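As an illustration only (not the authors' implementation), the sketch below computes an LPC-envelope spectral centroid per frame and reduces the trajectory to mean, variance, and skewness; the frequency-warping step of WLPC-SC is omitted, and the test signal, frame length, and LPC order are placeholder choices.

# Simplified sketch of an LPC-based spectral-centroid feature, loosely inspired by
# WLPC-SC.  NOTE: the frequency warping stage of the paper is omitted; this is an
# illustrative approximation, not the authors' implementation.
import numpy as np
import librosa
import scipy.signal
from scipy.stats import skew

def lpc_spectral_centroid(y, sr, frame_len=1024, hop=512, order=12):
    centroids = []
    for start in range(0, len(y) - frame_len, hop):
        frame = y[start:start + frame_len] * np.hanning(frame_len)
        a = librosa.lpc(frame, order=order)              # all-pole envelope coefficients
        w, h = scipy.signal.freqz([1.0], a, worN=256, fs=sr)
        mag = np.abs(h)
        centroids.append(np.sum(w * mag) / (np.sum(mag) + 1e-12))
    c = np.asarray(centroids)
    # Reduce the centroid trajectory to the three statistics used in the paper.
    return np.array([c.mean(), c.var(), skew(c)])

sr = 16000
t = np.arange(2 * sr) / sr
y = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(len(t))   # stand-in signal
print(lpc_spectral_centroid(y, sr))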
LTFATPY: Towards Making a Wide Range of Time-Frequency Representations Available in Python LTFATPY is a software package for accessing the Large Time Frequency Analysis Toolbox (LTFAT) from Python. Dedicated to time-frequency analysis, LTFAT comprises a large number of linear transforms for Fourier, Gabor, and wavelet analysis, along with their associated operators. Its filter bank module is a collection of computational routines for finite impulse response and band-limited filters, allowing for the specification of constant-Q and auditory-inspired transforms. While LTFAT was originally written in MATLAB/GNU Octave, the recent popularity of the Python programming language in related fields, such as signal processing and machine learning, makes it desirable to have LTFAT available in Python as well. We introduce LTFATPY, describe its main features, and outline further developments.
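The snippet below is not the LTFATPY API (consult the package documentation for actual call signatures); it is a minimal NumPy implementation of a discrete Gabor transform, shown only to illustrate the transform family the toolbox provides, using LTFAT's conventional parameter names a (hop size) and M (number of frequency channels).

# Minimal discrete Gabor transform (DGT) in plain NumPy, illustrating the kind of
# time-frequency representation that LTFAT/LTFATPY provides.  This is NOT the
# LTFATPY interface.
import numpy as np

def dgt(f, g, a, M):
    """c[m, n] = sum_l f[l] * conj(g[l - n*a]) * exp(-2j*pi*m*l/M)."""
    L = len(f)
    N = L // a                                   # number of time positions
    c = np.zeros((M, N), dtype=complex)
    for n in range(N):
        shifted = np.roll(g, n * a)              # circularly shifted window
        for m in range(M):
            phase = np.exp(-2j * np.pi * m * np.arange(L) / M)
            c[m, n] = np.sum(f * np.conj(shifted) * phase)
    return c

L, a, M = 128, 16, 16
g = np.exp(-np.pi * ((np.arange(L) - L / 2) / (L / 8)) ** 2)   # Gaussian window
f = np.sin(2 * np.pi * 10 * np.arange(L) / L)
print(dgt(f, g, a, M).shape)                     # (16, 8)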
Blind Arbitrary Reverb Matching Reverb provides psychoacoustic cues that convey information about relative locations within an acoustical space. The need often arises in audio production to impart on an audio track an acoustic context that resembles that of a reference track. One way to make audio tracks appear to have been recorded in the same space is to apply reverb to a dry track that is similar to the reverb present in a wet one. This paper presents a model for the task of “reverb matching,” in which we attempt to automatically add artificial reverb to a track so that it sounds as if it were recorded in the same space as a reference track. We propose a model architecture for performing reverb matching and provide subjective experimental results suggesting that the reverb-matching model can perform as well as a human. We also provide open-source software for generating training data using an arbitrary Virtual Studio Technology plug-in.
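The paper ships its own open-source tool for rendering training data through an arbitrary VST plug-in; the sketch below is an independent illustration of the same idea using Spotify's pedalboard library, with a placeholder plug-in path and file names. It simply renders a dry/wet pair.

# Independent illustration of generating dry/wet training pairs through a VST
# plug-in via the pedalboard library (not the paper's own tool).
import soundfile as sf
from pedalboard import load_plugin

reverb = load_plugin("/path/to/SomeReverb.vst3")     # placeholder plug-in path
dry, sr = sf.read("dry_guitar.wav")                  # placeholder input file
dry = dry.astype("float32")

# Exposed plug-in parameters could be randomised here (reverb.parameters) to
# cover a range of acoustic spaces when building a training set.
wet = reverb(dry, sr)                                # render the wet version
sf.write("wet_guitar.wav", wet, sr)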
Generalised Prior Subspace Analysis for Polyphonic Pitch Transcription A reformulation of Prior Subspace Analysis (PSA) is presented, which restates the problem as that of fitting an undercomplete signal dictionary to a spectrogram. Further, a generalisation of PSA is derived which allows the transcription of polyphonic pitched instruments. This involves translating the single frequency prior subspace of one note to approximate other notes, overcoming the need for a separate basis function for each note played by an instrument. Examples are then given which demonstrate the utility of the generalised PSA algorithm for polyphonic pitch transcription.
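The "dictionary fitting" view can be illustrated with a generic multiplicative-update (NMF-style) fit of a small nonnegative dictionary to a magnitude spectrogram; this is not the generalised PSA algorithm itself, and the spectrogram here is a random stand-in.

# Generic sketch of fitting a small (undercomplete) nonnegative dictionary to a
# magnitude spectrogram with multiplicative updates (Euclidean cost).
import numpy as np

rng = np.random.default_rng(0)
S = np.abs(rng.standard_normal((257, 200)))        # stand-in magnitude spectrogram
K = 4                                              # few basis functions (undercomplete)

W = rng.random((S.shape[0], K)) + 1e-3             # spectral bases (could be note priors)
H = rng.random((K, S.shape[1])) + 1e-3             # time-varying gains

for _ in range(200):
    H *= (W.T @ S) / (W.T @ W @ H + 1e-12)
    W *= (S @ H.T) / (W @ H @ H.T + 1e-12)

# In generalised PSA, a single note prior is translated along a log-frequency axis
# to approximate other notes, rather than learning one basis per note.
print(np.linalg.norm(S - W @ H))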
Source Filter Model For Expressive Gu-Qin Synthesis and its iOS App The Gu-Qin is a venerable Chinese plucked-string instrument with unique performance techniques and enchanting sounds. It is inscribed on the UNESCO Representative List of the Intangible Cultural Heritage of Humanity and is one of the oldest Chinese solo instruments. The variation of Gu-Qin sound is so large that carefully designed controls for its computer synthesizer are necessary. We developed a parametric source-filter model for re-synthesizing expressive Gu-Qin notes, capable of covering as many combinations of Gu-Qin’s performance techniques as possible. In this paper, a brief discussion of Gu-Qin playing and its special tablature notation is given to clarify the relationship between the performance techniques and the resulting sounds. This work includes a Gu-Qin musical notation system and a source-filter-model-based synthesizer. In addition, we implement an iOS app to demonstrate the model’s low computational complexity and robustness. Its friendly user interface makes it easy to improvise with the synthesized sounds.
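A minimal source-filter synthesis of a plucked-string-like note, shown only to illustrate the general technique: a decaying noise-burst excitation is passed through a cascade of two-pole resonators. The resonance frequencies, bandwidths, and decay constant are placeholders, not parameters of the paper's Gu-Qin model.

# Minimal source-filter note synthesis: excitation fed through all-pole resonators.
import numpy as np
from scipy.signal import lfilter

sr = 44100
n = int(1.5 * sr)

# Source: a short noise burst with exponential decay (crude pluck excitation).
excitation = np.random.randn(n) * np.exp(-np.arange(n) / (0.01 * sr))

# Filter: two-pole resonator (b, a coefficients) at a given frequency and bandwidth.
def resonator(freq, bw, sr):
    r = np.exp(-np.pi * bw / sr)
    theta = 2 * np.pi * freq / sr
    return [1.0], [1.0, -2 * r * np.cos(theta), r * r]

y = excitation
for f0 in [220.0, 440.0, 660.0]:                   # placeholder partial frequencies
    b, a = resonator(f0, 20.0, sr)
    y = lfilter(b, a, y)

y /= np.max(np.abs(y)) + 1e-12                     # normalise the synthesized note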
Hierarchical Organization and Visualization of Drum Sample Libraries Drum samples are an important ingredient for many styles of music. Large libraries of drum sounds are readily available. However, their value is limited by the ways in which users can explore them to retrieve sounds. Available organization schemes rely on cumbersome manual classification. In this paper, we present a new approach for automatically structuring and visualizing large sample libraries through audio signal analysis. In particular, we present a hierarchical user interface for efficient exploration and retrieval based on a computational model of similarity and self-organizing maps.
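To make the self-organizing-map idea concrete, the following sketch places drum samples on a 2-D grid using MFCC summaries and a tiny SOM written in NumPy; the file names are placeholders, and the paper's actual similarity model and hierarchical interface are more elaborate.

# Sketch: map drum samples to cells of a small self-organizing map (SOM).
import numpy as np
import librosa

def mfcc_summary(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

files = ["kick_01.wav", "snare_01.wav", "hat_01.wav"]        # placeholder library
X = np.stack([mfcc_summary(f) for f in files])
X = (X - X.mean(0)) / (X.std(0) + 1e-9)

rng = np.random.default_rng(0)
grid_h, grid_w = 8, 8
som = rng.standard_normal((grid_h, grid_w, X.shape[1]))      # codebook vectors

for t in range(2000):                                        # simple SOM training loop
    x = X[rng.integers(len(X))]
    d = np.linalg.norm(som - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(d), d.shape)         # best-matching unit
    lr, sigma = 0.5 * np.exp(-t / 1000), 2.0 * np.exp(-t / 1000)
    ii, jj = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    nb = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    som += lr * nb[..., None] * (x - som)

for f, x in zip(files, X):                                   # grid cell per sample
    d = np.linalg.norm(som - x, axis=2)
    print(f, np.unravel_index(np.argmin(d), d.shape))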
Optimization of Cascaded Parametric Peak and Shelving Filters With Backpropagation Algorithm Peak and shelving filters are parametric infinite impulse response filters used for amplifying or attenuating a certain frequency band. Shelving filters are parametrized by their cut-off frequency and gain, and peak filters by center frequency, bandwidth, and gain. Such filters can be cascaded in order to perform audio processing tasks like equalization, spectral shaping, and modelling of complex transfer functions. Such a filter cascade allows independent optimization of the aforementioned parameters of each filter. For this purpose, a novel approach is proposed for deriving the necessary local gradients with respect to the control parameters and for applying the instantaneous backpropagation algorithm to deduce the gradient flow through the cascaded structure. Additionally, the performance of a filter cascade adapted with the proposed method is demonstrated for head-related transfer function modelling as an example application.
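The paper derives instantaneous time-domain gradients for the cascade; as a related but simpler illustration, the sketch below optimizes the parameters of a two-filter peak-EQ cascade against a target magnitude response using automatic differentiation in PyTorch, with biquad coefficients following the standard audio-EQ-cookbook peak filter. All numerical values (initial parameters, target, learning rate) are placeholders.

# Gradient-based tuning of a cascade of peak filters to match a target magnitude
# response.  Unlike the paper, this uses a frequency-domain loss and autodiff.
import torch

sr = 48000
w = torch.linspace(1e-3, torch.pi, 512)              # evaluation grid (rad/sample)

def peak_response(fc, gain_db, Q):
    # Audio-EQ-cookbook peak filter, evaluated as |H(e^{jw})| on the grid w.
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2 * torch.pi * fc / sr
    alpha = torch.sin(w0) / (2 * Q)
    b = torch.stack([1 + alpha * A, -2 * torch.cos(w0), 1 - alpha * A])
    a = torch.stack([1 + alpha / A, -2 * torch.cos(w0), 1 - alpha / A])
    z = torch.exp(-1j * w)
    return torch.abs((b[0] + b[1] * z + b[2] * z**2) / (a[0] + a[1] * z + a[2] * z**2))

# Learnable parameters [fc1, gain1, Q1, fc2, gain2, Q2]; in practice these would
# be normalised to comparable ranges before optimization.
params = torch.tensor([500.0, 3.0, 1.0, 4000.0, -4.0, 2.0], requires_grad=True)
target = torch.ones_like(w) * 1.2                    # placeholder target magnitude

opt = torch.optim.Adam([params], lr=0.01)
for _ in range(300):
    fc1, g1, q1, fc2, g2, q2 = params
    mag = peak_response(fc1, g1, q1) * peak_response(fc2, g2, q2)
    loss = torch.mean((20 * torch.log10(mag) - 20 * torch.log10(target)) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())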
Audio Effect Chain Estimation and Dry Signal Recovery From Multi-Effect-Processed Musical Signals In this paper we propose a method that addresses a novel task: audio effect (AFX) chain estimation and dry signal recovery. AFXs are indispensable in modern sound design workflows. Sound engineers often cascade different AFXs (as an AFX chain) to achieve their desired soundscapes. Given a multi-AFX-applied solo instrument performance (wet signal), our method automatically estimates the applied AFX chain and recovers the unprocessed dry signal, whereas previous research addresses only one of these problems. The estimated chain is useful for novice engineers learning practical uses of AFXs, and the recovered signal can be reused with a different AFX chain. To solve this task, we first develop a deep neural network model that estimates the last-applied AFX and undoes it. We then apply the same model iteratively to estimate the AFX chain and eventually recover the dry signal from the wet signal. Our experiments on guitar phrase recordings with various AFX chains demonstrate the validity of our method for both AFX-chain estimation and dry signal recovery. We also confirm that the input wet signal can be reproduced by applying the estimated AFX chain to the recovered dry signal.
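The outer loop described in the abstract can be sketched as follows; the two functions estimate_last_afx and remove_last_afx are hypothetical stand-ins for the paper's neural models, stubbed out here so the structure of the iterative procedure is runnable.

# Sketch of the iterative procedure: repeatedly estimate the last-applied AFX and
# undo it until the signal is judged "dry".  Both model functions are hypothetical.
import numpy as np

def estimate_last_afx(wet):
    # Hypothetical classifier: returns an AFX label, or "dry" when nothing remains.
    return "dry"

def remove_last_afx(wet, afx_label):
    # Hypothetical removal model: estimates the signal before afx_label was applied.
    return wet

def estimate_chain_and_dry(wet, max_steps=8):
    chain, signal = [], wet
    for _ in range(max_steps):
        afx = estimate_last_afx(signal)
        if afx == "dry":
            break
        chain.append(afx)
        signal = remove_last_afx(signal, afx)
    return list(reversed(chain)), signal     # chain in application order, dry estimate

chain, dry = estimate_chain_and_dry(np.zeros(44100))
print(chain)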
A Diffusion-Based Generative Equalizer for Music Restoration This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a task that, to the best of our knowledge, has not been previously addressed for music restoration. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music. Historical music restoration examples are available at: research.spa.aalto.fi/publications/papers/dafx-babe2/.
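As a toy illustration of the degradation model underlying generative equalization (and nothing more: BABE-2's joint optimization and diffusion prior are not reproduced here), the sketch below parameterizes a filter degradation magnitude response as gains on a coarse log-frequency grid and applies it by FFT multiplication; the signal and gain values are placeholders.

# Toy degradation model: a magnitude-only filter on a coarse log-frequency grid,
# applied by FFT multiplication to simulate a band-limited, low-quality recording.
import numpy as np

sr = 22050
x = np.random.randn(4 * sr)                          # stand-in "clean" signal

edges = np.geomspace(50, sr / 2, 9)                  # 8 coarse log-spaced bands
gains_db = np.array([0, -3, -6, -12, -20, -30, -40, -50], dtype=float)

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / sr)
H = np.ones_like(freqs)
for lo, hi, g in zip(edges[:-1], edges[1:], gains_db):
    H[(freqs >= lo) & (freqs < hi)] = 10 ** (g / 20)

degraded = np.fft.irfft(X * H, n=len(x))             # simulated degraded recording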
Nonlinear time series analysis of musical signals In this work the techniques of chaotic time series analysis are applied to music. The audio stream from musical recordings is treated as experimental data from a dynamical system. Several performances of well-known classical pieces are analysed using recurrence analysis, stationarity measures, information metrics, and other time-series-based approaches. The benefits of such analysis are reported.
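A minimal example of one of the listed techniques, recurrence analysis: delay embedding of a signal followed by a thresholded pairwise-distance matrix (recurrence plot). The embedding dimension, delay, threshold, and test signal are illustrative choices, not the settings used in the paper.

# Minimal recurrence analysis: delay embedding + recurrence matrix.
import numpy as np

sr = 8000
t = np.arange(4 * sr) / sr
x = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 330 * t)   # stand-in signal
x = x[::40]                                          # downsample to keep the matrix small

def delay_embed(x, dim=3, tau=2):
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau:i * tau + n] for i in range(dim)], axis=1)

E = delay_embed(x)
D = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=-1)   # pairwise distances
R = (D < 0.1 * D.max()).astype(int)                          # recurrence plot matrix
print(R.shape, R.mean())                                     # size and recurrence rate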