Application of non-negative matrix factorization to signal-adaptive audio effects
This paper proposes novel audio effects based on manipulating an audio signal in a representation domain provided by non-negative matrix factorization (NMF). Critical-band magnitude spectrograms Y of sounds are first factorized into a product of two lower-rank matrices so that Y ≈ BG. The parameter matrices B and G are then processed in order to achieve the desired effect. Three classes of effects were investigated: 1) dynamic range compression (or expansion) of the component spectra or gains, 2) effects based on rank-ordering the components (columns of B and the corresponding rows of G) according to acoustic features extracted from them, and then weighting each component according to its rank, and 3) distortion effects based on controlling the number of components (and thus the reconstruction error) in the above linear approximation. The subjective quality of the effects was assessed in a listening test.
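The factorization Y ≈ BG and the link between the number of components and the reconstruction error can be illustrated with a small sketch. It assumes the classic multiplicative update rules for the Euclidean cost, which the abstract does not specify; the function names (`nmf`, `frobenius_error`) and the toy matrix are hypothetical and not taken from the paper.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def nmf(Y, rank, iters=200, seed=0, eps=1e-9):
    """Factorize non-negative Y (F x T) into B (F x rank) and G (rank x T).

    Assumption: multiplicative updates for the squared Euclidean error.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(Y), len(Y[0])
    B = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    G = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iters):
        # G <- G * (B^T Y) / (B^T B G), element-wise
        Bt = transpose(B)
        num, den = matmul(Bt, Y), matmul(matmul(Bt, B), G)
        G = [[g * n / (d + eps) for g, n, d in zip(gr, nr, dr)]
             for gr, nr, dr in zip(G, num, den)]
        # B <- B * (Y G^T) / (B G G^T), element-wise
        Gt = transpose(G)
        num, den = matmul(Y, Gt), matmul(B, matmul(G, Gt))
        B = [[b * n / (d + eps) for b, n, d in zip(br, nr, dr)]
             for br, nr, dr in zip(B, num, den)]
    return B, G

def frobenius_error(Y, B, G):
    """Frobenius norm of the residual Y - BG."""
    approx = matmul(B, G)
    return sum((y - a) ** 2 for yr, ar in zip(Y, approx) for y, a in zip(yr, ar)) ** 0.5

# Toy "spectrogram" built from two spectral shapes with different gain patterns.
Y = [[1, 0, 1, 0, 1],
     [1, 0, 1, 0, 1],
     [0, 1, 0, 1, 1],
     [0, 1, 0, 1, 1]]
for rank in (1, 2):
    B, G = nmf(Y, rank)
    print(rank, frobenius_error(Y, B, G))
```

Keeping fewer components than the data needs leaves a larger residual, which is the quantity the third class of effects manipulates.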
Blind Separation of Monaural Signals using Complex Wavelets
In this paper, a new method of blind source separation of monaural signals is presented. It is based on similarity criteria between the envelopes and frequency trajectories of the components of the signal, and on their onset and offset times. The main difference from previous work is that the input signal is filtered using a flexible complex band-pass filter bank that is a discrete version of the Complex Continuous Wavelet Transform (CCWT). Our main purpose is to show that the CCWT can be a powerful tool in blind separation, due to its strong coherence in both the time and frequency domains. The presented separation algorithm is a first approximation to this important task. An example set of four synthetically mixed monaural signals has been analyzed by this method. The obtained results are promising.
On the use of zero-crossing rate for an application of classification of percussive sounds
We address the issue of automatically extracting rhythm descriptors from audio signals, to be eventually used in content-based musical applications such as in the context of MPEG7. Our aim is to approach the comprehension of auditory scenes in raw polyphonic audio signals without preliminary source separation. As a first step towards the automatic extraction of rhythmic structures out of signals taken from the popular music repertoire, we propose an approach for automatically extracting time indexes of occurrences of different percussive timbres in an audio signal. Within this framework, we found that a particular issue lies in the classification of percussive sounds. In this paper, we report on the method currently used to deal with this problem.
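The abstract does not spell out how the feature named in the title is computed. As a minimal sketch, assuming the usual definition of zero-crossing rate as the fraction of adjacent sample pairs whose signs differ, it could look like this; the function name and the two toy signals are hypothetical.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# A slow sinusoid stands in for a low, pitched drum; a rapidly alternating
# signal stands in for a noisy, bright one. Both are illustrative only.
low = [math.sin(2 * math.pi * (t + 0.5) / 100) for t in range(400)]
bright = [(-1) ** t * 0.5 for t in range(400)]
print(zero_crossing_rate(low), zero_crossing_rate(bright))
```

A low rate suggests energy concentrated at low frequencies, while a high rate suggests noisy or high-frequency content, which is why the feature is often used to tell percussive timbres apart.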
Effective Separation of Low-Pitch Notes Using NMF Using Non-Power-of-2 Discrete Fourier Transforms
Non-negative matrix factorization (NMF), applied to signals decomposed in the frequency domain by means of the short-time Fourier transform (STFT), has recently become widely used in audio source separation. Separation of low-pitch notes in recordings is of significant interest. According to the time-frequency uncertainty principle, this separation may suffer from the tradeoff between time and frequency localization for low-pitch sounds. Furthermore, because the window function applied to the signal causes frequency spreading, separation of low-pitch notes becomes more difficult. Instead of using a power-of-2 FFT, we experiment with STFT sizes corresponding to the pitches of the notes in the signals. Computer simulations using synthetic signals show that the Source to Interferences Ratio (SIR) is significantly improved without sacrificing the Sources to Artifacts Ratio (SAR) or the Source to Distortion Ratio (SDR). On average, an improvement of at least 2 to 6 dB in SIR is achieved compared to power-of-2 FFTs of similar sizes.
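The benefit of a pitch-matched transform size can be illustrated with a small sketch. It assumes that "corresponding to the pitches" means a frame holding a whole number of periods of the note, uses a rectangular window and a naive DFT for simplicity, and picks a hypothetical 100 Hz tone sampled at 2000 Hz; none of these choices come from the paper.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT magnitudes for bins 0..N//2; works for any length N."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def peak_energy_fraction(x):
    """Share of spectral energy that lands in the strongest bin."""
    energy = [m * m for m in dft_magnitudes(x)]
    return max(energy) / sum(energy)

def pitch_matched_size(fs, f0, target):
    """Frame length near `target` that holds a whole number of periods of f0.

    Assumes fs / f0 is close to an integer number of samples.
    """
    period = fs / f0
    cycles = max(1, round(target / period))
    return round(cycles * period)

def tone(fs, f0, n):
    """n samples of a unit-amplitude sinusoid at f0 Hz."""
    return [math.sin(2 * math.pi * f0 * t / fs) for t in range(n)]

fs, f0 = 2000.0, 100.0
n_matched = pitch_matched_size(fs, f0, 256)
print(n_matched)
print(peak_energy_fraction(tone(fs, f0, 256)))        # power-of-2 size
print(peak_energy_fraction(tone(fs, f0, n_matched)))  # pitch-matched size
```

With the matched size the sinusoid falls exactly on a bin centre, so nearly all of its energy stays in one bin instead of leaking into its neighbours.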
TELPC Based Re-Synthesis Method for Isolated Notes of Polyphonic Instrumental Music Recordings
In this paper, we present a flexible analysis/re-synthesis method for smoothly changing the properties of isolated notes in polyphonic instrumental music recordings. The True Envelope Linear Predictive Coding (TELPC) method is employed as the analysis/synthesis model because its accurate spectral envelope estimation preserves the original timbre quality as much as possible. We modify the conventional LPC analysis/synthesis processing by using pitch-synchronous analysis frames to avoid the severe magnitude modulation problem. Smaller frames can thus be used to capture more local characteristics of the original signals and further improve the sound quality. Within this framework, sequences of isolated notes from two commercially available polyphonic instrumental music recordings were manipulated, yielding interesting re-synthesized results.
A Preliminary Analysis of the Continuous Axis Value of the Three-dimensional PAD Speech Emotional State Model
Emotion classification traditionally relies on Thayer's two-dimensional (2D) emotional model, which identifies emotion by arousal and valence. The 2D model is not fine enough to discriminate among the rich vocabulary of emotions, for example to distinguish disgust from fear. Another problem with traditional methods is that they lack a formal definition of the axis values of the emotional model: they either assign the axis values manually or rate them through listening tests. We propose using the PAD (Pleasure, Arousal, Dominance) emotional state model to describe speech emotion on a continuous three-dimensional scale. We suggest an initial definition of the continuous axis values based on observed patterns of Log Frequency Power Coefficient (LFPC) fluctuation. We verify the result using a database of German emotional speech. Experiments show that the classification result for a set of big-6 emotions averages 81%.
Time Scale Modification of Audio Using Non-Negative Matrix Factorization
This paper introduces an algorithm for time-scale modification of audio signals based on non-negative matrix factorization. The activation signals attributed to the detected components are used for identifying sound events. The segmentation of these events is used for detecting and preserving transients. In addition, the algorithm introduces the possibility of preserving the envelopes of overlapping sound events while globally modifying the duration of an audio clip.
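How activation signals might be turned into sound events can be sketched as follows. The abstract does not give the segmentation rule, so the relative threshold used here is purely an assumption, and the function name and example gains are hypothetical.

```python
def active_segments(activation, rel_threshold=0.1):
    """Return (start, end) frame pairs where the gain exceeds a fraction of its maximum.

    `end` is exclusive. The relative threshold is an illustrative assumption.
    """
    threshold = rel_threshold * max(activation)
    segments, start = [], None
    for i, gain in enumerate(activation):
        if gain > threshold and start is None:
            start = i                      # event begins
        elif gain <= threshold and start is not None:
            segments.append((start, i))    # event ends
            start = None
    if start is not None:
        segments.append((start, len(activation)))
    return segments

# One row of the activation matrix for a single component (toy values).
gains = [0.0, 0.0, 1.0, 0.8, 0.3, 0.0, 0.0, 0.9, 0.5]
print(active_segments(gains))
```

The start of each segment marks a candidate transient that a time-scale modification stage could leave unstretched.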
Audio Transport: A Generalized Portamento via Optimal Transport
This paper proposes a new method to interpolate between two audio signals. As an interpolation parameter is changed, the pitches in one signal slide to the pitches in the other, producing a portamento, or musical glide. The assignment of pitches in one sound to pitches in the other is accomplished by solving a 1-dimensional optimal transport problem. In addition, we introduce several techniques that preserve the audio fidelity over this highly nonlinear transformation. A portamento is a natural way for a musician to transition between notes, but traditionally it has only been possible for instruments with a continuously variable pitch like the human voice or the violin. Audio transport extends the portamento to any instrument, even polyphonic ones. Moreover, the effect can be used to transition between different instruments, groups of instruments, or any other pair of audio signals. The audio transport effect operates in real-time; we provide an open-source implementation. In experiments with sinusoidal inputs, the interpolating effect is indistinguishable from ideal sine sweeps. More generally, the effect produces clear, musical results for a wide variety of inputs.
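The pitch assignment described above can be illustrated with a minimal sketch of one-dimensional optimal transport. It assumes each sound is reduced to a list of frequencies in ascending order with non-negative spectral masses, and that transported mass slides linearly in frequency with the interpolation parameter; the function names are hypothetical and the paper's fidelity-preserving techniques are omitted.

```python
def normalize(mass):
    """Scale masses so they sum to one."""
    total = sum(mass)
    return [m / total for m in mass]

def transport_plan(mass_a, mass_b):
    """Monotone 1-D transport plan between two distributions of equal total mass.

    Sweeps both distributions from low to high frequency and matches mass in
    order. Returns a list of (source_index, target_index, amount).
    """
    plan = []
    i = j = 0
    rem_a, rem_b = mass_a[0], mass_b[0]
    while True:
        moved = min(rem_a, rem_b)
        if moved > 0:
            plan.append((i, j, moved))
        rem_a -= moved
        rem_b -= moved
        if rem_a <= 1e-12:
            i += 1
            if i == len(mass_a):
                break
            rem_a = mass_a[i]
        if rem_b <= 1e-12:
            j += 1
            if j == len(mass_b):
                break
            rem_b = mass_b[j]
    return plan

def interpolate(freqs_a, mass_a, freqs_b, mass_b, k):
    """Slide each piece of mass from its source frequency toward its target.

    k = 0 gives sound A, k = 1 gives sound B. Returns (frequency, mass) pairs.
    """
    plan = transport_plan(normalize(mass_a), normalize(mass_b))
    return [((1 - k) * freqs_a[i] + k * freqs_b[j], m) for i, j, m in plan]

# Two partials gliding to two other partials, halfway through the transition.
print(interpolate([220.0, 440.0], [1.0, 1.0], [330.0, 660.0], [1.0, 1.0], 0.5))
```

Because the plan is monotone, partials never cross on their way to the target, which is what produces the glide.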
A Hybrid Approach to Musical Note Onset Detection
Common problems with current methods of musical note onset detection are detection of fast passages of musical audio, detection of all onsets within a passage with a strong dynamic range, and detection of onsets of varying types, such as in multi-instrumental music. We present a method that uses a subband decomposition approach to onset detection. An energy-based detector is used on the upper subbands to detect strong transient events. This yields precise time resolution for the onsets, but does not detect softer or weaker onsets. A frequency-based distance measure is formulated for use with the lower subbands, improving detection accuracy for softer onsets. We also present a method for improving the detection function by using a smoothed difference metric. Finally, we show that the detection threshold may be set automatically from analysis of the statistics of the detection function, with results comparable in most places to manual setting of thresholds.
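The energy-based detector, the smoothed difference, and the automatic threshold can be sketched on a single band as follows. The subband decomposition and the frequency-based distance measure are left out, and the moving-average smoothing and the mean-plus-k-standard-deviations threshold are assumptions rather than the paper's exact rules; all names are hypothetical.

```python
import math
import statistics

def frame_energies(x, frame, hop):
    """Energy of each analysis frame of length `frame`, advanced by `hop` samples."""
    return [sum(v * v for v in x[s:s + frame])
            for s in range(0, len(x) - frame + 1, hop)]

def detection_function(energies, smooth=3):
    """Half-wave rectified difference of a moving-average-smoothed energy envelope."""
    smoothed = []
    for i in range(len(energies)):
        window = energies[max(0, i - smooth + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return [0.0] + [max(0.0, b - a) for a, b in zip(smoothed, smoothed[1:])]

def pick_onsets(df, k=1.5):
    """Local maxima of df above mean + k * std (threshold rule is an assumption)."""
    threshold = statistics.mean(df) + k * statistics.pstdev(df)
    return [i for i in range(1, len(df) - 1)
            if df[i] > threshold and df[i] >= df[i - 1] and df[i] > df[i + 1]]

def burst_signal(length, starts, decay=150.0):
    """Silence with exponentially decaying sinusoidal bursts at the given samples."""
    x = [0.0] * length
    for start in starts:
        for n in range(start, length):
            t = n - start
            x[n] += math.sin(2 * math.pi * 0.1 * t) * math.exp(-t / decay)
    return x

signal = burst_signal(2000, [500, 1400])
df = detection_function(frame_energies(signal, frame=100, hop=50))
print([i * 50 for i in pick_onsets(df)])  # onset positions in samples
```

Deriving the threshold from the statistics of the detection function removes the need to tune it by hand for each recording.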
A new estimation technique for determining the control parameters of a physical model of a trumpet
A new estimation technique is proposed which computes the control parameters of a physical model of a trumpet in order to simulate a recording of a real instrument. First, the physical constraints of the instrument and the prior knowledge about how a player controls a trumpet are described. This information is taken into account during the design of the data set, which guarantees that the constraints are respected. Then, an estimation procedure minimizes two perceptual similarity criteria as a function of the control parameters. The first criterion expresses the difference between the spectral envelopes and the second the difference in fundamental frequency. An optimization technique is proposed that yields an optimal solution for the fundamental frequency and a conditional suboptimal solution for the spectral envelope. A robust implementation of the technique was developed, for which it is shown that the estimated parameters are unique and that the optimization does not suffer from local minima.