Unsupervised Feature Learning for Speech and Music Detection in Radio Broadcasts
Detecting speech and music is an elementary step in extracting information from radio broadcasts. Existing solutions either rely on general-purpose audio features, or build on features specifically engineered for the task. Interpreting spectrograms as images, we can apply unsupervised feature learning methods from computer vision instead. In this work, we show that features learned by a mean-covariance Restricted Boltzmann Machine partly resemble engineered features, but outperform three hand-crafted feature sets in speech and music detection on a large corpus of radio recordings. Our results demonstrate that unsupervised learning is a powerful alternative to knowledge engineering.
The development of an online course in DSP eartraining
The authors present a collaborative effort to establish an online course in DSP eartraining. The paper reports on a preliminary workshop that covered a wide range of topics, such as eartraining in music education, terminology for sound characterization, e-learning, automated tutoring, DSP techniques, music examples, and audio programming. An initial design of the web application is presented as a rich content database with flexible views that allow customized online presentations. Technical risks have already been mitigated through prototyping.
The Simplest Analysis Method for Non-Stationary Sinusoidal Modeling
This paper introduces an analysis method that generalizes the phase vocoder approach to non-stationary sinusoidal modeling. The new method is compared to the reassignment method for estimating all the parameters of the model (phase, amplitude, frequency, amplitude modulation, and frequency modulation), and to the Cramér-Rao bounds. The method turns out to perform comparably to the state of the art, with the great advantage of being much simpler.
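For orientation, the classical stationary phase vocoder estimates frequency from the phase advance between two analysis frames. The sketch below shows only that baseline idea, not the generalized non-stationary method of the paper. The Hann window, the one-sample hop, the function names, and the test signal are all assumptions chosen for illustration.

```python
import cmath
import math


def windowed_dft_bin(x, start, size, k):
    """Hann-windowed DFT of x[start:start + size], evaluated at bin k only."""
    acc = 0j
    for n in range(size):
        window = 0.5 * (1.0 - math.cos(2.0 * math.pi * n / size))
        acc += x[start + n] * window * cmath.exp(-2j * math.pi * k * n / size)
    return acc


def phase_vocoder_frequency(x, start, size, k, sample_rate):
    """Estimate frequency (Hz) from the phase advance between two frames.

    The frames are one sample apart (an assumption made here), so the phase
    advance equals the angular frequency per sample and needs no unwrapping.
    """
    first = windowed_dft_bin(x, start, size, k)
    second = windowed_dft_bin(x, start + 1, size, k)
    phase_advance = cmath.phase(second * first.conjugate())
    return phase_advance * sample_rate / (2.0 * math.pi)


# Hypothetical test signal: a stationary 440 Hz cosine sampled at 8 kHz.
sample_rate = 8000.0
true_freq = 440.0
signal = [math.cos(2.0 * math.pi * true_freq * n / sample_rate) for n in range(600)]

size = 512
k = round(true_freq * size / sample_rate)  # nearest DFT bin to the sinusoid
estimate = phase_vocoder_frequency(signal, 0, size, k, sample_rate)
print(f"true {true_freq} Hz, estimated {estimate:.3f} Hz")
```

This estimator assumes constant amplitude and frequency within the frame; the amplitude and frequency modulation parameters listed in the abstract are exactly what it cannot capture.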
A pickup model for the Clavinet
In this paper, recent findings on magnetic transducers are applied to the analysis and modeling of Clavinet pickups. The Clavinet is a stringed instrument similar to the electric guitar; it uses magnetic single-coil pickups to transduce the string vibration into an electrical quantity. Data gathered during physical inspection and electrical measurements are used to build a complete model that accounts for nonlinearities in the magnetic flux. For evaluation, the model is inserted into a Digital Waveguide (DWG) model of the Clavinet string.
Scattering Representation of Modulated Sounds
Mel-frequency spectral coefficients (MFSCs), calculated by averaging the spectrogram along a mel-frequency scale, are used in many audio classification tasks. Their efficiency can be partly explained by their stability to deformation in a Euclidean norm. However, averaging the spectrogram loses high-frequency information. This loss is reduced by keeping the window size small, around 20 ms, which in turn prevents MFSCs from capturing large-scale structures. Scattering coefficients recover part of this lost information using a cascade of wavelet decompositions and modulus operators, enabling larger window sizes. This representation is sufficiently rich to capture note attacks, amplitude and frequency modulation, as well as chord structure.
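The cascade described above can be illustrated with a toy two-layer example. This is a minimal sketch and not a full scattering transform: it uses a single Gabor filter per layer instead of a wavelet filter bank, and a global time average instead of a low-pass filter. The filter parameters, function names, and test signals are assumptions for illustration.

```python
import cmath
import math


def gabor(center_freq, sigma, half_len):
    """Complex Gabor filter with unit gain at center_freq (cycles/sample)."""
    taps = [
        math.exp(-0.5 * (n / sigma) ** 2) * cmath.exp(2j * math.pi * center_freq * n)
        for n in range(-half_len, half_len + 1)
    ]
    norm = sum(abs(t) for t in taps)
    return [t / norm for t in taps]


def filtered_modulus(x, taps):
    """Modulus of the convolution of x with taps, 'valid' samples only."""
    length = len(taps)
    return [
        abs(sum(taps[length - 1 - j] * x[i + j] for j in range(length)))
        for i in range(len(x) - length + 1)
    ]


def mean(values):
    return sum(values) / len(values)


def two_layer_coefficients(x, carrier=0.1, modulation=0.005):
    """Return (first-order, second-order) time-averaged coefficients."""
    u1 = filtered_modulus(x, gabor(carrier, 10.0, 30))        # |x * psi1|
    u2 = filtered_modulus(u1, gabor(modulation, 100.0, 300))  # ||x * psi1| * psi2|
    return mean(u1), mean(u2)


# Two hypothetical signals with the same carrier and the same mean envelope:
# one steady, one amplitude-modulated.
n_samples = 2060
steady = [math.cos(2 * math.pi * 0.1 * n) for n in range(n_samples)]
modulated = [
    (1.0 + 0.8 * math.cos(2 * math.pi * 0.005 * n)) * math.cos(2 * math.pi * 0.1 * n)
    for n in range(n_samples)
]

s1_steady, s2_steady = two_layer_coefficients(steady)
s1_mod, s2_mod = two_layer_coefficients(modulated)
print("first order :", s1_steady, s1_mod)
print("second order:", s2_steady, s2_mod)
```

Under these assumptions the time-averaged first-order coefficients of the two signals are nearly identical, since averaging removes the modulation, while the second-order coefficient is much larger for the modulated signal.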
Musical Aspects of Vowel Formants in the Extreme Metal Voice
VST Plug-in Module Performing Wavelet Transform in Real-time
The paper presents a variant of the segmentwise wavelet transform (blockwise DWT, online DWT or SegDWT) algorithm adapted to real-time audio processing. The implementation of the algorithm as a VST plugin is presented as well. The main problem of segmentwise wavelet coefficient processing is the handling of the segment borders. The common border extension methods result in “false” coefficients, which in turn result in border distortion (block-end effects) after particular types of coefficient processing. In contrast, the SegDWT algorithm employs a segment extension technique to prevent this problem and produce exactly the same coefficients as the wavelet transform of the whole signal would. In this paper we remove some of the shortcomings of the original SegDWT algorithm; for example, the need for the “right” segment extension is eliminated. The VST plugin module created is described from the viewpoints of both the user and the programmer; the latter can easily add their own method for processing the coefficients.
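The border problem and the idea of segment extension can be seen in a single-level example. The sketch below is not the SegDWT algorithm itself: it assumes a one-level transform with Daubechies 4-tap (db2) filters, a causal filter alignment so that only past samples are needed, and an even segment length. Tap ordering conventions differ between libraries, and all names here are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Daubechies 4-tap low-pass filter and the matching high-pass (QMF) filter.
LOWPASS = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
HIGHPASS = [LOWPASS[3], -LOWPASS[2], LOWPASS[1], -LOWPASS[0]]
EXTENSION = len(HIGHPASS) - 2  # past samples each segment needs from its predecessor


def detail_coefficients(block, history):
    """One-level detail coefficients of block, given the samples preceding it.

    Coefficient k uses block samples 2k+1, 2k, 2k-1, 2k-2 (causal alignment),
    so the first coefficient reaches EXTENSION samples into `history`.
    """
    extended = list(history) + list(block)
    offset = len(history)
    coeffs = []
    for k in range(len(block) // 2):
        acc = 0.0
        for m, tap in enumerate(HIGHPASS):
            acc += tap * extended[offset + 2 * k + 1 - m]
        coeffs.append(acc)
    return coeffs


def segmentwise(signal, seg_len, extend):
    """Transform the signal segment by segment (seg_len must be even)."""
    coeffs = []
    for start in range(0, len(signal), seg_len):
        block = signal[start:start + seg_len]
        if extend and start >= EXTENSION:
            history = signal[start - EXTENSION:start]  # true past samples
        else:
            history = [0.0] * EXTENSION                # zero-padded border
        coeffs.extend(detail_coefficients(block, history))
    return coeffs


signal = [math.sin(0.3 * n) + 0.1 * n for n in range(32)]
whole = detail_coefficients(signal, [0.0] * EXTENSION)
naive = segmentwise(signal, 8, extend=False)
extended = segmentwise(signal, 8, extend=True)

err_naive = max(abs(a - b) for a, b in zip(naive, whole))
err_extended = max(abs(a - b) for a, b in zip(extended, whole))
print("max error, zero-padded segments:", err_naive)
print("max error, extended segments:   ", err_extended)
```

With zero-padded borders, the first coefficient of every segment after the first differs from the whole-signal transform. Extending each segment with the last samples of its predecessor reproduces the whole-signal coefficients.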