Effective Singing Voice Detection in Popular Music Using ARMA Filtering
Locating singing voice segments is essential for convenient indexing, browsing and retrieval of large music archives and catalogues. Furthermore, it is beneficial for automatic music transcription and annotation. The approach described in this paper uses Mel-Frequency Cepstral Coefficients in conjunction with Gaussian Mixture Models to discriminate two classes of data (instrumental music, and singing voice with music background). Due to imperfect classification behavior, the categorization tends to alternate within very short time spans when no post-processing is applied, whereas singing voice tends to be continuous over several frames. Thus, various tests have been performed to identify a suitable decision function and corresponding smoothing methods. Results are reported by comparing the performance of straightforward likelihood-based classification against post-processing with an autoregressive moving average (ARMA) filtering method.
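A minimal sketch of the kind of pipeline summarized above: per-frame MFCCs scored against one GMM per class, with the log-likelihood ratio smoothed by a pole-zero (ARMA-style) filter before thresholding. The feature settings, mixture sizes, and filter coefficients below are illustrative assumptions, not the paper's values.

```python
import numpy as np
import librosa
from scipy.signal import lfilter
from sklearn.mixture import GaussianMixture

def mfcc_frames(y, sr, n_mfcc=13):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

# Train one GMM per class on labelled training audio (singing voice vs. instrumental).
gmm_voice = GaussianMixture(n_components=16, covariance_type='diag')
gmm_music = GaussianMixture(n_components=16, covariance_type='diag')
# gmm_voice.fit(mfcc_frames(y_voice, sr)); gmm_music.fit(mfcc_frames(y_music, sr))

def detect_singing(y, sr, b=(0.25, 0.25, 0.25, 0.25), a=(1.0, -0.5)):
    X = mfcc_frames(y, sr)
    # Per-frame log-likelihood ratio: positive favours "singing voice".
    llr = gmm_voice.score_samples(X) - gmm_music.score_samples(X)
    # Pole-zero smoothing suppresses frame-to-frame label flicker.
    smoothed = lfilter(b, a, llr)
    return smoothed > 0.0  # boolean mask of singing-voice frames
```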
Non-Linear Digital Implementation of a Parametric Analog Tube Ground Cathode Amplifier
In this paper we propose a digital simulation of an analog amplifier circuit based on a grounded-cathode amplifier with a parametric tube model. The time-domain solution enables online valve model substitution and zero-latency changes in polarization parameters. The implementation also allows the user to match the processing characteristics of various types of tubes.
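A toy sketch of the idea of a swappable valve model in a grounded-cathode stage: the plate node is solved per sample against a plate-current function that can be exchanged at run time. The 3/2-power triode law, component values, and bias below are placeholders standing in for the paper's parametric tube model, not its actual equations.

```python
import numpy as np
from scipy.optimize import brentq

def triode_current(v_gk, v_pk, k=2e-3, mu=100.0):
    """Toy 3/2-power triode law (placeholder for a parametric tube model)."""
    v_eff = v_pk / mu + v_gk
    return k * np.maximum(v_eff, 0.0) ** 1.5

def grounded_cathode(v_in, plate_current=triode_current,
                     v_supply=300.0, r_plate=100e3, v_bias=-1.5):
    """Solve the plate node (Vsupply - Vp)/Rp = Ip(Vgk, Vp) for each sample.

    Assumes a modest input level so the toy tube is never driven far into cutoff.
    """
    out = np.empty_like(v_in)
    for n, vg in enumerate(v_in + v_bias):
        f = lambda vp: (v_supply - vp) / r_plate - plate_current(vg, vp)
        out[n] = brentq(f, 0.0, v_supply)   # plate voltage in [0, Vsupply]
    return out

# Swapping `plate_current` for a different valve model changes the behaviour
# without touching the solver, mirroring the online valve model substitution.
```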
A Similarity Measure for Audio Query by Example Based on Perceptual Coding and Compression
Query by example for multimedia signals aims at automatic retrieval of samples from a media database that are similar to a user-provided example. This paper proposes a similarity measure for query by example of audio signals. The method first represents audio signals using perceptual audio coding and then estimates the similarity of two signals from the advantage gained by compressing the files together compared to compressing them individually. Signals which benefit most from being compressed together are considered similar. The low bit rate perceptual audio coding preprocessing effectively retains perceptually important features while quantizing the signals so that identical codewords appear, allowing further inter-signal compression. The advantage of the proposed similarity measure is that it is parameter-free and thus easy to apply in a wide range of tasks. Furthermore, users' expectations do not affect the results as they do in parameter-laden algorithms. A comparison was made against other query-by-example methods, and simulation results reveal that the proposed method gives competitive results.
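A minimal sketch of the joint-compression idea: compare the size of two byte streams compressed separately with their size when compressed together. Here zlib stands in for the low bit rate perceptual audio coder used in the paper, and the normalisation of the score is one common choice rather than the paper's exact formula.

```python
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def compression_similarity(a: bytes, b: bytes) -> float:
    """Higher values mean the two (pre-coded) signals compress better together."""
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    # Advantage of joint compression relative to compressing separately.
    return (ca + cb - cab) / min(ca, cb)
```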
The REACTION System: Automatic Sound Segmentation and Word Spotting for Verbal Reaction Time Tests
Reaction tests are typical tests in psychological research and communication science in which a test person is presented with a stimulus such as a photo, a sound, or written words. The individual has to evaluate the stimulus as fast as possible in a predefined manner and has to react by presenting the result of the evaluation. This can be done by pushing a button in simple reaction tests or by saying an answer in verbal reaction tests. The reaction time between the onset of the stimulus and the onset of the response can be used as a measure of the difficulty of the given evaluation. Compared to simple reaction tests, verbal reaction tests are very powerful since the individual can simply say the answer, which is the most natural way of answering. The drawback of verbal reaction tests is that, today, the reaction times still have to be determined manually. This means that a person has to listen through all audio recordings taken during test sessions and mark stimulus times and word beginnings one by one, which is very time-consuming and labor-intensive. To replace the manual evaluation of reaction tests, this article presents the REACTION (Reaction Time Determination) system, which can automatically determine the reaction times of a test session by analyzing the audio recording of the session. The system automatically detects the onsets of stimuli as well as the onsets of answers. The recording is furthermore segmented into parts, each containing one stimulus and the following reaction, which further facilitates the transcription of the spoken words for a semantic evaluation.
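A minimal sketch of energy-based onset detection, as one simple way to locate stimulus and answer onsets in a session recording. Frame length, hop size, threshold, and minimum gap are illustrative assumptions; the REACTION system's actual detectors and word spotting are more elaborate.

```python
import numpy as np

def detect_onsets(y, sr, frame=1024, hop=512, thresh_db=-35.0, min_gap_s=0.3):
    n_frames = 1 + (len(y) - frame) // hop
    rms = np.array([np.sqrt(np.mean(y[i*hop:i*hop+frame]**2)) for i in range(n_frames)])
    level_db = 20 * np.log10(rms + 1e-12)
    active = level_db > thresh_db
    onsets, last = [], -np.inf
    for i in range(1, n_frames):
        t = i * hop / sr
        # Report a new onset when the signal crosses the threshold after silence.
        if active[i] and not active[i-1] and t - last > min_gap_s:
            onsets.append(t)
            last = t
    return onsets  # onset times in seconds

# Reaction time = answer onset minus the preceding stimulus onset.
```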
The Beating Equalizer and its Application to the Synthesis and Modification of Piano Tones
This paper presents an improved method for simulating and modifying the beating effect in piano tones. The beating effect is an audible phenomenon characteristic of the piano, and hence it should be accounted for in realistic piano synthesis. The proposed method, which is independent of the synthesis technique, consists of a cascade of second-order equalizing filters, where each filter produces the beating effect for a single partial by modulating the peak gain. Moreover, the method offers a way to control the beating frequency and the beating depth, and it can be used to modify the beating envelope in existing tones. The results show that the proposed method is able to simulate the desired beating effect.
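A sketch of one such stage: a second-order (RBJ-style) peaking equalizer centered on a single partial, with its peak gain modulated at the beating rate. The partial frequency, beating rate, depth, and Q below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def beating_eq(x, fs, f_partial=440.0, f_beat=1.5, depth_db=6.0, Q=30.0):
    w0 = 2 * np.pi * f_partial / fs
    cosw0, alpha = np.cos(w0), np.sin(w0) / (2 * Q)
    y = np.zeros_like(x)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        # Peak gain oscillates around 0 dB at the beating frequency.
        A = 10.0 ** (depth_db * np.sin(2 * np.pi * f_beat * n / fs) / 40.0)
        b0, b1, b2 = 1 + alpha * A, -2 * cosw0, 1 - alpha * A
        a0, a1, a2 = 1 + alpha / A, -2 * cosw0, 1 - alpha / A
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y

# One such stage per beating partial; the stages are applied in cascade.
```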
Simplified, Physically-Informed Models of Distortion and Overdrive Guitar Effects Pedals
This paper explores a computationally efficient, physically informed approach to designing algorithms for emulating guitar distortion circuits. Two iconic effects pedals are studied: the "Distortion" pedal and the "Tube Screamer" or "Overdrive" pedal. The primary distortion mechanism in both pedals is a diode clipper with an embedded low-pass filter, which is shown to follow a nonlinear ordinary differential equation whose solution is computationally expensive for real-time use. In the proposed method, a simplified model, comprising the cascade of a conditioning filter, a memoryless nonlinearity, and an equalization filter, is chosen for its computational efficiency and numerical robustness. Often, the design of distortion algorithms involves tuning the parameters of this filter-distortion-filter model by ear to match the sound of a prototype circuit. Here, the filter transfer functions and memoryless nonlinearities are derived by analysis of the prototype circuit. Comparisons of the resulting algorithms to actual pedals show good agreement and demonstrate that the efficient algorithms presented reproduce the general character of the modeled pedals.
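A minimal sketch of the filter-distortion-filter structure described above: a conditioning filter, a memoryless nonlinearity, and an equalization filter in cascade. The corner frequencies and the tanh-style clipper are illustrative stand-ins for the circuit-derived transfer functions and nonlinearities in the paper.

```python
import numpy as np
from scipy.signal import butter, lfilter

def distortion(x, fs, gain=20.0, hp_hz=70.0, tone_hz=3000.0):
    # Conditioning filter: high-pass shaping the signal entering the clipper.
    b_hp, a_hp = butter(1, hp_hz / (fs / 2), btype='high')
    pre = lfilter(b_hp, a_hp, gain * x)
    # Memoryless nonlinearity standing in for the diode-clipper transfer curve.
    clipped = np.tanh(pre)
    # Equalization (tone) filter applied after the nonlinearity.
    b_lp, a_lp = butter(1, tone_hz / (fs / 2), btype='low')
    return lfilter(b_lp, a_lp, clipped)
```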
Simulation of the Diode Limiter in Guitar Distortion Circuits by Numerical Solution of Ordinary Differential Equations
The diode clipper circuit with an embedded low-pass filter lies at the heart of both diode clipping "Distortion" and "Overdrive" or "Tube Screamer" effects pedals. An accurate simulation of this circuit requires the solution of a nonlinear ordinary differential equation (ODE). Numerical methods with stiff stability – Backward Euler, the Trapezoidal Rule, and the second-order Backward Difference Formula – allow the use of relatively low sampling rates at the cost of accuracy and aliasing. However, these methods require iteration at each time step to solve a nonlinear equation, and the tradeoff for this complexity must be evaluated against simple explicit methods such as Forward Euler and fourth-order Runge-Kutta, which require very high sampling rates for stability. This paper surveys and compares the basic ODE solvers as they apply to simulating circuits for audio processing. These methods are also compared to a static nonlinearity with a pre-filter. It is found that implicit or semi-implicit solvers are preferred and that the filter/static-nonlinearity approximation is often perceptually adequate.
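A sketch of a commonly used form of the diode-clipper ODE, solved once with Forward Euler (explicit) and once with Backward Euler plus a Newton iteration (implicit), illustrating the tradeoff discussed above. The RC and diode constants are typical illustrative values, not necessarily those of the circuits analyzed in the paper.

```python
import numpy as np

R, C, Is, Vt = 2.2e3, 10e-9, 2.52e-9, 45.3e-3   # assumed RC and diode constants

def f(v, vin):
    """dv/dt for a diode clipper with an antiparallel diode pair."""
    return (vin - v) / (R * C) - 2 * Is / C * np.sinh(v / Vt)

def forward_euler(vin, fs):
    v, out = 0.0, np.empty_like(vin)
    for n, u in enumerate(vin):
        v = v + f(v, u) / fs            # explicit step; needs a high fs for stability
        out[n] = v
    return out

def backward_euler(vin, fs, newton_iters=8):
    v, out = 0.0, np.empty_like(vin)
    for n, u in enumerate(vin):
        w = v                           # solve w = v + f(w, u)/fs by Newton's method
        for _ in range(newton_iters):
            g = w - v - f(w, u) / fs
            dg = 1 + (1 / (R * C) + 2 * Is / (C * Vt) * np.cosh(w / Vt)) / fs
            w -= g / dg
        v = w
        out[n] = v
    return out
```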
A Generic System for Audio Indexing: Application to Speech/Music Segmentation and Music Genre Recognition
In this paper we present a generic system for audio indexing (classification/segmentation) and apply it to two common problems: speech/music segmentation and music genre recognition. We first present some requirements for the design of a generic system. Its training part is based on a succession of four steps: feature extraction, feature selection, feature space transform and statistical modeling. We then propose several approaches for the indexing part depending on the local/global characteristics of the indexes to be found. In particular, we propose the use of segment-statistical models. The system is then applied to the two problems. The first one is the speech/music segmentation of a radio stream. The application is developed in a real industrial framework using real-world categories and data. The performance obtained on the pure speech/music classes is good. However, when the non-pure categories (mixed, bed) are also considered, the performance of the system drops. The second problem is music genre recognition. Since the indexes to be found are global, segment-statistical models are used, leading to results close to the state of the art.
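A minimal sketch of the four training steps (feature extraction, feature selection, feature space transform, statistical modeling) as a single pipeline. The particular features, selector, transform, and classifier are illustrative choices, not the paper's components.

```python
import numpy as np
import librosa
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

def clip_features(y, sr):
    """Feature extraction: per-clip statistics of frame-level MFCCs."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

indexing_model = Pipeline([
    ('select', SelectKBest(f_classif, k=20)),          # feature selection
    ('transform', LinearDiscriminantAnalysis()),       # feature space transform
    ('classify', SVC(probability=True)),               # statistical modeling
])
# indexing_model.fit(X_train, y_train); labels = indexing_model.predict(X_test)
```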
Analytical Features for the Classification of Percussive Sounds: The Case of the Pandeiro
There is an increasing need to automatically classify sounds for MIR and interactive music applications. In the context of supervised classification, we describe an approach that improves the performance of the general bag-of-frames scheme without losing its generality. This method is based on the construction and exploitation of specific audio features, called analytical features, used as input to classifiers. These features are better, in a sense we define precisely, than standard general features, or even than ad hoc features designed by hand for specific problems. To construct these features, our method explores a very large space of functions by composing basic operators in syntactically correct ways. These operators are taken from the mathematical and audio processing domains. Our method allows us to build a large number of these features and to evaluate and select them automatically for arbitrary audio classification problems. We present here a specific study concerning the analysis of Pandeiro (Brazilian tambourine) sounds. Two problems are considered: the classification of entire sounds, for MIR applications, and the classification of the attack portion of the sound only, for interactive music applications. We evaluate precisely the gain obtained by analytical features on these two problems, in comparison with standard approaches.
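A toy sketch of the feature-construction idea: enumerate syntactically valid compositions of basic operators, apply them to labelled signals, and keep the compositions that separate the classes best. The small operator set and the ANOVA-style score are illustrative assumptions; the paper's search space and evaluation are far richer.

```python
import numpy as np
from itertools import product
from sklearn.feature_selection import f_classif

OPERATORS = {
    'fft_mag': lambda x: np.abs(np.fft.rfft(x)),
    'diff':    lambda x: np.diff(x),
    'abs':     np.abs,
    'log1p':   lambda x: np.log1p(np.abs(x)),
}
REDUCERS = {'mean': np.mean, 'var': np.var, 'max': np.max}

def candidate_features():
    """Enumerate operator chains ending in a scalar reducer."""
    for (o1, f1), (o2, f2), (r, fr) in product(OPERATORS.items(), OPERATORS.items(), REDUCERS.items()):
        name = f'{r}({o2}({o1}(x)))'
        yield name, (lambda x, f1=f1, f2=f2, fr=fr: fr(f2(f1(x))))

def select_analytical_features(signals, labels, top_k=5):
    names, funcs = zip(*candidate_features())
    X = np.array([[f(s) for f in funcs] for s in signals])
    scores, _ = f_classif(X, labels)        # how well each feature separates the classes
    best = np.argsort(scores)[::-1][:top_k]
    return [(names[i], scores[i]) for i in best]
```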
Automatic Music Detection in Television Productions
This paper presents methods for the automatic detection of music within audio streams, whether in the foreground or background. The problem occurs in the context of a real-world application, namely the analysis of TV productions with respect to the use of music. In contrast to plain speech/music discrimination, the problem of detecting music in TV productions is extremely difficult, since music is often used to accentuate scenes while speech and all kinds of noise signals may be present concurrently. We present results of extensive experiments with a set of standard machine learning algorithms and standard features, investigate the difference between frame-level and clip-level features, and demonstrate the importance of applying smoothing functions as a post-processing step. Finally, we propose a new feature, called Continuous Frequency Activation (CFA), especially designed for music detection, and show experimentally that this feature identifies segments with music in audio streams more precisely than the other approaches.
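A rough sketch of a CFA-style feature: music tends to produce steady horizontal lines in the spectrogram, so we measure how persistently individual frequency bins stay activated within a clip. The window sizes, the peak-emphasis step, and the thresholds are illustrative assumptions, not the published parameterization.

```python
import numpy as np
import librosa

def cfa(y, sr, n_fft=1024, hop=512, act_thresh=0.5):
    S = np.log1p(np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)))
    # Subtract the per-frame average over frequency to emphasise tonal peaks.
    S = S - np.mean(S, axis=0, keepdims=True)
    binarized = (S > 0).astype(float)
    # Activation = fraction of frames in which each bin is "on" over the clip.
    activation = binarized.mean(axis=1)
    # Score: number of bins that stay active for most of the clip
    # (stationary tonal components, typical of music).
    return float(np.sum(activation > act_thresh))
```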