Download Spectutils, an audio signal analysis and visualization toolkit for GNU Octave
Spectutils is a GNU Octave toolkit for analyzing and visualizing audio signals. It can display oscillograms, FFT spectrograms, and pitch-detection graphs. Spectutils is best characterized as a user interface for GNU Octave that integrates signal analysis and visualization functionality into dedicated function calls. As a result, signal analysis with Spectutils requires little or no prior knowledge of Octave or MATLAB programming.
Download Center channel separation based on spatial analysis
This is a brief description of an audio channel (sound source) separation algorithm based on spatial cues. The inter-channel level difference (ICLD) is used to discriminate sound sources on a spatial grid for each channel pair and analysis subband, while the inter-channel cross-correlation (ICC) is used to determine the location area and contribution factor of the composite sound source under consideration. This paper introduces the separation of the center and side channels of a stereophonic music signal using this spatial sound source discrimination method. It is implemented simply from the known center-channel location and the derived spatial cues. The separated center channel signal matches the separated side channels well when they are reproduced simultaneously.
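As an illustration of the spatial cues involved, the sketch below computes ICLD and a zero-lag ICC for one subband frame of a channel pair. This is a minimal sketch: the function names and the zero-lag simplification are mine, not from the paper, which evaluates these cues per subband on a spatial grid.

```python
import math

def icld(left, right, eps=1e-12):
    """Inter-channel level difference in dB for one subband frame."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))

def icc(left, right, eps=1e-12):
    """Normalized inter-channel cross-correlation at zero lag."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right)) + eps
    return num / den
```

A source panned to the center appears equally in both channels, so subbands with ICLD near 0 dB and ICC near 1 are candidates for the separated center channel.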
Download Detecting arrivals within room impulse responses using matching pursuit
This paper proposes to use Matching Pursuit to investigate some statistical foundations of room acoustics, such as the temporal distribution of arrivals and the estimation of mixing time. As these have never been explored experimentally, this study is a first step towards a validation of the ergodic theory of reverberation. The use of Matching Pursuit is implicit, since correlation between the impulse response and the direct sound is assumed. Compensating for the energy decay is necessary to obtain stationary signals. The scope of this study comprises methods for determining the best temporal boundaries of the direct sound, for choosing an appropriate stopping criterion based on the similarity between acoustical indices of the original RIR and those of the synthesized signal, and for experimentally defining the mixing time.
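The core idea, a greedy matching-pursuit loop over a dictionary of time-shifted copies of the direct sound, can be sketched as follows. This is illustrative only; the paper's actual dictionary construction, stopping criterion, and decay compensation are not reproduced here.

```python
import math

def matching_pursuit(rir, atom, n_iter=10):
    """Greedily decompose an RIR into time-shifted, scaled copies of `atom`.

    Returns (arrivals, residual), where each arrival is a (shift, coefficient)
    pair and the residual is what remains after subtracting the arrivals.
    """
    norm = math.sqrt(sum(a * a for a in atom))
    unit = [a / norm for a in atom]          # unit-norm dictionary atom
    residual = list(rir)
    m = len(unit)
    arrivals = []
    for _ in range(n_iter):
        # Find the shift whose atom correlates best with the residual.
        best_shift, best_coef = 0, 0.0
        for s in range(len(residual) - m + 1):
            c = sum(residual[s + i] * unit[i] for i in range(m))
            if abs(c) > abs(best_coef):
                best_coef, best_shift = c, s
        arrivals.append((best_shift, best_coef))
        # Subtract the selected arrival from the residual.
        for i in range(m):
            residual[best_shift + i] -= best_coef * unit[i]
    return arrivals, residual
```

On a toy RIR built from two non-overlapping copies of the direct sound, two iterations recover both arrival times exactly and leave a near-zero residual.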
Download Inferring the hand configuration from hand clapping sounds
In this paper, a technique for inferring the configuration of a clapper’s hands from a hand clapping sound is described. The method was developed based on analysis of synthetic and recorded hand clap sounds, labeled with the corresponding hand configurations. A naïve Bayes classifier was constructed to automatically classify the data using two different feature sets. The results indicate that the approach is applicable for inferring the hand configuration.
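The classifier itself can be sketched as a Gaussian naïve Bayes over a small feature vector. This is a generic sketch: the class labels and features below are hypothetical placeholders, not the hand-configuration labels or feature sets used in the paper.

```python
import math
from collections import defaultdict

def train_gnb(samples):
    """Fit per-class feature means, variances, and priors from (features, label) pairs."""
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        d = len(xs[0])
        mean = [sum(x[i] for x in xs) / len(xs) for i in range(d)]
        # Small floor on the variance avoids division by zero for constant features.
        var = [sum((x[i] - mean[i]) ** 2 for x in xs) / len(xs) + 1e-6 for i in range(d)]
        model[y] = (mean, var, len(xs) / len(samples))
    return model

def classify(model, x):
    """Return the class with the highest posterior log-probability."""
    best, best_lp = None, float("-inf")
    for y, (mean, var, prior) in model.items():
        lp = math.log(prior)
        for xi, m, v in zip(x, mean, var):
            lp += -0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```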
Download Multi-feature modeling of pulse clarity: Design, validation, and optimisation
Pulse clarity is a high-level musical dimension that conveys how easily listeners can perceive the underlying rhythmic or metrical pulsation in a given musical piece, or at a particular moment during that piece. The objective of this study is to establish a composite model that explains pulse clarity judgments from the analysis of audio recordings, decomposed into a set of independent factors related to various musical dimensions. To evaluate the model, 25 participants rated the pulse clarity of one hundred excerpts from movie soundtracks. The mapping between the model predictions and the ratings was carried out via regressions. More than three fourths of the variance in listeners' ratings can be explained with a combination of periodicity-based and non-periodicity-based factors.
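The regression mapping from factors to ratings can be illustrated with a plain ordinary-least-squares fit. This is a minimal sketch via the normal equations; the study's actual factors, ratings, and regression procedure are not shown.

```python
def ols(X, y):
    """Least-squares weights w for y ~ X.w via the normal equations,
    solved by Gaussian elimination with partial pivoting."""
    d = len(X[0])
    # Form A = X^T X and b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yk for row, yk in zip(X, y)) for i in range(d)]
    # Forward elimination.
    for i in range(d):
        p = max(range(i, d), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, d):
            f = A[r][i] / A[i][i]
            for col in range(i, d):
                A[r][col] -= f * A[i][col]
            b[r] -= f * b[i]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w
```

Here each column of X would hold one factor (for example, one periodicity-based and one non-periodicity-based descriptor per excerpt), and y the mean pulse clarity ratings.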
Download Acoustic features for music piece structure analysis
Automatic analysis of the structure of a music piece aims to recover its sectional form: segmentation into musical parts, such as chorus or verse, and detection of repeated occurrences. A music signal is here described with features that are assumed to deliver information about its structure: mel-frequency cepstral coefficients, chroma, and rhythmogram. The features can be focused on different time scales of the signal. Two distance measures are presented for comparing musical sections: “stripes” for detecting repeated feature sequences, and “blocks” for detecting homogeneous sections. The features and their time scales are evaluated in a system-independent manner. Based on the obtained information, the features and distance measures are evaluated in an automatic structure analysis system on a large music database with manually annotated structures. The evaluations show that in a realistic situation, feature combinations perform better than individual features.
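The two distance measures can be sketched as follows: a “stripe” compares two feature sequences frame-by-frame along the diagonal, while a “block” averages all pairwise frame distances between two sections. This is a simplified sketch assuming Euclidean frame distance; the paper's exact distance definitions are not reproduced.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def stripe_distance(seg_a, seg_b):
    """Sequence comparison: mean frame distance along the diagonal.
    Low when seg_b repeats seg_a in the same order."""
    n = min(len(seg_a), len(seg_b))
    return sum(frame_dist(seg_a[i], seg_b[i]) for i in range(n)) / n

def block_distance(seg_a, seg_b):
    """Homogeneity comparison: mean of all pairwise frame distances.
    Insensitive to frame order within the sections."""
    total = sum(frame_dist(a, b) for a in seg_a for b in seg_b)
    return total / (len(seg_a) * len(seg_b))
```

Reversing the frame order of a section changes its stripe distance but not its block distance, which is exactly the repetition-versus-homogeneity distinction the two measures target.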
Download An experimental comparison of time delay weights for direction of arrival estimation
When direction of arrival is estimated using time differences of arrival, the estimation accuracy is determined by the accuracy of the time delay estimates. The probability of large errors increases in poor signal conditions, and reverberant conditions pose a significant challenge. To overcome these problems, reliability criteria for time delays and weighted least-squares direction estimation have been proposed. This work combines these approaches and experimentally compares several weight criteria for single-frame estimation. Testing is conducted on different types of audio signals in a loudspeaker experiment. As a result, an optimal combination of weights is found whose performance exceeds earlier proposals and iterated weighting. Furthermore, the optimal weighting does not depend on the source signal type, and the best weights are those that do not require information about the underlying time delay estimator.
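The weighted least-squares direction estimate can be sketched for the far-field case as follows. This is a toy sketch: a grid search over azimuth stands in for the closed-form weighted least-squares solution, and the pair geometry and weights are illustrative.

```python
import math

def doa_weighted_ls(mic_pairs, delays, weights, c=343.0, n_angles=720):
    """Grid search for the azimuth minimizing the weighted squared TDOA error
    under a far-field (plane-wave) model.

    mic_pairs: list of ((x1, y1), (x2, y2)) microphone coordinate pairs in meters
    delays:    measured time differences of arrival in seconds, one per pair
    weights:   reliability weight for each delay estimate
    """
    best_theta, best_err = 0.0, float("inf")
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        ux, uy = math.cos(theta), math.sin(theta)
        err = 0.0
        for ((x1, y1), (x2, y2)), tau, w in zip(mic_pairs, delays, weights):
            # Predicted delay for a plane wave arriving from direction (ux, uy).
            predicted = ((x1 - x2) * ux + (y1 - y2) * uy) / c
            err += w * (tau - predicted) ** 2
        if err < best_err:
            best_err, best_theta = err, theta
    return best_theta
```

Down-weighting unreliable delay estimates (the weight criteria compared in the paper) reduces the influence of large errors caused by reverberation.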
Download Identification of individual guitar sounds by support vector machines
This paper introduces an automatic classification system for identifying individual classical guitars from single notes played on these guitars. The classification is performed by Support Vector Machines (SVM) that have been trained with the features of the single notes. The features used for classification were the time series of the partial tones, the time series of the MFCCs (Mel Frequency Cepstral Coefficients), and the “nontonal” contributions to the spectrum. The influences of these features on the classification success are reported. With this system, 80% of the sounds recorded with three different guitars were classified correctly. A supplementary classification experiment carried out with human listeners resulted in a correct-classification rate of 65%.
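A linear SVM can be trained from scratch with a Pegasos-style sub-gradient loop, sketched below. This is a from-scratch illustration of the SVM principle only; the paper presumably used an established SVM implementation (with its own kernel and hyperparameters), and the values below are arbitrary.

```python
import random

def train_linear_svm(data, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM (labels in {-1, +1}).
    The bias term is omitted for brevity, so features are assumed roughly centered."""
    rng = random.Random(seed)
    data = list(data)          # local copy so shuffling does not mutate the caller's list
    d = len(data[0][0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1.0 - eta * lam) for wi in w]     # regularization shrinkage
            if margin < 1.0:                             # hinge-loss sub-gradient step
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0 else -1
```

In the paper's setting, x would be a feature vector derived from a single note (partial-tone and MFCC time series, nontonal spectrum), and a multi-class scheme such as one-vs-one would extend this binary classifier to three guitars.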
Download Automatic alignment of music audio and lyrics
This paper proposes an algorithm for aligning singing in polyphonic music audio with textual lyrics. As preprocessing, the system uses a voice separation algorithm based on melody transcription and sinusoidal modeling. The alignment is based on a hidden Markov model speech recognizer whose acoustic model is adapted to singing voice. The textual input is preprocessed to create a language model consisting of a sequence of phonemes, pauses, and possible instrumental breaks. The Viterbi algorithm is used to align the audio features with the text. On a test set consisting of 17 commercial recordings, the system achieves an average absolute error of 1.40 seconds in aligning lines of the lyrics.
Download Robustness and independence of voice timbre features under live performance acoustic degradations
Live performance situations can lead to degradations in the vocal signal from a typical microphone, such as ambient noise or echoes due to feedback. We investigate the robustness of continuous-valued timbre features measured on vocal signals (speech, singing, beatboxing) under simulated degradations. We also consider nonparametric dependencies between features, using information-theoretic measures and a feature-selection algorithm. We discuss how robustness and independence issues reflect on the choice of acoustic features for use in constructing a continuous-valued vocal timbre space. While some measures (notably spectral crest factors) emerge as good candidates for such a task, others are poor, and some features such as ZCR exhibit an interaction with the type of voice signal being analysed.
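Two of the features mentioned can be computed directly, as in the minimal sketch below. The paper's exact frame sizes and its per-band spectral crest computation are not reproduced here.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def spectral_crest(mag_spectrum, eps=1e-12):
    """Peak-to-mean ratio of a magnitude spectrum:
    high for tonal signals, near 1 for flat (noisy) spectra."""
    mean = sum(mag_spectrum) / len(mag_spectrum)
    return max(mag_spectrum) / (mean + eps)
```

The crest factor's insensitivity to overall level is one intuition for why it can stay robust when the signal is degraded, whereas ZCR responds directly to added noise in the waveform.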