Visualization and calculation of the roughness of acoustical musical signals using the Synchronization Index Model (SIM)
The synchronization index model of sensory dissonance and roughness accounts for the degree of phase-locking to a particular frequency that is present in the neural patterns. Sensory dissonance (roughness) is defined as the energy of the relevant beating frequencies in the auditory channels with respect to the total energy. The model takes rate-code patterns at the level of the auditory nerve as input and outputs a sensory dissonance (roughness) value. The synchronization index model entails a straightforward visualization of the principles underlying sensory dissonance and roughness, in particular in terms of (i) roughness contributions with respect to cochlear mechanical filtering (on a critical-band scale), and (ii) roughness contributions with respect to phase-locking synchrony (i.e., the synchronization index for the relevant beating frequencies on a frequency scale). This paper presents the concept and implementation of the synchronization index model and its application to musical scales.
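As a rough illustration of the roughness definition above, the following sketch computes, for each simulated auditory channel, the spectral energy in an assumed beating-frequency band relative to the channel's total energy, and averages over channels. The band limits, the absence of channel weighting, and the plain FFT front end are illustrative assumptions, not the published SIM parameters.

```python
import numpy as np

def channel_roughness(channel_signal, sr, beat_band=(20.0, 150.0)):
    """Energy of the assumed beating frequencies in one auditory channel,
    relative to the channel's total energy.  The band limits and the plain
    FFT front end are illustrative assumptions, not the SIM parameters."""
    spectrum = np.abs(np.fft.rfft(channel_signal - np.mean(channel_signal))) ** 2
    freqs = np.fft.rfftfreq(len(channel_signal), d=1.0 / sr)
    total = spectrum.sum()
    if total == 0.0:
        return 0.0
    in_band = (freqs >= beat_band[0]) & (freqs <= beat_band[1])
    return spectrum[in_band].sum() / total

def roughness(channels, sr):
    """Unweighted average of the per-channel indices over a bank of channel
    envelopes (a stand-in for the rate-code auditory-nerve patterns)."""
    return float(np.mean([channel_roughness(c, sr) for c in channels]))
```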
Analysing auditory representations for sound classification with self-organising neural networks
Three different auditory representations (Lyon's cochlear model, Patterson's gammatone filterbank combined with Meddis' inner hair cell model, and mel-frequency cepstral coefficients) are analysed in connection with self-organising maps to evaluate their suitability for a perceptually justified classification of sounds. The self-organising maps are trained with a uniform set of test sounds preprocessed by the auditory representations. The structure of the resulting feature maps and the trajectories of the individual sounds are visualised and compared to one another. While MFCC proved to be a very efficient representation, the gammatone model produced the most convincing results.
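A minimal sketch of the MFCC branch of such a pipeline is given below, assuming the librosa library for MFCC extraction and MiniSom for the self-organising map. The map size, training length and placeholder file names are illustrative choices, not the settings used in the paper, and the cochlear and gammatone front ends are not shown.

```python
import numpy as np
import librosa                     # assumed available for MFCC extraction
from minisom import MiniSom        # assumed available for the self-organising map

def mfcc_frames(path, n_mfcc=13):
    """Frame-wise MFCC vectors for one sound, one row per analysis frame."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Placeholder file names standing in for the uniform set of test sounds.
sounds = {name: mfcc_frames(name + ".wav") for name in ("flute", "snare", "bell")}

# Train one map on the pooled frames of all sounds.
data = np.vstack(list(sounds.values()))
som = MiniSom(10, 10, data.shape[1], sigma=1.0, learning_rate=0.5)
som.train_random(data, 5000)

# The trajectory of a sound is the sequence of winning map nodes per frame.
trajectories = {name: [som.winner(frame) for frame in frames]
                for name, frames in sounds.items()}
```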
Blackboard system and top-down processing for the transcription of simple polyphonic music
A system is proposed to perform automatic music transcription of simple polyphonic tracks using top-down processing. It is composed of a blackboard system with three hierarchical levels, receiving its input from a segmentation routine in the form of an averaged STFT matrix. The blackboard contains a hypotheses database, a scheduler and knowledge sources, one of which is a neural network chord recogniser with the ability to reconfigure the operation of the system, allowing it to output more than one note hypothesis at a time. The basic implementation is explained, and some examples are provided to illustrate the performance of the system. The weaknesses of the current implementation are shown and the next steps for further development of the system are defined.
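The blackboard architecture described above can be reduced to a few cooperating pieces: a shared hypotheses database, knowledge sources that propose or refine hypotheses, and a scheduler that decides which source runs next. The Python sketch below shows only that skeleton; the class names and the trivial first-applicable scheduling policy are illustrative assumptions, not the system's actual control strategy.

```python
class Blackboard:
    """Shared hypotheses database, here a flat list tagged with a level."""
    def __init__(self):
        self.hypotheses = []            # e.g. level in ("partial", "note", "chord")

    def add(self, level, data, score):
        self.hypotheses.append({"level": level, "data": data, "score": score})

class KnowledgeSource:
    """A knowledge source proposes or refines hypotheses when it is applicable."""
    def applicable(self, board):        # overridden by concrete sources
        return False

    def act(self, board):
        raise NotImplementedError

def run_scheduler(board, sources, max_cycles=100):
    """Opportunistic control loop: each cycle, run one applicable source.
    A first-come policy is used here; a real scheduler would rate candidates."""
    for _ in range(max_cycles):
        candidates = [ks for ks in sources if ks.applicable(board)]
        if not candidates:
            break
        candidates[0].act(board)
    return board.hypotheses
```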
Model-based synthesis and transformation of voiced sounds
In this work a glottal model loosely based on the Ishizaka and Flanagan model is proposed, in which the number of parameters is drastically reduced. First, the glottal excitation waveform is estimated, together with the vocal tract filter parameters, using inverse filtering techniques. Then the estimated waveform is used to identify the nonlinear glottal model, represented by a closed-loop configuration of two blocks: a second-order resonant filter, tuned with respect to the signal pitch, and a regressor-based functional, whose coefficients are estimated via nonlinear identification techniques. The results show that an accurate identification of real data can be achieved with fewer than 10 regressors of the nonlinear functional, and that an intuitive control of fundamental features, such as pitch and intensity, is allowed by acting on the physically informed parameters of the model.
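To make the closed-loop structure concrete, the sketch below couples a second-order resonator tuned to the pitch with a small polynomial feedback term standing in for the regressor-based functional. The pole radius, the pulse-train excitation and the feedback coefficients are placeholders chosen for numerical stability, not values identified from real data.

```python
import numpy as np

def resonator_coeffs(f0, sr, r=0.95):
    """Feedback coefficients of a second-order resonator tuned to the pitch f0.
    The pole radius r is an illustrative choice, not an identified value."""
    theta = 2.0 * np.pi * f0 / sr
    return -2.0 * r * np.cos(theta), r * r          # a1, a2

def glottal_loop(n_samples, f0, sr):
    """Closed-loop sketch: a weak pulse train at the pitch period drives the
    resonator, whose output is fed back through a small polynomial term that
    stands in for the regressor-based functional (placeholder coefficients)."""
    a1, a2 = resonator_coeffs(f0, sr)
    period = max(1, int(round(sr / f0)))
    y = np.zeros(n_samples)
    for n in range(2, n_samples):
        pulse = 0.05 if n % period == 0 else 0.0
        feedback = -0.01 * y[n - 1] ** 3             # nonlinear feedback block
        y[n] = pulse + feedback - a1 * y[n - 1] - a2 * y[n - 2]
    return y

waveform = glottal_loop(16000, f0=110.0, sr=16000)
```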
On the use of zero-crossing rate for an application of classification of percussive sounds
We address the issue of automatically extracting rhythm descriptors from audio signals, to be eventually used in content-based musical applications such as in the context of MPEG-7. Our aim is to approach the comprehension of auditory scenes in raw polyphonic audio signals without preliminary source separation. As a first step towards the automatic extraction of rhythmic structures from signals taken from the popular music repertoire, we propose an approach for automatically extracting the time indexes of occurrences of different percussive timbres in an audio signal. Within this framework, we found that a particular issue lies in the classification of percussive sounds. In this paper, we report on the method currently used to deal with this problem.
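Since the zero-crossing rate is the descriptor named in the title, the sketch below shows a frame-wise ZCR computation. The frame and hop sizes are arbitrary illustrative values, and the remark about hi-hats versus kicks is a common heuristic rather than the paper's classification rule.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1              # count exact zeros as positive
    return float(np.mean(signs[1:] != signs[:-1]))

def frame_zcr(signal, frame_len=1024, hop=512):
    """Frame-wise ZCR over the whole signal; noisy, bright percussive sounds
    (e.g. hi-hats) tend to score high, dark ones (e.g. kicks) low."""
    return np.array([zero_crossing_rate(signal[i:i + frame_len])
                     for i in range(0, len(signal) - frame_len + 1, hop)])
```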
Interactive digital audio environments: gesture as a musical parameter
This paper presents some possible relationships between gesture and sound that may be built within an interactive digital audio environment. In a traditional musical situation, gesture usually produces sound; the relationship between gesture and sound is unique, a cause-and-effect link. In computer music, gesture can be uncoupled from sound because the computer can carry out all aspects of sound production, from composition up to interpretation and performance. Real-time computing technology and the development of human gesture tracking systems may enable gesture to be reintroduced into the practice of computer music, but with a completely renewed approach. There is no longer a need to create direct cause-and-effect relationships for sound production, and gesture may be seen as another musical parameter to play with in the context of interactive musical performances.
A system for data-driven concatenative sound synthesis
In speech synthesis, concatenative data-driven synthesis methods prevail. They use a database of recorded speech and a unit selection algorithm that selects the segments that best match the utterance to be synthesized. Transferring these ideas to musical sound synthesis allows a new method of high-quality sound synthesis. Usual synthesis methods are based on a model of the sound signal, and it is very difficult to build a model that preserves all the fine details of the sound. Concatenative synthesis achieves this by using actual recordings. This data-driven approach (as opposed to a rule-based approach) takes advantage of the information contained in the many sound recordings. For example, very natural-sounding transitions can be synthesized, since unit selection is aware of the context of the database units. The CATERPILLAR software system has been developed to allow data-driven concatenative unit selection sound synthesis. It allows high-quality instrument synthesis with high-level control, explorative free synthesis from arbitrary sound databases, or resynthesis of a recording with sounds from the database. It is based on the new software-engineering concept of component-oriented software, increasing flexibility and facilitating reuse.
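The unit selection step can be sketched as a small dynamic programme over a lattice of candidate units, trading a target cost (how well a unit matches the requested descriptor frame) against a concatenation cost (how well two units join). In the Python sketch below both costs are plain Euclidean distances between descriptor vectors, an illustrative stand-in for CATERPILLAR's actual cost functions rather than its implementation.

```python
import numpy as np

def select_units(targets, database, w_target=1.0, w_concat=0.5):
    """Pick one database unit per target frame by dynamic programming.
    targets:  (T, D) array of target descriptor vectors.
    database: (N, D) array of unit descriptor vectors.
    Both costs are plain Euclidean distances, an illustrative stand-in for
    the system's actual target and concatenation cost functions."""
    n_targets, n_units = len(targets), len(database)
    target_cost = np.linalg.norm(targets[:, None, :] - database[None, :, :], axis=2)
    concat_cost = np.linalg.norm(database[:, None, :] - database[None, :, :], axis=2)
    cost = np.empty((n_targets, n_units))
    back = np.zeros((n_targets, n_units), dtype=int)
    cost[0] = w_target * target_cost[0]
    for t in range(1, n_targets):
        total = cost[t - 1][:, None] + w_concat * concat_cost   # (prev, next)
        back[t] = np.argmin(total, axis=0)
        cost[t] = total[back[t], np.arange(n_units)] + w_target * target_cost[t]
    # Backtrack the cheapest path through the lattice of candidate units.
    path = [int(np.argmin(cost[-1]))]
    for t in range(n_targets - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```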