Automatic Recognition of Cascaded Guitar Effects
This paper reports on a new multi-label classification task for guitar effect recognition that is closer to the actual use case of guitar effect pedals. To generate the dataset, we applied various combinations of 13 commonly used guitar effects to multiple clean guitar audio datasets. We compared four neural network architectures: a simple Multi-Layer Perceptron as a baseline, ResNet models, a CRNN model, and a sample-level CNN model. The ResNet models achieved the best accuracy and robustness across setups (with or without clean audio, seen or unseen dataset), with a micro F1 of 0.876 and a macro F1 of 0.906 in the hardest setup. An ablation study on the ResNet models further indicates the model complexity necessary for the task.
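The micro and macro F1 scores quoted above aggregate per-effect detection counts in two different ways: micro F1 pools true/false positives across all labels, while macro F1 averages per-label F1 scores. A minimal sketch of both averages for a multi-label prediction (function and variable names are ours, not from the paper):

```python
import numpy as np

def f1(tp, fp, fn):
    """F1 from counts; 0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    """Micro/macro F1 for multi-label predictions.

    y_true, y_pred: (n_samples, n_labels) binary arrays,
    one column per effect (e.g. 13 pedal effects)."""
    y_true = np.asarray(y_true, bool)
    y_pred = np.asarray(y_pred, bool)
    tp = (y_true & y_pred).sum(axis=0)
    fp = (~y_true & y_pred).sum(axis=0)
    fn = (y_true & ~y_pred).sum(axis=0)
    micro = f1(tp.sum(), fp.sum(), fn.sum())                    # pool counts over labels
    macro = float(np.mean([f1(*c) for c in zip(tp, fp, fn)]))   # average per-label F1
    return micro, macro
```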
Spatial Track Transition Effects for Headphone Listening
In this paper we study the use of different spatial processing techniques to create audio effects for forced transitions between music tracks in headphone listening. The audio effect encompasses a movement of the initially playing track to the side of the listener while the next track to be played simultaneously moves into a central position. We compare seven different methods for creating this effect in a listening test where the task of the user is to characterize the span of the spatial movement of audio playlist items around the listener's head. The methods range from amplitude panning up to full Head-Related Transfer Function (HRTF) rendering. It is found that a computationally efficient method using time-varying interaural time differences is as effective at creating a large spatial span as the full HRTF rendering method.
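The computationally efficient ITD-only approach could be sketched roughly as follows; the maximum-ITD constant and the sine panning law below are our illustrative assumptions, not the paper's method:

```python
import numpy as np

FS = 48000
MAX_ITD = 0.00066  # ~0.66 ms, a typical maximum interaural time difference (assumption)

def itd_pan(mono, azimuth_deg):
    """Binaural 'panning' of a mono track using only a time-varying ITD.

    azimuth_deg: per-sample source azimuth (0 = front, 90 = fully lateral).
    The contralateral ear is delayed by ITD(az) = MAX_ITD * sin(az);
    a linear-interpolation fractional delay keeps the motion smooth."""
    az = np.radians(np.asarray(azimuth_deg, float))
    delay = MAX_ITD * np.sin(az) * FS          # signed delay in samples
    n = np.arange(len(mono))

    def frac_delay(x, d):
        idx = n - d
        i0 = np.clip(np.floor(idx).astype(int), 0, len(x) - 1)
        i1 = np.clip(i0 + 1, 0, len(x) - 1)
        frac = idx - np.floor(idx)
        return (1 - frac) * x[i0] + frac * x[i1]

    left = frac_delay(mono, np.maximum(-delay, 0))   # delayed when source is right
    right = frac_delay(mono, np.maximum(delay, 0))   # delayed when source is left
    return np.stack([left, right], axis=1)
```

Sweeping `azimuth_deg` from 0 to 90 over the transition moves the outgoing track to the side while a second instance, swept from lateral to 0, brings the incoming track to the center.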
Distribution Derivative Method for Generalised Sinusoid with Complex Amplitude Modulation
The most common sinusoidal models for non-stationary analysis represent either complex-amplitude-modulated exponentials with exponential damping (cPACED) or log-amplitude/frequency-modulated exponentials (generalised sinusoids); by far the most commonly used modulation functions for both signal families are polynomials. Previous attempts to tackle a hybrid sinusoidal model, i.e. a generalised sinusoid with complex amplitude modulation, relied on approximations and iterative refinement due to the absence of a tractable analytical expression for its Fourier transform. In this work, a simple, direct solution for this model is presented.
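For orientation, the hybrid model in question can be written, in notation that is our assumption following common DDM formulations, as

```latex
s(t) = \left(\sum_{m=0}^{M}\mu_m t^{m}\right)\exp\!\left(\sum_{k=0}^{K}\lambda_k t^{k}\right),
\qquad \mu_m,\ \lambda_k \in \mathbb{C},
```

where the polynomial in front is the complex amplitude modulation and the complex exponent carries log-amplitude and frequency modulation; the cPACED family corresponds to $K = 1$ (exponential damping) and the plain generalised sinusoid to $M = 0$.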
Detection of Room Reflections from a Binaural Room Impulse Response
A novel analysis method for binaural room impulse responses (BRIRs) is presented. It is based on the analysis of ear canal signals with the continuous wavelet transform (CWT). The cross-wavelet transform (XWT) is then used to detect the direct sound and individual reflections in a BRIR. The new method seems to time-localize the reflections quite accurately. In addition, the proposed analysis method enables detailed study of the frequency content of the early reflections. The algorithm is tested with both measured and modeled impulse responses, and a comparison with an FFT-based cross-spectrogram is made. The results show that the XWT has potential in audio signal analysis.
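As an illustration of the XWT idea (not the paper's implementation; the Morlet wavelet parameters and the peak-picking threshold below are our assumptions), reflections can be time-localized from the magnitude of the cross-wavelet product of the two ear signals:

```python
import numpy as np

def morlet(fs, f0, n, w=6.0):
    """Complex Morlet wavelet centred at f0 Hz (w cycles), length n samples."""
    t = (np.arange(n) - n // 2) / fs
    sigma = w / (2 * np.pi * f0)
    return np.exp(2j * np.pi * f0 * t) * np.exp(-t**2 / (2 * sigma**2))

def cwt(x, fs, freqs, n_win=1024):
    """Naive continuous wavelet transform by convolution (sketch, not optimized)."""
    rows = []
    for f0 in freqs:
        w = morlet(fs, f0, n_win)
        w /= np.sum(np.abs(w))
        rows.append(np.convolve(x, w, mode='same'))
    return np.array(rows)

def detect_reflections(brir_left, brir_right, fs, freqs, thresh=0.5):
    """Time-localize direct sound and reflections from a BRIR pair.

    Cross-wavelet transform XWT = W_L * conj(W_R); arrivals appear as
    peaks of |XWT| summed over frequency. `thresh` is relative to the
    global maximum (an illustrative criterion, not the paper's)."""
    xwt = cwt(brir_left, fs, freqs) * np.conj(cwt(brir_right, fs, freqs))
    env = np.abs(xwt).sum(axis=0)
    peaks = [i for i in range(1, len(env) - 1)
             if env[i] > env[i - 1] and env[i] >= env[i + 1]
             and env[i] > thresh * env.max()]
    return peaks  # sample indices of candidate arrivals
```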
Synthesis of a Macro Sound Structure within a Self Organizing System
This paper focuses on synthesizing macro-sound structures with certain ecological attributes to obtain perceptually interesting and compositionally useful results. The system that delivers the sonic result is designed as a self-organizing system. Certain principles of cybernetics are critically assessed in terms of the interdependencies among system components, the system dynamics, and the system/environment coupling. The system aims at a self-evolution of an ecological kind through an interactive exchange with its external conditions. The macro-organization of the sonic material results from interactions of events at the meso and micro levels as well as from this exchange with the environment. The goal is to formulate some new principles and present sketches of them here, arriving at a network of concepts that suggests new ideas in sound synthesis.
A system for data-driven concatenative sound synthesis
In speech synthesis, concatenative data-driven synthesis methods prevail. They use a database of recorded speech and a unit selection algorithm that selects the segments that best match the utterance to be synthesized. Transferring these ideas to musical sound synthesis allows a new method of high-quality sound synthesis. Usual synthesis methods are based on a model of the sound signal, and it is very difficult to build a model that preserves all the fine details of sound. Concatenative synthesis achieves this by using actual recordings. This data-driven approach (as opposed to a rule-based approach) takes advantage of the information contained in the many sound recordings. For example, very natural-sounding transitions can be synthesized, since unit selection is aware of the context of the database units. The Caterpillar software system has been developed to allow data-driven concatenative unit selection sound synthesis. It allows high-quality instrument synthesis with high-level control, explorative free synthesis from arbitrary sound databases, or resynthesis of a recording with sounds from the database. It is based on the software-engineering concept of component-oriented software, increasing flexibility and facilitating reuse.
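The core of unit selection, minimizing a target cost plus a concatenation cost over the database, can be sketched as a small dynamic program; the Euclidean feature distances and the weight below are illustrative assumptions, not Caterpillar's actual cost functions:

```python
import numpy as np

def select_units(targets, units, w_concat=1.0):
    """Unit selection in the spirit of concatenative synthesis.

    targets: (T, d) feature vectors of the phrase to synthesize.
    units:   (N, d) feature vectors of database units.
    Chooses one unit per target position, minimizing a target cost
    (feature distance) plus a concatenation cost between consecutive
    units, via dynamic programming (Viterbi search)."""
    targets, units = np.asarray(targets, float), np.asarray(units, float)
    T, N = len(targets), len(units)
    target_cost = np.linalg.norm(targets[:, None, :] - units[None, :, :], axis=2)
    concat_cost = np.linalg.norm(units[:, None, :] - units[None, :, :], axis=2)
    cost = target_cost[0].copy()
    back = np.zeros((T, N), int)
    for t in range(1, T):
        total = cost[:, None] + w_concat * concat_cost     # (prev, cur)
        back[t] = np.argmin(total, axis=0)                 # best predecessor per unit
        cost = total[back[t], np.arange(N)] + target_cost[t]
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]  # database unit index for each target position
```

A context-aware selection falls out of the concatenation term: a unit that fits its neighbours in feature space is preferred over one that only matches the target in isolation.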
Exploring Phase Information in Sound Source Separation Applications
Separation of instrument sounds from polyphonic music recordings is a desirable signal processing function with a wide variety of applications in music production, video games and information retrieval. In general, sound source separation algorithms attempt to exploit those characteristics of audio signals that differentiate one source from another. Many algorithms have studied spectral magnitude as a means for separation tasks. Here we propose the exploration of phase information of musical instrument signals as an alternative dimension in discriminating sound signals originating from different sources. Three cases are presented: (1) Phase contours of musical instrument notes as potential separation features. (2) Resolving overlapping harmonics using phase coupling properties of musical instruments. (3) Harmonic-percussive decomposition using calculated radian ranges for each frequency bin.
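Case (3) relies on the fact that a steady partial advances its phase at a predictable rate per STFT frame, while percussive onsets do not. A sketch of classifying bins as harmonic by their frame-to-frame phase increment (the frame parameters and tolerance are our assumptions, not values from the paper):

```python
import numpy as np

def phase_hp_mask(x, n_fft=1024, hop=256, tol=0.5):
    """Harmonic/percussive bin classification from STFT phase (sketch).

    A steady sinusoid in bin k advances its phase by about
    2*pi*hop*k/n_fft radians per frame; bins whose measured phase
    increment falls within +-tol radians of that value are flagged
    harmonic."""
    win = np.hanning(n_fft)
    frames = [win * x[i:i + n_fft] for i in range(0, len(x) - n_fft, hop)]
    spec = np.array([np.fft.rfft(f) for f in frames])       # (frames, bins)
    phase = np.angle(spec)
    k = np.arange(spec.shape[1])
    expected = 2 * np.pi * hop * k / n_fft                  # expected increment per frame
    dphi = np.diff(phase, axis=0) - expected                # deviation from expected
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi             # wrap to [-pi, pi)
    return np.abs(dphi) < tol                               # True where phase is 'coupled'
```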
Automated Calibration of a Parametric Spring Reverb Model
The calibration of a digital spring reverberator model is crucial for the authenticity and quality of the sound produced by the model. In this paper, an automated calibration of the model parameters is proposed, based on analysing the spectrogram, the energy decay curve, the spectrum, and the autocorrelation of the time signal and spectrogram. A visual inspection of the spectrograms as well as a comparison of sound samples shows the approach to be successful for estimating the parameters of reverberators with one, two and three springs. This indicates that the proposed method is a viable alternative to manual calibration of spring reverberator models.
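One of the analysis signals mentioned above, the energy decay curve, is conventionally obtained by Schroeder backward integration. A sketch, with a decay-time estimate fitted from it (the -5/-35 dB evaluation range is a common convention, not necessarily the paper's choice):

```python
import numpy as np

def energy_decay_curve(ir):
    """Schroeder backward integration: EDC(t) = remaining energy of the
    impulse response from t to the end, in dB relative to total energy."""
    e = np.cumsum(ir[::-1] ** 2)[::-1]
    return 10 * np.log10(e / e[0])

def rt60_from_edc(edc_db, fs, lo=-5.0, hi=-35.0):
    """Estimate RT60 by a linear fit of the EDC between lo and hi dB,
    extrapolated to 60 dB of decay."""
    idx = np.where((edc_db <= lo) & (edc_db >= hi))[0]
    t = idx / fs
    slope, _ = np.polyfit(t, edc_db[idx], 1)    # dB per second
    return -60.0 / slope
```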
Using Visual Textures for Sonic Textures Production and Control
This work takes place in the framework of a broader research project on the synthesis of sonic textures and their control through gesture-based interaction in a musical practice. In this paper we present different strategies to link visual and sonic textures using similar synthesis processes; theoretical considerations underlying this problem are first presented, and several personal realizations, illustrating different approaches to designing a gesturally controlled audio-visual system, are then described.
Similarity-based Sound Source Localization with a Coincident Microphone Array
This paper presents a robust, accurate sound source localization method using a compact, near-coincident microphone array. We derive features by combining the microphone signals and determine the direction of a single sound source by similarity matching. To this end, the observed features are compared with a set of previously measured reference features, which are stored in a look-up table. By proper processing in the similarity domain, we are able to deal with signal pauses and low SNR without the need for a separate detection algorithm. For practical evaluation, we made recordings of speech signals (both loudspeaker playback and a human speaker) with a planar 4-channel prototype array in a medium-sized room. The proposed approach clearly outperforms existing coincident localization methods. We achieve high accuracy (2° mean absolute azimuth error at 0 dB SNR) for static sources, while being able to quickly follow rapid source angle changes.
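The look-up-table matching step can be sketched as follows; cosine similarity is our illustrative choice, and the actual feature derivation and similarity-domain processing are array-specific and not reproduced here:

```python
import numpy as np

def localize(feature, table_angles, table_features):
    """Single-source DOA by similarity matching against a look-up table.

    feature: observed feature vector derived from the coincident array.
    table_features[i] is the reference feature measured for azimuth
    table_angles[i]. Returns the azimuth whose reference feature is
    most similar to the observation, plus the similarity value."""
    F = np.asarray(table_features, float)
    f = np.asarray(feature, float)
    sim = F @ f / (np.linalg.norm(F, axis=1) * np.linalg.norm(f) + 1e-12)
    return table_angles[int(np.argmax(sim))], float(sim.max())
```

A low maximum similarity can also serve as a cue that no reliable source is present (e.g. during signal pauses), which is in the spirit of the similarity-domain processing described above.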