Download Resynthesis of coupled piano strings vibrations based on physical modeling
This paper presents a technique to resynthesize the sound generated by the vibrations of two piano strings tuned to very close pitches and coupled at the bridge. Such a mechanical system produces doublets of components, which generate beats and double decays in the amplitudes of the partials of the sound. We design a waveguide model by coupling two elementary waveguide models. This model is able to reproduce perceptually relevant sounds. The parameters of the model are estimated from the analysis of real signals collected directly on the strings by laser velocimetry. Sound transformations can be achieved by modifying the relevant parameters to simulate physical situations.
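The beats and double decays mentioned above can be illustrated with a much simpler signal than the coupled waveguide model: the sum of two damped sinusoids at neighbouring frequencies. The following Python sketch is only an illustration under that assumption; the function name and all numerical values (frequencies, decay rates, amplitudes) are hypothetical and are not taken from the paper.

```python
import cmath
import math

def doublet_envelope(t, f1, f2, d1, d2, a1=1.0, a2=1.0):
    """Amplitude envelope at time t (seconds) of two damped complex sinusoids.

    f1, f2: frequencies in Hz; d1, d2: decay rates in 1/s; a1, a2: amplitudes.
    """
    z1 = a1 * cmath.exp(complex(-d1, 2.0 * math.pi * f1) * t)
    z2 = a2 * cmath.exp(complex(-d2, 2.0 * math.pi * f2) * t)
    return abs(z1 + z2)

# Hypothetical doublet: components 0.5 Hz apart, one decaying faster than the other.
f1, f2 = 440.0, 440.5
beat_period = 1.0 / abs(f2 - f1)  # seconds between two envelope maxima
for t in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(t, doublet_envelope(t, f1, f2, d1=0.5, d2=3.0))
```

The envelope oscillates with period `1 / |f2 - f1|` (the beats). When the two decay rates differ, the fast component dominates the early decay and the slow one dominates the tail, which gives a two-stage decay.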
Download Low bit-rate audio coding with hybrid representations
We present a general audio coder based on a structural decomposition: the signal is expanded into three features, namely its harmonic part, the transients, and the remaining part (referred to as the noise). The first two of these layers can be very efficiently encoded in a well-chosen basis. The noise is by construction modeled as colored Gaussian random noise. Furthermore, this decomposition allows good time-frequency psychoacoustic modeling, as it directly provides us with the tonal and non-tonal parts of the signal.
Download Wavelet based Method for audio-video synchronization in broadcasting applications
The difference between the standards used for film and for video generates problems when a conversion from one format to the other is required: since all the images are displayed, the change of frame rate induces a pitch change in the sound. To avoid this problem, the whole soundtrack has to be processed during the duplication. In this paper, we address the corresponding sound transformation problem, namely the dilation of the sound spectrum without changing its duration. For broadcasting applications, the ratio of transposition is within the range 24/25 to 25/24. The wide variety of sounds (music, speech, noise…) used in movies led us to first construct a database of representative sounds containing transient, noisy, and quasi-periodic sounds. This database has been used to compare the performance of different approaches. A review of the best-known methods clearly shows significant disparities between them according to the class of the signal. This led us to reconsider the problem and to propose methods based on wavelet transforms.
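The size of the transposition involved can be worked out directly from the frame rates. The short Python sketch below converts the ratio 25/24 into cents and computes the change in running time; the function names and the 100-minute example are hypothetical and serve only as an illustration.

```python
import math

def transposition_cents(ratio):
    """Pitch shift in cents for a frequency ratio (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(ratio)

def converted_duration(duration, source_fps, target_fps):
    """Running time when every frame shot at source_fps is shown at target_fps."""
    return duration * source_fps / target_fps

# A 24 frames/s film shown at 25 frames/s plays faster and sounds higher.
print(round(transposition_cents(25 / 24), 2))  # pitch rise in cents
print(converted_duration(100.0, 24, 25))       # minutes, for a 100-minute film
```

The shift is about 71 cents, a little under three quarters of a semitone. The soundtrack must therefore be transposed by the inverse ratio while keeping the new duration.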
Download Digital Audio Effects in the Wavelet Domain
Audio signals are often stored or transmitted in a compressed representation, which can pose a problem if signal processing is required: the signal will probably have to be converted back to a time-domain representation, processed, and then re-transformed. This is time-consuming and computationally intensive; it is potentially more efficient to apply signal processing while the signal remains in the transform domain. We have implemented a scheme whereby linear processing of the traditional type, often instinctively understood by those working in the audio field, may be applied to signals stored in a wavelet-domain representation. Results are presented which demonstrate that the method produces the same output (to within the limits of machine precision) as time-domain processing, for less computational effort than would be required for the full explicit process through the time domain and back again. The potential benefits for linear effects processing (for example, EQ and sample-level delays and echoes) and also for non-linear processing, such as dynamics processing, will be introduced and discussed.
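The principle can be checked on a toy case. The Python sketch below assumes a one-level Haar transform, which is not necessarily the wavelet representation used in the paper, and applies a gain and a two-sample delay directly to the coefficients. All function names are hypothetical. A delay by an odd number of samples would mix approximation and detail coefficients and is not covered here.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """One-level Haar transform of an even-length list: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Rebuild the time-domain samples from one-level Haar coefficients."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x

def gain_and_delay(approx, detail, gain, delay_pairs):
    """Scale the coefficients and shift them by delay_pairs positions.

    One coefficient position corresponds to two time-domain samples.
    """
    def shift(coeffs):
        return ([0.0] * delay_pairs + [gain * c for c in coeffs])[:len(coeffs)]
    return shift(approx), shift(detail)

x = [1.0, 4.0, -2.0, 3.0, 0.5, -1.0, 2.0, 0.0]
a, d = haar_forward(x)
y_wavelet = haar_inverse(*gain_and_delay(a, d, gain=0.5, delay_pairs=1))
y_time = ([0.0, 0.0] + [0.5 * v for v in x])[:len(x)]  # same effect in the time domain
print(max(abs(p - q) for p, q in zip(y_wavelet, y_time)))
```

The printed difference is at the level of floating-point rounding, because the transform is linear and a shift by one Haar block commutes with it.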
Download Direct estimation of frequency from MDCT-encoded files
The Modified Discrete Cosine Transform (MDCT) is a widely used transform for audio coding, since it allows an orthogonal time-frequency transform without blocking effects. In this article, we show that the MDCT can also be used as an analysis tool. This is illustrated by extracting the frequency of a pure sine wave with some simple combinations of MDCT coefficients. We studied the performance of this estimation in ideal (noiseless) conditions, as well as the influence of additive noise (white noise or quantization noise). This forms the basis of a low-level feature extraction directly in the compressed domain.
Download Sparse and Structured Decompositions of Audio Signals in Overcomplete Spaces
We investigate the notion of “sparse decompositions” of audio signals in overcomplete spaces, i.e. when the number of basis functions is greater than the number of signal samples. We show that, with a low degree of overcompleteness (typically 2 or 3 times), it is possible to get good approximations of the signal that are sparse, provided that some “structural” information is taken into account, i.e. the localization of significant coefficients, which appear to form clusters. This is illustrated with decompositions on a union of local cosines (MDCT) and discrete wavelets (DWT), which are shown to perform well on percussive signals, a class of signals that is difficult to represent sparsely on pure (local) Fourier bases. Finally, the obtained clusters of individual atoms are shown to carry higher levels of information, such as a parametrization of partials or attacks, and this is potentially useful in an information retrieval context.
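As a minimal sketch of a sparse decomposition in a twofold overcomplete dictionary, the Python code below runs plain matching pursuit on a union of two orthonormal bases. It makes two loud assumptions: a Dirac basis and a DCT-II basis stand in for the DWT and MDCT bases of the abstract, and the greedy algorithm ignores the structural (cluster) information that the paper relies on. All names and the test signal are hypothetical.

```python
import math

N = 8  # signal length; the dictionary below holds 2 * N atoms

def dirac_atom(n0):
    """Unit spike at sample n0 (stand-in for a well-localized wavelet)."""
    return [1.0 if n == n0 else 0.0 for n in range(N)]

def dct_atom(k):
    """Unit-norm DCT-II basis vector of index k (stand-in for a local cosine)."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / N)
    return [scale * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N)]

DICTIONARY = ([(("dirac", n), dirac_atom(n)) for n in range(N)]
              + [(("dct", k), dct_atom(k)) for k in range(N)])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, n_atoms):
    """Greedily pick the atom most correlated with the residual, n_atoms times."""
    residual = list(signal)
    picks = []
    for _ in range(n_atoms):
        label, atom = max(DICTIONARY, key=lambda entry: abs(dot(residual, entry[1])))
        coeff = dot(residual, atom)
        residual = [r - coeff * g for r, g in zip(residual, atom)]
        picks.append((label, coeff))
    return picks, residual

# Hypothetical "percussive" test signal: one spike plus one cosine.
signal = [3.0 * s + 2.0 * c for s, c in zip(dirac_atom(2), dct_atom(1))]
picks, residual = matching_pursuit(signal, 4)
for label, coeff in picks:
    print(label, round(coeff, 3))
print("residual energy fraction:", dot(residual, residual) / dot(signal, signal))
```

Neither basis alone represents this signal with two coefficients, whereas the union captures most of its energy with the spike and the cosine as the first two selected atoms.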
Download Playing cylinders of mechanical organs with an optical reader
This study presents an experimental setup designed to read, by means of optical techniques, the music inscribed on automatic organ cylinders. We describe the acquisition principle, based on images taken by a linear CCD camera, and the various digital signal processing techniques employed to retrieve the music from the images. Preliminary results show that this original method is a relevant choice, since on our test cylinder about 90 % of the notes are correctly identified, with only 14 % false alarms. However, for realistic estimates of the actual music, some improvements are still necessary, both in the experimental setup and in the way individual note positions are converted into music.
Download Representations of Audio Signals in Overcomplete Dictionaries: What is the Link Between Redundancy Factor and Coding Properties?
This paper addresses the link between the size of the dictionary in overcomplete decompositions of signals and the rate-distortion properties when such decompositions are used for audio coding. We have performed several experiments with sets of nested dictionaries, showing that very redundant shift-invariant and multi-scale dictionaries have a clear benefit at low bit-rates; however, for very low distortion a lot of atoms have to be encoded, and in these cases orthogonal transforms such as the MDCT give better results.
Download Object Coding of Harmonic Sounds Using Sparse and Structured Representations
Object coding allows audio compression at extremely low bit-rates, provided that the objects are correctly modelled and identified. In this study, a codec has been implemented on the basis of a sparse decomposition of the signal with a dictionary of Instrument-Specific Harmonic atoms. The decomposition algorithm extracts “molecules”, i.e. linear combinations of such atoms, which are considered as note-like objects. Thus, they can be coded efficiently using note-specific strategies. For signals containing only harmonic sounds, the obtained bit-rates are very low, typically around 2 kb/s, and informal listening tests against a standard sinusoidal coder show promising performance.
Download Inverting dynamics compression with minimal side information
Dynamics processing is a widespread technique, at both the production and diffusion stages of music. In particular, dynamic compression is often used in such a way that the “average” listener can best enjoy the music. However, this may lead to an excessive use of compression, especially for listeners in quiet listening conditions. This paper presents estimates of the amount of extra data that is needed to invert the effects of such non-linear processing, using simple blind identification techniques. We present two simple test cases: the first where perfect reconstruction is needed, and the second where the ancillary data rate is constrained, leading to an approximate reconstruction.
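To see why a small amount of side information can be enough in the simplest case, consider a memoryless compressor defined by a threshold and a ratio. The Python sketch below is a toy model under that assumption, with hypothetical function names and parameter values. It does not model attack and release times or the blind identification techniques studied in the paper.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static compression curve: levels above the threshold are scaled by 1/ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def expand_db(level_db, threshold_db, ratio):
    """Inverse curve, usable once threshold and ratio are known to the decoder."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) * ratio

# Hypothetical settings: threshold at -20 dB, ratio 4:1.
levels = [-40.0, -20.0, -10.0, 0.0]
compressed = [compress_db(level, -20.0, 4.0) for level in levels]
restored = [expand_db(level, -20.0, 4.0) for level in compressed]
print(compressed)
print(restored)
```

In this idealized setting the side information reduces to two numbers. A real compressor applies a time-varying gain, so more ancillary data is generally needed.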