Since the MIDI 1.0 specification [1] was published, well over 15 years ago, there have been many attempts to overcome the limitations that soon became apparent. None of them has succeeded, mainly because of commercial interests; as a result, anyone looking for an appropriate user interface for synthesis control has had little choice but to use MIDI. This is why the idea of defining a new user interface arose. This article discusses the main components of that interface, paying special attention to the advantages and new features it offers the end user.
Content description has become a topic of interest for many researchers in the audiovisual field [1][2]. While manual annotation has been used for many years in different applications, the focus is now on finding automatic content-extraction and content-navigation tools. An increasing number of projects, in some of which we are actively involved, focus on the extraction of meaningful features from an audio signal. Meanwhile, standards like MPEG-7 [3] are trying to find a convenient way of describing audiovisual content. Nevertheless, content description is usually thought of as an additional information stream attached to the ‘actual content’, and the only envisioned scenario is that of a search and retrieval framework. In this article, however, we argue that if a suitable content description is available, the actual content itself may no longer be needed and we can concentrate on transmitting only its description. The receiver should then be able to interpret the information that arrives at its inputs in the form of metadata and synthesize new content relying only on this description. Music is possibly the field in which this last step has been developed furthest, which allows us to think of such a transmission scheme becoming available in the near future.
This paper presents an application for performing melodic transformations on monophonic audio phrases. The system first extracts a melodic description from the audio. This description is presented to the user and can be stored and loaded in an MPEG-7-based format. A set of high-level transformations can then be applied to the melodic description. These high-level transformations are mapped onto a set of low-level signal transformations, which are then applied to the audio signal. The algorithms for description extraction and audio transformation are also presented.
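The mapping from high-level melodic transformations to low-level signal transformations can be illustrated with a minimal sketch. It is not the system's actual implementation: the `Note` and `SignalOp` types, the function names, and the choice of transposition and tempo change as example transformations are all assumptions made for illustration. The sketch also assumes that the melodic description is a list of notes with onset, duration, and MIDI pitch, and that each edited note corresponds one-to-one to an original note.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Note:
    """One note of a (hypothetical) melodic description."""
    onset: float     # seconds from the start of the phrase
    duration: float  # seconds
    pitch: int       # MIDI note number


@dataclass(frozen=True)
class SignalOp:
    """Low-level operation on one segment of the original audio (hypothetical)."""
    start: float        # segment start in the original audio, seconds
    end: float          # segment end in the original audio, seconds
    pitch_ratio: float  # frequency scaling factor for pitch shifting
    stretch: float      # duration scaling factor for time stretching


def transpose(melody: List[Note], semitones: int) -> List[Note]:
    """High-level transformation: shift every note by a number of semitones."""
    return [Note(n.onset, n.duration, n.pitch + semitones) for n in melody]


def change_tempo(melody: List[Note], factor: float) -> List[Note]:
    """High-level transformation: factor > 1 plays the phrase faster."""
    return [Note(n.onset / factor, n.duration / factor, n.pitch) for n in melody]


def to_signal_ops(original: List[Note], target: List[Note]) -> List[SignalOp]:
    """Map the difference between two descriptions onto per-note signal operations."""
    if len(original) != len(target):
        raise ValueError("descriptions must contain the same number of notes")
    ops = []
    for src, dst in zip(original, target):
        ops.append(SignalOp(
            start=src.onset,
            end=src.onset + src.duration,
            # Equal temperament: one semitone is a frequency ratio of 2^(1/12).
            pitch_ratio=2.0 ** ((dst.pitch - src.pitch) / 12.0),
            stretch=dst.duration / src.duration,
        ))
    return ops


# Example: raise a two-note phrase by an octave and double its tempo.
melody = [Note(0.0, 0.5, 60), Note(0.5, 0.5, 62)]
edited = change_tempo(transpose(melody, 12), 2.0)
operations = to_signal_ops(melody, edited)
```

Each resulting `SignalOp` would then be handed to whatever pitch-shifting and time-stretching algorithms the signal-processing layer provides; those algorithms are outside the scope of this sketch.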