Expressive Piano Performance Rendering from Unpaired Data Recent advances in data-driven expressive performance rendering have enabled automatic models to reproduce the characteristics and the variability of human performances of musical compositions. However, these models need to be trained on aligned pairs of scores and performances, and they rely notably on score-specific markings, which limits their scope of application. This work tackles the piano performance rendering task in a low-informed setting, considering only the score note information and requiring no aligned data. The proposed model relies on adversarial training in which basic score-note properties are modified in order to reproduce the expressive qualities contained in a dataset of real performances. First results for unaligned score-to-performance rendering are evaluated through a listening test. While the interpretation quality is not on par with highly supervised methods and human renditions, our method shows promising results for transferring realistic expressivity into scores.
LTFATPY: Towards Making a Wide Range of Time-Frequency Representations Available in Python LTFATPY is a software package for accessing the Large Time-Frequency Analysis Toolbox (LTFAT) from Python. Dedicated to time-frequency analysis, LTFAT comprises a large number of linear transforms for Fourier, Gabor, and wavelet analysis, along with their associated operators. Its filter bank module is a collection of computational routines for finite impulse response and band-limited filters, allowing for the specification of constant-Q and auditory-inspired transforms. While LTFAT was originally written in MATLAB/GNU Octave, the recent popularity of the Python programming language in related fields, such as signal processing and machine learning, makes it desirable to have LTFAT available in Python as well. We introduce LTFATPY, describe its main features, and outline further developments.
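To illustrate the kind of transform at the core of LTFAT, the following is a minimal discrete Gabor transform in plain NumPy. This is not LTFATPY's actual API; the function name, window, and phase convention here are illustrative assumptions only.

```python
import numpy as np

def dgt(f, g, a, M):
    """Minimal discrete Gabor transform sketch: analyse signal f with
    window g, hop size a, and M frequency channels (len(g) == M here)."""
    L = len(f)
    N = L // a                          # number of time frames
    c = np.empty((M, N), dtype=complex)
    for n in range(N):
        # periodically wrapped frame centred at position n*a
        idx = (n * a + np.arange(-M // 2, M // 2)) % L
        c[:, n] = np.fft.fft(f[idx] * g)
    return c

# toy example: 128-sample cosine analysed with a Gaussian window
L, a, M = 128, 8, 32
t = np.arange(L)
f = np.cos(2 * np.pi * 8 * t / L)
g = np.exp(-np.pi * (np.arange(-M // 2, M // 2) / (M / 4)) ** 2)
coeffs = dgt(f, g, a, M)
print(coeffs.shape)   # (32, 16): M channels by L//a frames
```

The real toolbox additionally provides inverse transforms, dual-window computation, and many window and filter bank constructions beyond this sketch.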
Differentiable Attenuation Filters for Feedback Delay Networks We introduce a novel method for designing attenuation filters in digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second-Order Sections (SOS) of Infinite Impulse Response (IIR) filters arranged as parametric equalizers (PEQ), enabling fine control over frequency-dependent reverberation decay. Unlike traditional graphic equalizer designs, which require numerous filters per delay line, we propose a scalable solution in which the number of filters can be adjusted. The frequency, gain, and quality factor (Q) parameters are shared across delay lines, and only the gain is adjusted based on delay length. This design not only reduces the number of optimization parameters but also remains fully differentiable and compatible with gradient-based learning frameworks. Leveraging principles of analog filter design, our method allows for efficient and accurate filter fitting using supervised learning. Our method delivers a flexible and differentiable design, achieving state-of-the-art performance while significantly reducing computational cost.
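As a rough sketch of the decay-to-gain mapping described above — not the authors' implementation — the following derives one peaking-EQ section per band using the standard RBJ Audio EQ Cookbook formulas, with centre frequencies and Q shared across delay lines and only the gain scaled by the line's delay length. All numeric values are hypothetical.

```python
import numpy as np

def peak_sos(fc, gain_db, Q, fs):
    """Peaking-EQ biquad (RBJ Audio EQ Cookbook), returned as one SOS row
    [b0, b1, b2, 1, a1, a2]."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * fc / fs
    alpha = np.sin(w0) / (2 * Q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return np.concatenate([b, a]) / a[0]

def attenuation_sos(delay_len, t60_bands, band_freqs, Q, fs):
    """SOS cascade for one delay line: centre frequencies and Q are shared;
    only the per-band gain scales with delay length so that energy drops
    60 dB over the band's target T60."""
    sos = []
    for fc, t60 in zip(band_freqs, t60_bands):
        gain_db = -60 * delay_len / (t60 * fs)   # attenuation per pass
        sos.append(peak_sos(fc, gain_db, Q, fs))
    return np.array(sos)

fs = 48000
sos = attenuation_sos(delay_len=1021,
                      t60_bands=[2.0, 1.5, 0.8],    # target decay (s) per band
                      band_freqs=[250, 1000, 4000],
                      Q=0.7, fs=fs)
print(sos.shape)   # (3, 6): three second-order sections
```

Full PEQ designs also use low/high shelving sections at the band edges; the pure-peak cascade above is a simplification of that structure.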
A Pickup Model for the Clavinet In this paper, recent findings on magnetic transducers are applied to the analysis and modeling of Clavinet pickups. The Clavinet is a stringed instrument with similarities to the electric guitar; it has magnetic single-coil pickups used to transduce the string vibration into an electrical quantity. Data gathered during physical inspection and electrical measurements are used to build a complete model which accounts for nonlinearities in the magnetic flux. The model is inserted into a Digital Waveguide (DWG) model of the Clavinet string for evaluation.
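The basic mechanism — a nonlinear flux-versus-distance law followed by Faraday's-law differentiation — can be sketched as below. This is a toy model, not the paper's measured Clavinet pickup; the tanh saturation law and the parameter values are invented for illustration.

```python
import numpy as np

def pickup_voltage(d, fs, k=1.0, sat=2.0):
    """Toy single-coil pickup: flux is a saturating (tanh) function of the
    string-to-pole distance d, and the output voltage is the time
    derivative of the flux. k and sat are illustrative, not measured."""
    flux = np.tanh(sat / (1.0 + k * d))   # nonlinear flux vs. distance
    return np.gradient(flux) * fs         # Faraday's law (sign dropped)

fs = 44100
t = np.arange(2048) / fs
# string oscillating 2 mm around a 4 mm rest distance (illustrative, metres)
d = 0.004 + 0.002 * np.sin(2 * np.pi * 220 * t)
v = pickup_voltage(d, fs)
print(v.shape)
```

The flux nonlinearity is what enriches the output spectrum relative to the string displacement: even a pure sinusoidal motion yields harmonics at the pickup output.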
Scattering Representation of Modulated Sounds Mel-frequency spectral coefficients (MFSCs), calculated by averaging the spectrogram along a mel-frequency scale, are used in many audio classification tasks. Their efficiency can be partly explained by their stability to deformation in a Euclidean norm. However, averaging the spectrogram loses high-frequency information. This loss is reduced by keeping the window size small, around 20 ms, which in turn prevents MFSCs from capturing large-scale structures. Scattering coefficients recover part of this lost information using a cascade of wavelet decompositions and modulus operators, enabling larger window sizes. This representation is sufficiently rich to capture note attacks, amplitude and frequency modulation, as well as chord structure.
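The wavelet-modulus-average cascade can be sketched for the first order as follows. This is a simplified illustration with Gaussian band-pass bumps standing in for proper analytic wavelets; the filter parameters are assumptions, not the paper's design.

```python
import numpy as np

def scattering1(x, J=6, Q=1):
    """First-order scattering sketch: complex band-pass filtering in the
    frequency domain, modulus, then low-pass averaging over scale 2**J
    (a box average standing in for the low-pass wavelet phi)."""
    N = len(x)
    X = np.fft.fft(x)
    freqs = np.fft.fftfreq(N)
    coeffs = []
    xi = 0.4                       # highest centre frequency (cycles/sample)
    for _ in range(J * Q):
        sigma = xi / (2 * Q)
        psi = np.exp(-((freqs - xi) ** 2) / (2 * sigma ** 2))  # analytic bump
        u = np.abs(np.fft.ifft(X * psi))                       # U1 = |x * psi|
        T = 2 ** J
        s = u[: N // T * T].reshape(-1, T).mean(axis=1)        # average over T
        coeffs.append(s)
        xi /= 2 ** (1 / Q)         # geometric spacing of centre frequencies
    return np.array(coeffs)

x = np.sin(2 * np.pi * 0.05 * np.arange(1024))
S1 = scattering1(x)
print(S1.shape)   # (6, 16): 6 bands, 1024/64 averaged time frames
```

Second-order coefficients would repeat the same wavelet-modulus step on each band envelope before averaging, which is what recovers the modulation structure that plain MFSC averaging discards.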
Melody Line Detection and Source Separation in Classical Saxophone Recordings We propose a system which separates saxophone melodies from composite recordings of saxophone, piano, and/or orchestra. The system is intended to produce an accompaniment sans saxophone suitable for rehearsal and practice purposes. A Melody Line Detection (MLD) algorithm is proposed as the starting point for a source separation implementation which incorporates known information about typical saxophone melody lines, as well as the acoustic characteristics and range of the saxophone, in order to prevent and correct detection errors. By extracting reliable information about the soloist melody line, the system separates piano or orchestra accompaniments from the solo part. The system was tested with commercial recordings and a detection accuracy of 79.7% was achieved. The accompaniment tracks obtained after source separation successfully remove most of the saxophone sound while preserving the original nature of the accompaniment track.
B-Format Acoustic Impulse Response Measurement and Analysis in the Forest at Koli National Park, Finland Acoustic impulse responses are used for convolution-based auralisation and reverberation techniques in a range of applications, such as music production, sound design, and virtual reality systems. These impulse responses can be measured in real-world environments to provide realistic and natural-sounding reverberation effects. Analysis of this data can also provide useful information about the acoustic characteristics of a particular space. Currently, impulse responses recorded in outdoor conditions are not widely available for surround sound auralisation and research purposes. This work presents results from a recent acoustic survey of measurements at three locations in the snow-covered forest of Koli National Park in Finland during early spring. Acoustic impulse responses were measured using a B-format Soundfield microphone and a single loudspeaker. The results are analysed in terms of reverberation and spatial characteristics. The work is part of a larger study to collect and investigate acoustic impulse responses from a variety of outdoor locations under different climatic conditions.
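The standard way to extract reverberation time from a measured impulse response is Schroeder backward integration, which such an analysis would typically build on. A minimal sketch (synthetic data, not the Koli measurements):

```python
import numpy as np

def rt60_schroeder(h, fs):
    """Estimate reverberation time from an impulse response via Schroeder
    backward integration, with a line fit over the -5 to -25 dB decay
    range (a T20-style evaluation extrapolated to 60 dB)."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]         # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(h)) / fs
    mask = (edc_db <= -5) & (edc_db >= -25)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60 / slope                          # seconds to decay by 60 dB

# synthetic exponentially decaying noise with a known 1.0 s RT60
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
h = rng.standard_normal(fs) * 10 ** (-3 * t / 1.0)   # amplitude: -60 dB at 1 s
print(rt60_schroeder(h, fs))   # close to 1.0
```

Band-filtering h before integration gives the frequency-dependent reverberation times reported in surveys like this one; B-format data further allows directional analysis from the X/Y/Z channels.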
Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders Deep generative neural networks have thrived in the field of computer vision, enabling unprecedented intelligent image processes. Yet the results in audio remain less advanced and many applications are still to be investigated. Our project targets real-time sound synthesis from a reduced set of high-level parameters, including semantic controls that can be adapted to different sound libraries and specific tags. These generative variables should allow expressive modulations of target musical qualities and continuously mix into new styles. To this end, we train auto-encoders on an orchestral database of individual note samples, along with their intrinsic attributes: note class, timbre domain (an instrument subset), and extended playing techniques. We condition the decoder for explicit control over the rendered note attributes and use latent adversarial training for learning expressive style parameters that can ultimately be mixed. We evaluate both generative performance and correlations of the attributes with the latent representation. Our ablation study demonstrates the effectiveness of the musical conditioning. The proposed model generates individual notes as magnitude spectrograms from any probabilistic latent code sample (each latent point maps to a single note), with expressive control of orchestral timbres and playing styles. Its training data subsets can be directly visualized in the three-dimensional latent representation. Waveform rendering can be done offline with the Griffin-Lim algorithm. In order to allow real-time interactions, we fine-tune the decoder with a pretrained magnitude spectrogram inversion network and embed the full waveform generation pipeline in a plugin. Moreover, the encoder can be used to process new input samples; after manipulating their latent attribute representation, the decoder can generate sample variations as an audio effect would.
Our solution remains rather lightweight and fast to train, and can be directly applied to other sound domains, including a user's libraries with custom sound tags that could be mapped to specific generative controls. As a result, it fosters creativity and intuitive audio style experimentation. Sound examples and additional visualizations are available on GitHub, as well as code after the review process.
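The conditioning mechanism — appending attribute information to the latent code before decoding — can be sketched in miniature as follows. This toy numpy decoder with random weights only illustrates the data flow; the real model is a trained deep network, and all sizes here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

# toy conditioned decoder: latent code + one-hot note attribute -> magnitude frame
latent_dim, n_attrs, n_bins = 8, 12, 64            # illustrative sizes
W1 = rng.standard_normal((32, latent_dim + n_attrs)) * 0.1
W2 = rng.standard_normal((n_bins, 32)) * 0.1

def decode(z, attr):
    # concatenate the latent point with its attribute condition
    h = np.tanh(W1 @ np.concatenate([z, one_hot(attr, n_attrs)]))
    return np.abs(W2 @ h)      # one column of a magnitude spectrogram

z = rng.standard_normal(latent_dim)
frame_a = decode(z, attr=3)    # same latent point, two different note classes
frame_b = decode(z, attr=10)
print(frame_a.shape, np.allclose(frame_a, frame_b))
```

The key property shown is that the same latent point yields different outputs under different attribute conditions, which is exactly what gives the user explicit control over note class and playing technique independently of the learned style variables.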
Synthesis by Mathematical Models Sound synthesis methods can be interpreted, from a mathematical point of view, as a collection of techniques for selecting and conceptually organizing elements of a Hilbert space. In this sense, mathematics, being a highly structured and sophisticated system of classification, modeling, and categorization, seems to be the natural tool to describe existing synthesis methods and to propose new ones. Because, from this perspective, one can think of any available (or theoretically predictable, or imaginable) synthesis method as a collection of procedures to deal with meaningful parameters, by the term "synthesis by mathematical models" we mean an extensive use of the modeling and categorization power of mathematics applied to the world of sounds. In this paper we give a few examples of sound synthesis techniques based on mathematical models. After briefly reviewing FM synthesis and synthesis by nonlinear distortion, and suggesting some, in our view, interesting open problems, we propose two different new methods: synthesis by means of elliptic functions and synthesis by means of nowhere (or almost-nowhere) differentiable functions and lacunary series. The resulting waveforms have been produced using CSound as an audio engine, driven by Python scripts.
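A minimal Python sketch of the lacunary-series idea mentioned above: a truncated Weierstrass-type sum, whose limit (for 0 < a < 1 and odd integer b with ab large enough) is continuous but nowhere differentiable. The parameter values and the band-limiting rule here are my own choices, not the paper's.

```python
import numpy as np

def weierstrass_wave(f0, fs, dur, a=0.5, b=3, n_terms=12):
    """Waveform from the truncated Weierstrass-type lacunary series
    sum_n a**n * cos(2*pi * b**n * f0 * t). Partials at or above the
    Nyquist frequency are dropped to avoid aliasing."""
    t = np.arange(int(dur * fs)) / fs
    x = np.zeros_like(t)
    for n in range(n_terms):
        fn = (b ** n) * f0
        if fn >= fs / 2:           # keep the series band-limited
            break
        x += (a ** n) * np.cos(2 * np.pi * fn * t)
    return x / np.max(np.abs(x))   # normalise to full scale

x = weierstrass_wave(f0=55.0, fs=44100, dur=0.5)
print(len(x))   # 22050 samples
```

The geometric (lacunary) spacing of the partials, b**n * f0, is what distinguishes this from ordinary additive synthesis with harmonic partials.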
High Accuracy Frame-by-Frame Non-Stationary Sinusoidal Modelling This paper describes techniques for obtaining high accuracy estimates, including those of non-stationarity, of parameters for sinusoidal modelling using a single frame of analysis data. In this case the data used is generated from the time and frequency reassigned short-time Fourier transform (STFT). Such a system offers the potential for quasi real-time (frame-by-frame) spectral modelling of audio signals.
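The frequency-reassignment relation underlying such single-frame estimators can be sketched as follows: with X_h and X_dh the DFTs of the frame windowed by a window h and by its time derivative dh/dt, the refined frequency at each bin is f_bin - (1/2π)·Im(X_dh / X_h). This is only the basic reassignment step, not the paper's full non-stationary method.

```python
import numpy as np

def reassigned_freq(x, fs):
    """Per-bin instantaneous-frequency estimate from a single frame via
    frequency reassignment: f_hat = f_bin - (fs-scaled) Im(X_dh / X_h)."""
    N = len(x)
    n = np.arange(N)
    h = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)           # Hann window
    dh = (np.pi * fs / N) * np.sin(2 * np.pi * n / N)   # dh/dt in 1/s
    Xh = np.fft.rfft(x * h)
    Xdh = np.fft.rfft(x * dh)
    f_bin = np.fft.rfftfreq(N, 1 / fs)
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = -np.imag(Xdh / Xh) / (2 * np.pi)
    return f_bin + corr

fs, N = 48000, 1024
f_true = 1000.3                       # deliberately between bin centres
x = np.cos(2 * np.pi * f_true * np.arange(N) / fs)
f_hat = reassigned_freq(x, fs)
k = int(round(f_true * N / fs))       # FFT bin nearest the sinusoid
print(f_hat[k])                       # close to 1000.3, far below the 46.9 Hz bin width
```

Time reassignment works analogously with a time-weighted window t·h(t), and the paper builds on both to additionally estimate non-stationary (amplitude- and frequency-modulation) parameters.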