Voice Features For Control: A Vocalist Dependent Method For Noise Measurement And Independent Signals Computation
Information about the human spoken and singing voice is conveyed through the articulations of the individual’s vocal folds and vocal tract. The signal receiver, either human or machine, works at different levels of abstraction to extract and interpret only the relevant context-specific information needed. Traditionally in the field of human-machine interaction, the human voice is used to drive and control events that are discrete in terms of time and value. We propose to use the voice as a source of real-valued and time-continuous control signals that can be employed to interact with any multidimensional human-controllable device in real time. The isolation of noise sources and the independence of the control dimensions play a central role. Their dependence on the individual voice represents an additional challenge. In this paper we introduce a method to compute case-specific independent signals from the vocal sound, together with an individual study of feature computation and selection for noise rejection.
Deep Learning Conditioned Modeling of Optical Compression
Deep learning models applied to raw audio are rapidly gaining relevance in modeling analog audio devices. This paper investigates the use of different deep architectures for modeling audio optical compression. The models take raw audio samples at audio rate as input and produce them as output, and they work with no or small input buffers, allowing a theoretical real-time and low-latency implementation. In this study, two compressor parameters, the ratio and the threshold, have been included in the modeling process to condition the inference of the trained network. Deep learning architectures, including feed-forward, recurrent, and encoder-decoder models, are compared in modeling an all-tube optical mono compressor. The results of this study show that feed-forward and long short-term memory architectures present limitations in modeling the triggering phase of the compressor, performing well only on the sustained phase. On the other hand, encoder-decoder models outperform the other architectures in replicating the overall compression process, but they overpredict the energy of high-frequency components.
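For readers unfamiliar with the two conditioning parameters, the sketch below shows what ratio and threshold mean in a textbook hard-knee static compression curve. It is not the paper's neural model and says nothing about attack, release, or the optical element; the function name `static_gain_db` and the example values are assumptions chosen for illustration.

```python
def static_gain_db(level_db, threshold_db, ratio):
    """Gain change in dB applied by an idealized hard-knee compressor.

    Illustrative only: a static textbook curve, not the modeled device.
    """
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    if level_db <= threshold_db:
        # Below the threshold the signal passes unchanged.
        return 0.0
    # Above the threshold, the output rises 1 dB for every `ratio` dB of input.
    compressed_db = threshold_db + (level_db - threshold_db) / ratio
    return compressed_db - level_db


# Hypothetical settings: threshold at -20 dB, ratio 4:1.
for level in (-30.0, -20.0, -10.0, 0.0):
    print(level, static_gain_db(level, threshold_db=-20.0, ratio=4.0))
```

A conditioned network receives such parameter values alongside the audio, so that a single trained model can cover a range of settings rather than one fixed curve.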
Fully Conditioned and Low-Latency Black-Box Modeling of Analog Compression
Neural networks have been found suitable for virtual analog modeling applications. Several analog audio effects have been successfully modeled with deep learning techniques, using low-latency and conditioned architectures suitable for real-world applications. Challenges remain with effects presenting more complex responses, such as nonlinear and time-varying input-output relationships. This paper proposes a deep-learning model for the analog compression effect. The architecture we introduce is fully conditioned by the device control parameters and works on small audio segments, allowing low-latency real-time implementations. The architecture is used to model the CL 1B analog optical compressor, showing overall high accuracy and the ability to capture the different attack and release compression profiles. The proposed architecture’s ability to model audio compression behaviors is also verified using datasets from other compressors. Limitations remain with heavy compression scenarios determined by the conditioning parameters.
Towards Neural Emulation of Voltage-Controlled Oscillators
Machine learning models have become ubiquitous in modeling analog audio devices. Expanding on this line of research, our study focuses on Voltage-Controlled Oscillators of analog synthesizers. We employ black box autoregressive artificial neural networks to model the typical analog waveshapes, including triangle, square, and sawtooth. The models can be conditioned on wave frequency and type, enabling the generation of pitch envelopes and morphing across waveshapes. We conduct evaluations on both synthetic and analog datasets to assess the accuracy of various architectural variants. The LSTM variant performed better, although lower frequency ranges present particular challenges.
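As a point of reference for the three waveshapes named above, the following sketch generates idealized (naive, non-band-limited) triangle, square, and sawtooth waves at a given frequency. It only illustrates the target shapes and the two conditioning variables, frequency and type; it is not the autoregressive network from the paper, and `naive_wave` and its parameters are hypothetical names.

```python
def naive_wave(shape, freq_hz, sample_rate, num_samples):
    """Return one idealized waveform in [-1, 1] as a list of floats.

    Illustrative only: no band-limiting, so real audio use would alias.
    """
    samples = []
    for n in range(num_samples):
        # Normalized phase in [0, 1) for the current sample.
        phase = (freq_hz * n / sample_rate) % 1.0
        if shape == "sawtooth":
            samples.append(2.0 * phase - 1.0)
        elif shape == "square":
            samples.append(1.0 if phase < 0.5 else -1.0)
        elif shape == "triangle":
            samples.append(1.0 - 4.0 * abs(phase - 0.5))
        else:
            raise ValueError(f"unknown shape: {shape}")
    return samples


# One cycle of each shape, four samples per cycle (hypothetical values).
for shape in ("triangle", "square", "sawtooth"):
    print(shape, naive_wave(shape, freq_hz=1.0, sample_rate=4.0, num_samples=4))
```

Varying `freq_hz` over time would give a pitch envelope, and blending the outputs of two shapes would give a crude form of morphing, which are the behaviors the conditioned models are reported to generate.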
Neural Sample-Based Piano Synthesis
Piano sound emulation has been an active topic of research and development for several decades. Although comprehensive physics-based piano models have been proposed, sample-based piano emulation is still widely utilized for its computational efficiency and relative accuracy despite presenting significant memory storage requirements. This paper proposes a novel hybrid approach to sample-based piano synthesis aimed at improving the fidelity of sound emulation while reducing memory requirements for storing samples. A neural network-based model processes the sound recorded from a single example of a piano key at a given velocity. The network is trained to learn the nonlinear relationship between the various velocities at which a piano key is pressed and the corresponding sound alterations. Results show that the method achieves high accuracy using a specific neural architecture that is computationally efficient, has few trainable parameters, and requires memory for only one sample per piano key.