Download Voice Features For Control: A Vocalist Dependent Method For Noise Measurement And Independent Signals Computation
Information about the human spoken and singing voice is conveyed through the articulations of the individual’s vocal folds and vocal tract. The signal receiver, either human or machine, works at different levels of abstraction to extract and interpret only the relevant context specific information needed. Traditionally in the field of human machine interaction, the human voice is used to drive and control events that are discrete in terms of time and value. We propose to use the voice as a source of realvalued and time-continuous control signals that can be employed to interact with any multidimensional human-controllable device in real-time. The isolation of noise sources and the independence of the control dimensions play a central role. Their dependency on individual voice represents an additional challenge. In this paper we introduce a method to compute case specific independent signals from the vocal sound, together with an individual study of features computation and selection for noise rejection.
Download Deep Learning Conditioned Modeling of Optical Compression
Deep learning models applied to raw audio are rapidly gaining relevance in modeling audio analog devices. This paper investigates the use of different deep architectures for modeling audio optical compression. The models use as input and produce as output raw audio samples at audio rate, and it works with noor small-input buffers allowing a theoretical real-time and lowlatency implementation. In this study, two compressor parameters, the ratio, and threshold have been included in the modeling process aiming to condition the inference of the trained network. Deep learning architectures are compared to model an all-tube optical mono compressor including feed-forward, recurrent, and encoder-decoder models. The results of this study show that feedforward and long short-term memory architectures present limitations in modeling the triggering phase of the compressor, performing well only on the sustained phase. On the other hand, encoderdecoder models outperform other architectures in replicating the overall compression process, but they overpredict the energy of high-frequency components.
Download Fully Conditioned and Low-Latency Black-Box Modeling of Analog Compression
Neural networks have been found suitable for virtual analog modeling applications. Several analog audio effects have been successfully modeled with deep learning techniques, using low-latency and conditioned architectures suitable for real-world applications. Challenges remain with effects presenting more complex responses, such as nonlinear and time-varying input-output relationships. This paper proposes a deep-learning model for the analog compression effect. The architecture we introduce is fully conditioned by the device control parameters and it works on small audio segments, allowing low-latency real-time implementations. The architecture is used to model the CL 1B analog optical compressor, showing an overall high accuracy and ability to capture the different attack and release compression profiles. The proposed architecture’ ability to model audio compression behaviors is also verified using datasets from other compressors. Limitations remain with heavy compression scenarios determined by the conditioning parameters.
Download Towards Neural Emulation of Voltage-Controlled Oscillators
Machine learning models have become ubiquitous in modeling analog audio devices. Expanding on this line of research, our study focuses on Voltage-Controlled Oscillators of analog synthesizers. We employ black box autoregressive artificial neural networks to model the typical analog waveshapes, including triangle, square, and sawtooth. The models can be conditioned on wave frequency and type, enabling the generation of pitch envelopes and morphing across waveshapes. We conduct evaluations on both synthetic and analog datasets to assess the accuracy of various architectural variants. The LSTM variant performed better, although lower frequency ranges present particular challenges.
Download Neural Sample-Based Piano Synthesis
Piano sound emulation has been an active topic of research and development for several decades. Although comprehensive physicsbased piano models have been proposed, sample-based piano emulation is still widely utilized for its computational efficiency and relative accuracy despite presenting significant memory storage requirements. This paper proposes a novel hybrid approach to sample-based piano synthesis aimed at improving the fidelity of sound emulation while reducing memory requirements for storing samples. A neural network-based model processes the sound recorded from a single example of piano key at a given velocity. The network is trained to learn the nonlinear relationship between the various velocities at which a piano key is pressed and the corresponding sound alterations. Results show that the method achieves high accuracy using a specific neural architecture that is computationally efficient, presenting few trainable parameters, and it requires memory only for one sample for each piano key.