Download Separation of musical notes with highly overlapping partials using phase and temporal constrained complex matrix factorization In note separation of polyphonic music, separating overlapping partials is an important and difficult problem. Fifths and octaves, the most challenging intervals, are nevertheless very common. Non-negative matrix factorization (NMF) employs constraints on energy and harmonic ratio to tackle this problem. Recently, complex matrix factorization (CMF) was proposed, which incorporates phase information into the source separation problem. However, temporal magnitude modulation remains severe for fifths and octaves when CMF is applied. In this work, we investigate a temporal smoothness model based on the CMF approach. The temporal activation coefficient of a preceding note is constrained when succeeding notes appear. Compared to unconstrained CMF, the magnitude modulation is greatly reduced in our computer simulation. Performance indices including source-to-interference ratio (SIR), source-to-artifacts ratio (SAR), source-to-distortion ratio (SDR), and modulation error ratio (MER) are given.
Download Swing Ratio Estimation Swing is a typical long-short rhythmical pattern that is mostly present in jazz music. In this article, we propose an algorithm to automatically estimate how much a track, or a frame of a track, is swinging. We denote this by the swing ratio. The algorithm we propose is based on the analysis of the auto-correlation of the onset energy function of the audio signal and a simple set of rules. To evaluate this algorithm, we propose and share the “GTZAN-rhythm” test-set, which extends a well-known test-set with annotations of the whole rhythmical structure (downbeat, beat and eighth-note positions). We test our algorithm on two tasks: detecting tracks with or without swing, and estimating the amount of swing. Our algorithm achieves 91% mean recall. Finally, we use our annotations to study the relationship between the swing ratio and the tempo (examining the common belief that swing ratio decreases linearly with the tempo) and between the swing ratio and the musicians. How much and how to swing is never written on scores, and is therefore something jazz students learn mostly by listening. Our algorithm could be useful for jazz students who want to learn what swing is.
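As a rough illustration of the quantity being estimated, the sketch below computes a swing ratio directly from annotated beat and eighth-note times. It is not the autocorrelation-based estimator described in the abstract. The function name `swing_ratio`, the definition of the ratio as long duration over short duration within each beat, and the use of the median are assumptions made for this example.

```python
from statistics import median

def swing_ratio(beats, offbeats):
    """Median long/short ratio from beat times and off-beat eighth-note times (seconds).

    Assumed definition: within each beat, the ratio is the duration from the
    beat to the off-beat eighth note divided by the duration from that eighth
    note to the next beat. Straight eighths give 1.0, triplet swing gives 2.0.
    """
    ratios = []
    for start, end in zip(beats, beats[1:]):
        inside = [t for t in offbeats if start < t < end]
        if len(inside) != 1:
            continue  # skip beats without exactly one annotated off-beat
        long_part = inside[0] - start
        short_part = end - inside[0]
        ratios.append(long_part / short_part)
    if not ratios:
        raise ValueError("no beat contains exactly one off-beat eighth note")
    return median(ratios)

# Hypothetical annotations: beats every 0.6 s, off-beats two thirds into each beat.
swung = swing_ratio([0.0, 0.6, 1.2, 1.8], [0.4, 1.0, 1.6])
# Hypothetical straight-eighths annotations for comparison.
straight = swing_ratio([0.0, 0.5, 1.0], [0.25, 0.75])
```

Taking the median rather than the mean keeps a few mis-annotated beats from dominating the estimate.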
Download Prioritized Computation for Numerical Sound Propagation The finite difference time domain (FDTD) method is commonly used as a numerically accurate way of propagating sound. However, it requires extensive computation. We present a simple method for accelerating FDTD. Specifically, we modify the FDTD update loop to prioritize computation where it is needed most in order to faithfully propagate waves through the simulated space. We estimate for each potential cell update its importance to the simulation output and only update the N most important cells, where N is dependent on the time available for computation. In this paper, we explain the algorithm and discuss how it can bring enhanced accuracy and dynamism to real-time audio propagation.
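The idea of updating only the N most important cells can be sketched on a 1-D wave equation. This is a minimal illustration under stated assumptions, not the paper's algorithm: the importance measure (local curvature plus recent change), the rule that skipped cells hold their previous value, the fixed boundaries, and the names `prioritized_step` and `courant2` are all hypothetical.

```python
import numpy as np

def prioritized_step(p, p_prev, n_update, courant2=0.25):
    """One 1-D FDTD step that updates only the n_update highest-priority interior cells.

    p, p_prev : pressure at the current and previous time step
    courant2  : squared Courant number (c * dt / dx) ** 2
    Returns (p_next, p) so the caller can roll the time levels forward.
    """
    # Discrete Laplacian on interior cells; boundary entries stay zero.
    lap = np.zeros_like(p)
    lap[1:-1] = p[2:] - 2.0 * p[1:-1] + p[:-2]

    # Assumed importance estimate: curvature plus how much the cell just changed.
    importance = np.abs(lap) + np.abs(p - p_prev)
    importance[0] = importance[-1] = -np.inf  # fixed boundaries are never updated

    n_update = min(n_update, p.size - 2)
    chosen = np.argsort(importance)[::-1][:n_update]

    # Cells that are not chosen simply keep their current value (an assumption).
    p_next = p.copy()
    p_next[chosen] = 2.0 * p[chosen] - p_prev[chosen] + courant2 * lap[chosen]
    return p_next, p

# A single pulse in the middle of a 64-cell line.
p = np.zeros(64)
p[32] = 1.0
p_prev = p.copy()

full, _ = prioritized_step(p, p_prev, n_update=62)   # every interior cell
cheap, _ = prioritized_step(p, p_prev, n_update=3)   # only the three busiest cells
```

In this example the pulse only affects three cells in the first step, so the budgeted update reproduces the full update while touching far fewer cells. In a real-time setting `n_update` would be derived from the time available per audio block.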
Download Source-Filter based Clustering for Monaural Blind Source Separation In monaural blind audio source separation scenarios, a signal mixture is usually separated into more signals than active sources. Therefore, it is necessary to group the separated signals into the final source estimates. Traditionally, grouping methods are supervised and thus need a learning step on appropriate training data. In contrast, we discuss unsupervised clustering of the separated channels by Mel frequency cepstrum coefficients (MFCC). We show that replacing the decorrelation step of the MFCC with non-negative matrix factorization improves the separation quality significantly. The algorithms have been evaluated on a large test set consisting of melodies played with different instruments, vocals, speech, and noise.
Download A Quadric Surface Model of Vacuum Tubes for Virtual Analog Applications Despite the prevalence of modern audio technology, vacuum tube amplifiers continue to play a vital role in the music industry. For this reason, many different digital techniques have been introduced over the years to emulate them. In this paper, we propose a novel quadric surface model for tube simulations that outperforms the Cardarilli model in terms of efficiency whilst retaining comparable accuracy when grid current is negligible. After showing that the model can accurately describe tubes starting from measurement data, we compare efficiency by implementing the considered tube models as nonlinear 3-port elements in the Wave Digital domain, using the typical common-cathode gain stage employed in vacuum tube guitar amplifiers as the test circuit. The proposed model achieves a speedup of 4.6× with respect to the Cardarilli model, making it promising for real-time Virtual Analog applications.
Download Discretization of Parametric Analog Circuits for Real-Time Simulations The real-time simulation of analog circuits by digital systems becomes problematic when parametric components like potentiometers are involved. In this case, the coefficients defining the digital system change and have to be adapted. One common solution is to recalculate the coefficients in real time, which can be computationally expensive. With a view to simulation using state-space representations, two parametric subcircuits found in typical guitar amplifiers are analyzed: the tone stack, a linear passive network used as a simple equalizer, and a distorting preamplifier, which limits the signal amplitude with LEDs. Solutions using trapezoidal rule discretization are presented and discussed. It is shown that the computational cost of recalculating the coefficients is reduced compared to the related DK-method, due to minimized matrix formulations. The simulation results are compared to reference data and show a good match.
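The coefficient recalculation mentioned above can be illustrated with the textbook trapezoidal rule applied to a linear state-space system x' = A x + B u. The sketch below is generic and does not reproduce the paper's minimized matrix formulation. The RC low-pass with a variable resistor stands in for a potentiometer-controlled circuit, and the names `trapezoidal_matrices` and `rc_lowpass` and all component values are hypothetical.

```python
import numpy as np

def trapezoidal_matrices(A, B, T):
    """Discretize x' = A x + B u with the trapezoidal rule and step size T.

    Returns Ad, Bd such that x[n+1] = Ad @ x[n] + Bd * (u[n] + u[n+1]).
    """
    I = np.eye(A.shape[0])
    H = np.linalg.inv(I - 0.5 * T * A)
    Ad = H @ (I + 0.5 * T * A)
    Bd = 0.5 * T * (H @ B)
    return Ad, Bd

def rc_lowpass(R, C):
    """Continuous-time state-space matrices of an RC low-pass (state = capacitor voltage)."""
    a = -1.0 / (R * C)
    return np.array([[a]]), np.array([[-a]])

fs = 48000.0          # assumed sample rate
T = 1.0 / fs
R, C = 10e3, 10e-9    # assumed values, time constant R*C = 0.1 ms

# The discrete coefficients depend on R, so they must be recomputed
# whenever the (hypothetical) potentiometer moves.
Ad, Bd = trapezoidal_matrices(*rc_lowpass(R, C), T)

# Unit step response over a few samples.
n_steps = 5
x = np.zeros((1, 1))
u = 1.0
for _ in range(n_steps):
    x = Ad @ x + Bd * (u + u)

simulated = float(x[0, 0])
exact = 1.0 - np.exp(-n_steps * T / (R * C))

# Turning the potentiometer: only the matrices change, the update loop stays the same.
Ad_new, Bd_new = trapezoidal_matrices(*rc_lowpass(2.0 * R, C), T)
```

The matrix inverse inside `trapezoidal_matrices` is the part that becomes expensive for larger circuits, which is why reducing the size of the matrices involved matters for real-time use.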
Download Distortion Recovery: A Two-Stage Method for Guitar Effect Removal Removing audio effects from electric guitar recordings makes post-production and sound editing easier. An audio distortion recovery model not only improves the clarity of the guitar sounds but also opens up new opportunities for creative adjustments in mixing and mastering. While progress has been made in creating such models, previous efforts have largely focused on synthetic distortions that may be too simplistic to accurately capture the complexities seen in real-world recordings. In this paper, we tackle the task by using a dataset of guitar recordings rendered with commercial-grade audio effect VST plugins. Moreover, we introduce a novel two-stage methodology for audio distortion recovery. The idea is to process the audio signal in the Mel-spectrogram domain in the first stage, and then use a neural vocoder to generate the pristine original guitar sound from the processed Mel-spectrogram in the second stage. We report a set of experiments demonstrating the effectiveness of our approach over existing methods, through both subjective and objective evaluation metrics.
Download Blind Arbitrary Reverb Matching Reverb provides psychoacoustic cues that convey information concerning relative locations within an acoustical space. The need often arises in audio production to impart an acoustic context on an audio track that resembles that of a reference track. One way to make audio tracks appear to be recorded in the same space is to apply reverb to a dry track that is similar to the reverb in a wet one. This paper presents a model for the task of “reverb matching,” where we attempt to automatically add artificial reverb to a track, making it sound like it was recorded in the same space as a reference track. We propose a model architecture for performing reverb matching and provide subjective experimental results suggesting that the reverb matching model can perform as well as a human. We also provide open source software for generating training data using an arbitrary Virtual Studio Technology plug-in.
Download Synthesis of Sound Textures with Tonal Components Using Summary Statistics and All-Pole Residual Modeling The synthesis of sound textures, such as flowing water, crackling fire, or an applauding crowd, is impeded by the lack of a quantitative definition. McDermott and Simoncelli proposed a perceptual source-filter model using summary statistics to create compelling synthesis results for non-tonal sound textures. However, the proposed method does not work well with tonal components. Comparing the residuals of tonal and non-tonal sound textures, we show the importance of residual modeling. We then propose a method using autoregressive modeling to reduce the amount of data needed for resynthesis and delineate a modified method for analyzing and synthesizing both tonal and non-tonal sound textures. Through user evaluation, we find that modeling the residuals increases the realism of tonal sound textures. The results suggest that the spectral content of the residuals plays an important role in sound texture synthesis, filling the gap between filtered noise and sound textures as defined by McDermott and Simoncelli. Our proposed method opens possibilities of applying sound texture analysis to musical sounds such as rapidly bowed violins.
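To show why an all-pole (autoregressive) model reduces the amount of data needed, the sketch below fits a handful of coefficients to a long signal using the standard autocorrelation (Yule-Walker) method. It is a generic illustration, not the analysis pipeline of the paper. The function name `fit_all_pole`, the model order, and the synthetic test signal are assumptions.

```python
import numpy as np

def fit_all_pole(x, order):
    """Fit x[n] ~ sum_k a[k] * x[n-k] + e[n] with the autocorrelation method.

    Returns the AR coefficients a[1..order] and the estimated power of e[n].
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Autocorrelation at lags 0..order (not normalized by the signal length).
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Yule-Walker equations: Toeplitz autocorrelation matrix times a equals r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    residual_power = (r[0] - np.dot(a, r[1:])) / len(x)
    return a, residual_power

# Synthetic stand-in for a residual: white noise shaped by a known resonance.
rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.8])
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2] + e[n]

est_a, power = fit_all_pole(x, order=2)
```

Here 20000 samples are summarized by two coefficients and one power value; resynthesis would filter fresh white noise with the fitted all-pole filter.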