Scattering Representation of Modulated Sounds

Joakim Andén; Stéphane Mallat
DAFx-2012 - York
Mel-frequency spectral coefficients (MFSCs), calculated by averaging the spectrogram along a mel-frequency scale, are used in many audio classification tasks. Their efficiency can be partly explained by their stability to deformation in a Euclidean norm. However, averaging the spectrogram loses high-frequency information. This loss is reduced by keeping the window size small, around 20 ms, which in turn prevents MFSCs from capturing largescale structures. Scattering coefficients recover part of this lost information using a cascade of wavelet decompositions and modulus operators, enabling larger window sizes. This representation is sufficiently rich to capture note attacks, amplitude and frequency modulation, as well as chord structure.
Download