In this paper, we consider representing a musical signal as a dynamic texture, a model for both the timbral and rhythmic qualities of sound. We apply the new representation to the task of automatic song segmentation. In particular, we cluster sequences of audio feature vectors, extracted from the song, using a dynamic texture mixture (DTM) model. We show that the DTM can both detect transition boundaries and accurately cluster coherent segments. The similarities between the dynamic textures that define these segments are based on both timbral and rhythmic qualities of the music, indicating that the DTM simultaneously captures two of the important aspects required for automatic music analysis.
Luke Barrington, Antoni B. Chan, Gert R. G. Lanckr