A Similarity Measure for Automatic Audio Classification

J. T. Foote. In Proc. AAAI 1997 Spring Symposium on Intelligent Integration and Use of Text, Image, Video, and Audio Corpora. Stanford, March 1997

Abstract

This paper presents recent results using statistics generated by an MMI-supervised (maximum mutual information) vector quantizer as a measure of audio similarity. Such a measure has proved successful for talker identification, and the extension from speech to general audio, such as music, is straightforward. A classifier that distinguishes speech from music and non-vocal sounds is presented, along with experimental results showing that perfect classification accuracy can be achieved on a small corpus using substantially less than two seconds per test audio file. The techniques presented here may be extended to other applications and domains, such as audio retrieval-by-similarity, musical genre classification, and automatic segmentation of continuous audio.
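One natural reading of "statistics generated by a vector quantizer" is a normalized histogram of how often each quantizer cell is used by a file's feature frames. Two such histograms (templates) can then be compared with a distance, and a test file can be assigned to the class whose template is nearest. The sketch below illustrates that idea only. It assumes a fixed, hand-chosen codebook with nearest-codeword quantization in place of the paper's MMI-supervised quantizer, and it uses cosine distance and nearest-template classification as illustrative choices. The function names (`quantize`, `template`, `cosine_distance`, `classify`) are hypothetical.

```python
import math
from collections import Counter


def quantize(frame, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance).

    Assumption: a plain nearest-codeword quantizer stands in for the
    MMI-supervised quantizer described in the paper.
    """
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(frame, codebook[i])),
    )


def template(frames, codebook):
    """Normalized histogram of quantizer-cell occupancy for one audio file."""
    counts = Counter(quantize(f, codebook) for f in frames)
    n = len(frames)
    return [counts[i] / n for i in range(len(codebook))]


def cosine_distance(p, q):
    """1 - cosine similarity between two templates (1.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm if norm else 1.0


def classify(test_frames, class_templates, codebook):
    """Assign the test file to the class whose template is closest."""
    t = template(test_frames, codebook)
    return min(class_templates, key=lambda c: cosine_distance(t, class_templates[c]))


if __name__ == "__main__":
    # Toy 2-D "feature frames"; real systems would use e.g. cepstral features.
    codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    speech = template([(0.1, 0.0), (0.9, 1.1), (0.0, 0.2)], codebook)
    music = template([(5.1, 4.9), (4.8, 5.2), (1.2, 0.9)], codebook)
    label = classify([(0.2, 0.1), (0.0, 0.0)], {"speech": speech, "music": music}, codebook)
    print(label)  # speech
```

Because each template depends only on how frames fall into quantizer cells, a template can be built from any amount of audio, which is consistent with the abstract's point that short excerpts per test file can suffice.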

Download paper (7 pages): [pdf]