The Music Retrieval Demo works by calculating the "distance" between the
selected file and every other file. The other files can then be displayed
in a list ranked by similarity, such that "closer," and hence more
similar, files appear nearer the top. The demo is built on TreeQ, an open-source software package; the TreeQ code is available at SourceForge.
"Distances" are actually computed between templates, which are
representations of the audio files, not the audio itself. Figure 1 shows
how a template is computed. First, the waveform is Hamming-windowed into
overlapping segments; each segment is processed into a spectral
representation called Mel-frequency cepstral coefficients, or
MFCCs. This data-reducing transformation replaces each 20 ms
window with 12 cepstral coefficients plus an energy term, yielding a
13-valued vector (one column in the illustration).
Figure 1: Template construction
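As a concrete illustration of the windowing step, here is a minimal sketch in Python. The 20 ms window length comes from the description above; the 50% overlap and the 16 kHz sample rate in the example are illustrative assumptions, not parameters stated for the demo, and the MFCC computation itself is only indicated by a comment:

```python
import math

def frame_signal(samples, sample_rate, window_ms=20.0, overlap=0.5):
    """Split a waveform into overlapping Hamming-windowed segments.

    window_ms matches the 20 ms windows described in the text; the
    overlap fraction here is an illustrative default.
    """
    win_len = int(sample_rate * window_ms / 1000.0)
    hop = max(1, int(win_len * (1.0 - overlap)))
    # Precompute the Hamming window coefficients.
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (win_len - 1))
               for n in range(win_len)]
    frames = []
    for start in range(0, len(samples) - win_len + 1, hop):
        seg = samples[start:start + win_len]
        frames.append([s * w for s, w in zip(seg, hamming)])
    return frames

# Each frame would then be reduced to a 13-valued MFCC vector
# (12 cepstral coefficients plus an energy term).
frames = frame_signal([0.0] * 16000, sample_rate=16000)  # 1 s of audio at 16 kHz
print(len(frames), len(frames[0]))
```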
The next step is to "quantize" each vector using a specially-designed
quantization tree. This recursively divides the vector space into
bins, each of which corresponds to a leaf of the tree. Any
MFCC vector falls into one and only one bin. Given a segment of
audio, the distribution of its vectors across the bins characterizes
that audio: counting how many vectors fall into each bin yields a
histogram template that is used in the distance measure. For this
demonstration, the distance between audio files is the simple Euclidean
distance between their corresponding templates (or rather 1 minus the
distance, so that closer files have larger scores). Once a score has been
computed for each audio clip, the clips are sorted by score to produce a
ranked list, much as in a conventional search engine.
Figure 2: Music similarity calculation
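The pipeline above (quantize each vector, count bin occupancies, compare templates, rank by score) can be sketched in miniature. The tree below is a hypothetical three-bin stand-in for TreeQ's trained quantizer, and the 2-D vectors stand in for the 13-D MFCC vectors:

```python
import math

def quantize(vec, tree):
    """Walk a quantization tree: each internal node tests one vector
    dimension against a threshold; each leaf is a bin index."""
    node = tree
    while isinstance(node, dict):
        node = node["lo"] if vec[node["dim"]] < node["thresh"] else node["hi"]
    return node

def template(vectors, tree, n_bins):
    """Histogram template: normalized counts of vectors per bin."""
    counts = [0] * n_bins
    for v in vectors:
        counts[quantize(v, tree)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def score(t1, t2):
    """1 minus the Euclidean distance, so similar templates score higher."""
    return 1.0 - math.sqrt(sum((a - b) ** 2 for a, b in zip(t1, t2)))

# Toy 3-bin tree over 2-D vectors (the demo's tree has 60 leaves).
tree = {"dim": 0, "thresh": 0.0,
        "lo": 0,
        "hi": {"dim": 1, "thresh": 0.5, "lo": 1, "hi": 2}}

query = template([(-1.0, 0.0), (1.0, 0.2), (1.0, 0.9)], tree, 3)
clips = {
    "clip_b": template([(-1.0, 0.1), (1.0, 0.3), (1.0, 0.8)], tree, 3),
    "clip_c": template([(1.0, 0.9), (1.0, 0.8), (1.0, 0.7)], tree, 3),
}
# Rank the clips by score against the query, best first.
ranked = sorted(clips, key=lambda name: score(query, clips[name]), reverse=True)
print(ranked)
```

Note that the templates are normalized, so clips of different lengths remain directly comparable.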
The reason this works, of course, is the design of the tree
quantizer. It is specially constructed using training data such that
different kinds of audio will tend to wind up in different bins. For the
demonstration, a tree with 60 bins (leaves) was constructed from the
demonstration data. All music from a given artist was treated as a
single class, and the tree was trained automatically to separate
the classes.
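To give a flavor of this kind of supervised tree construction, here is a toy sketch. It greedily picks the single (dimension, threshold) split that best separates labeled training vectors, scored by majority-class counts; this is a hypothetical stand-in for TreeQ's actual tree-growing criterion, not a description of it:

```python
def majority(labels):
    """Size of the largest single-class group in a list of labels."""
    return max(labels.count(l) for l in set(labels))

def best_split(vectors, labels):
    """Exhaustively try each (dimension, threshold) split and keep the
    one whose two sides are most dominated by a single class."""
    best = None
    for dim in range(len(vectors[0])):
        for v in vectors:
            t = v[dim]
            lo = [l for x, l in zip(vectors, labels) if x[dim] < t]
            hi = [l for x, l in zip(vectors, labels) if x[dim] >= t]
            if not lo or not hi:
                continue  # degenerate split
            purity = majority(lo) + majority(hi)
            if best is None or purity > best[0]:
                best = (purity, dim, t)
    return best[1], best[2]

# Toy training set: two "artists" separable along dimension 0.
vecs = [(-2.0, 0.0), (-1.0, 1.0), (1.0, 0.0), (2.0, 1.0)]
labels = ["artist_a", "artist_a", "artist_b", "artist_b"]
dim, thresh = best_split(vecs, labels)
print(dim, thresh)
```

Applying such splits recursively until a stopping criterion is met yields a tree whose leaves, by construction, tend to collect vectors from different classes into different bins.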
For more information, please see my paper "Content-Based
Retrieval of Music and Audio."