TreeQ Music Retrieval Demo -- How it Works

The Music Retrieval Demo works by calculating the "distance" between the selected file and all other files. The other files can then be displayed in a list ranked by similarity, so that "closer," and hence more similar, files appear nearer the top. This demonstration uses an open-source software package called TreeQ. The TreeQ code is available at Sourceforge.

"Distances" are actually computed between templates, which are representations of the audio files, not between the audio itself. Figure 1 shows how a template is computed. First, the waveform is Hamming-windowed into overlapping segments; each segment is then transformed into a spectral representation called Mel-frequency cepstral coefficients (MFCCs). This data-reducing transformation replaces each 20 ms window with 12 cepstral coefficients plus an energy term, yielding a 13-valued vector (one column in the illustration).
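The windowing step above can be sketched in a few lines of NumPy. This is a simplified illustration, not TreeQ's actual feature extractor: the function name, frame sizes, and the cepstrum computation (a DCT of the log magnitude spectrum, omitting the mel filterbank a real MFCC pipeline would include) are all assumptions chosen to show the shape of the process.

```python
import numpy as np

def frame_features(signal, sr=16000, win_ms=20, hop_ms=10, n_coeffs=13):
    """Slice a waveform into overlapping Hamming-windowed frames and
    reduce each frame to a 13-valued vector (energy + 12 cepstral-style
    coefficients). A real MFCC front end would apply a mel filterbank
    before the log/DCT step; this sketch skips it for brevity."""
    win = int(sr * win_ms / 1000)   # 20 ms window -> 320 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)   # 50% overlap
    hamming = np.hamming(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        seg = signal[start:start + win] * hamming
        log_spec = np.log(np.abs(np.fft.rfft(seg)) + 1e-10)
        # DCT-II of the log spectrum: a cepstrum-like data reduction
        n = len(log_spec)
        k = np.arange(n)
        cep = [np.sum(log_spec * np.cos(np.pi * q * (2 * k + 1) / (2 * n)))
               for q in range(1, n_coeffs)]       # 12 coefficients
        energy = np.log(np.sum(seg ** 2) + 1e-10)  # plus an energy term
        frames.append([energy] + cep)
    return np.array(frames)  # shape: (num_frames, 13)
```

Each row of the result corresponds to one column in the Figure 1 illustration.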

Figure 1: Template construction

The next step is to "quantize" each vector using a specially designed quantization tree. The tree recursively divides the vector space into bins, each of which corresponds to a leaf of the tree; any MFCC vector falls into one and only one bin. Given a segment of audio, the distribution of its vectors across the bins characterizes that audio. Counting how many vectors fall into each bin yields a histogram template, which is used in the distance measure. For this demonstration, the distance between audio files is the simple Euclidean distance between their corresponding templates (or rather 1 minus the distance, so that closer files have larger scores). Once a score has been computed for each audio clip, the clips are sorted by score to produce a ranked list, much like other search engines.
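The quantize-count-compare pipeline can be sketched as follows. The tiny hand-built tree, the function names, and the four-bin setup are illustrative assumptions; a real TreeQ tree is learned from training data and has many more leaves.

```python
import numpy as np

# A toy quantization tree: each internal node tests one vector
# dimension against a threshold; each leaf is a bin index.
TREE = ("split", 0, 0.0,                                    # test dim 0
        ("split", 1, 0.0, ("leaf", 0), ("leaf", 1)),        # dim 0 < 0
        ("split", 1, 0.0, ("leaf", 2), ("leaf", 3)))        # dim 0 >= 0

def quantize(vec, node=TREE):
    """Walk the tree until a leaf is reached; return its bin index."""
    if node[0] == "leaf":
        return node[1]
    _, dim, thresh, left, right = node
    return quantize(vec, left if vec[dim] < thresh else right)

def template(vectors, n_bins=4):
    """Count how many vectors fall into each bin, then normalize so
    the template does not depend on clip length."""
    hist = np.zeros(n_bins)
    for v in vectors:
        hist[quantize(v)] += 1
    return hist / hist.sum()

def rank(query_vecs, library):
    """Score every clip as 1 minus the Euclidean distance between its
    template and the query's template, then sort best-first."""
    q = template(query_vecs)
    scores = {name: 1.0 - np.linalg.norm(q - template(vecs))
              for name, vecs in library.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

A clip whose vectors land in the same bins with the same proportions as the query gets the maximum score of 1.0 and appears at the top of the ranked list.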

Figure 2: Music similarity calculation

The reason this works, of course, is the design of the tree quantizer. It is specially constructed from training data so that different kinds of audio tend to wind up in different bins. For the demonstration, a tree with 60 bins (leaves) was constructed from the demonstration data. All music from each artist was treated as a separate class, and the tree was automatically trained to separate the classes.
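To give a flavor of how such a tree might be trained, here is a minimal greedy sketch: at each node it searches for the (dimension, threshold) split that most reduces class entropy, a stand-in criterion for TreeQ's actual training objective. The function names, the stopping rule, and the use of entropy reduction are all assumptions for illustration, and the leaves here store a majority class label rather than the numbered bins the real system uses.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a collection of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    """Greedy search over (dimension, threshold) for the split that
    most reduces class entropy across the two children."""
    best, base = None, entropy(y)
    for dim in range(X.shape[1]):
        for t in np.unique(X[:, dim]):
            mask = X[:, dim] < t
            if mask.all() or not mask.any():
                continue
            w = mask.mean()
            gain = base - w * entropy(y[mask]) - (1 - w) * entropy(y[~mask])
            if best is None or gain > best[0]:
                best = (gain, dim, t)
    return best

def grow(X, y, depth=0, max_depth=6):
    """Recursively split the training vectors until the node is pure
    or the depth limit is hit; each leaf records its majority class."""
    split = best_split(X, y) if len(set(y)) > 1 and depth < max_depth else None
    if split is None:
        return ("leaf", Counter(y).most_common(1)[0][0])
    _, dim, t = split
    mask = X[:, dim] < t
    return ("split", dim, t,
            grow(X[mask], y[mask], depth + 1, max_depth),
            grow(X[~mask], y[~mask], depth + 1, max_depth))
```

With per-artist class labels, a tree grown this way tends to route different artists' MFCC vectors to different leaves, which is what makes the resulting histograms discriminative.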

For more information, please see my paper "Content-Based Retrieval of Music and Audio."

Back to the TreeQ Music Retrieval Demo.