Automatic Content-Based Retrieval of Broadcast News

Martin G. Brown, Jonathan T. Foote, Gareth J. F. Jones, K. Sparck Jones, S. J. Young

Abstract

This paper presents current work on a video retrieval project at Cambridge University and Olivetti Research Limited (ORL). We show that statistical methods developed for text retrieval are also effective for retrieving and browsing multimedia documents. These methods allow rapid retrieval of news broadcasts by information content determined from teletext subtitles. We present information retrieval results for experiments on a large archive of news broadcasts. These experiments are made possible by the ORL Medusa system, which allows practical recording, storage, and playback of tens of gigabytes of multimedia data. This work is a step towards practical retrieval of multimedia documents whose information content is determined by speech recognition performed on the audio soundtrack. We describe the project background, the ORL Medusa multimedia system and the retrieval application, the news broadcast corpus, and methods for browsing the retrieved news stories.
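The abstract does not specify the term-weighting scheme. As a minimal sketch, assuming a plain tf·idf weighting (a swapped-in illustration, not necessarily the scheme used in this work), ranking news stories by the words in their teletext subtitles could look like the following. The names `tokenize` and `rank_stories`, the smoothed idf, and the toy stories are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letter characters (a deliberately crude tokenizer)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def rank_stories(query, stories):
    """Rank stories by a summed tf*idf score over the query terms.

    Hypothetical helper: `stories` maps a story id to its subtitle text.
    Returns (story_id, score) pairs sorted by descending score, keeping
    only stories that contain at least one query term.
    """
    n = len(stories)
    term_counts = {sid: Counter(tokenize(text)) for sid, text in stories.items()}

    # Document frequency: number of stories containing each term.
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())

    query_terms = set(tokenize(query))
    scores = {}
    for sid, counts in term_counts.items():
        score = 0.0
        for term in query_terms:
            if counts[term]:
                # Assumed smoothed idf: log((n + 1) / df) stays positive
                # even for a term that occurs in every story.
                idf = math.log((n + 1) / df[term])
                score += counts[term] * idf
        if score > 0:
            scores[sid] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


stories = {
    "a": "budget budget schools",
    "b": "budget floods",
    "c": "rain and floods",
}
result = rank_stories("budget", stories)
```

Here story "a" mentions the query term twice and ranks above "b", while "c" does not match and is dropped. Real systems usually also normalise for story length, which this sketch omits.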
