Video Indexing and Understanding

Name-It Project

We have been developing Name-It, a system that associates faces and names in news videos. The system takes news videos as input, consisting of image sequences and transcripts obtained from audio tracks or closed-caption text. It can then either infer the name of a given face and output the name candidates, or locate a face in the news videos given a name. To accomplish this task, the system takes a multi-modal video analysis approach that integrates several advanced image and natural language processing techniques: face tracking, face identification, intelligent name extraction using a dictionary, thesaurus, and parser, text region detection, image enhancement, and character recognition. The success of our experiments demonstrates the benefits of the multi-modal approach for video analysis.
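The two query directions described above (face to name, and name to face) can be illustrated with a toy example. The following is only a minimal sketch, assuming that faces and names have already been extracted and timestamped, and that association is scored by simple temporal co-occurrence. The data, the function names, the 5-second window, and the linear weighting are all hypothetical choices made for illustration; they are not the scoring method used by Name-It.

```python
from collections import defaultdict

# Hypothetical toy data: (time in seconds, label) pairs. In a real system
# the faces would come from face tracking and identification, and the names
# from transcript analysis; here they are simply made up.
FACES = [(10.0, "face_A"), (12.0, "face_A"), (95.0, "face_B")]
NAMES = [(11.0, "CLINTON"), (13.0, "CLINTON"),
         (14.0, "CONGRESS"), (94.0, "CHRISTOPHER")]


def cooccurrence_scores(faces, names, window=5.0):
    """Accumulate a score for each (face, name) pair.

    A pair contributes more the closer in time the face and the name
    occur; pairs further apart than `window` seconds contribute nothing.
    This weighting is an assumption made for the sketch.
    """
    scores = defaultdict(float)
    for t_face, face in faces:
        for t_name, name in names:
            gap = abs(t_face - t_name)
            if gap <= window:
                scores[(face, name)] += 1.0 - gap / window
    return scores


def names_for_face(scores, face):
    """Face-to-name retrieval: name candidates for a face, best first."""
    ranked = [(name, s) for (f, name), s in scores.items() if f == face]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def faces_for_name(scores, name):
    """Name-to-face retrieval: face candidates for a name, best first."""
    ranked = [(face, s) for (face, n), s in scores.items() if n == name]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    table = cooccurrence_scores(FACES, NAMES)
    for rank, (name, score) in enumerate(names_for_face(table, "face_A"), 1):
        print(f"{rank}. {name} {score:.3f}")
```

Both queries read from the same score table, which reflects the idea that a single face-name association supports retrieval in either direction.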

Experimental Results

Given: 5 hours of CNN Headline News


Figure 1. Name-to-Face Retrieval (given "CLINTON")


  1. WARREN 0.177633
  2. CHRISTOPHER 0.032785
  3. BEGINNING 0.0232368
  4. CONGRESS 0.0220912
Figure 2. Face-to-Name Retrieval (given the face of Warren Christopher)


Last modified: Wed Mar 17 1999.
Shin'ichi Satoh