Video Indexing and Understanding

Name-It Project

We have been developing Name-It, a system that associates faces with names in news videos. The input is news video: image sequences together with transcripts obtained from the audio track or from closed-caption text. Given a face, the system infers its name and outputs a list of name candidates; given a name, it locates the corresponding face in the news videos. To accomplish this task, the system takes a multi-modal video analysis approach:

face sequence extraction/identification from videos,
name extraction from transcripts, and
video caption recognition.

Each method draws on several image and natural language processing techniques: face tracking and face identification; name extraction using a dictionary, a thesaurus, and a parser; text region detection, image enhancement, and character recognition; and the integration of these techniques. The success of our experiments demonstrates the benefits of the multi-modal approach for video analysis.
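To make the idea of associating faces and names concrete, here is a minimal Python sketch of one possible integration step. It assumes that faces and names are linked by how close together in time they occur. That scoring rule is an illustrative assumption, not necessarily the method Name-It uses. The names `FaceSequence`, `NameMention`, `cooccurrence_scores`, and the `window` parameter are all hypothetical, and the face identifiers and name timestamps are assumed to come from the face extraction and name extraction steps listed above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceSequence:
    face_id: str   # hypothetical identity label from face identification
    start: float   # time the face sequence begins, in seconds
    end: float     # time the face sequence ends, in seconds


@dataclass(frozen=True)
class NameMention:
    name: str      # candidate name extracted from the transcript or a caption
    time: float    # time the name occurs, in seconds


def cooccurrence_scores(faces, mentions, window=5.0):
    """Score (face_id, name) pairs by temporal closeness.

    Assumption: a mention inside a face sequence scores 1.0, and the score
    falls off linearly to 0.0 as the gap grows to `window` seconds.
    """
    scores = defaultdict(float)
    for face in faces:
        for mention in mentions:
            if mention.time < face.start:
                gap = face.start - mention.time
            elif mention.time > face.end:
                gap = mention.time - face.end
            else:
                gap = 0.0
            if gap <= window:
                scores[(face.face_id, mention.name)] += 1.0 - gap / window
    return dict(scores)


def names_for_face(scores, face_id):
    """Face-to-name query: name candidates for a face, best first."""
    ranked = [(name, s) for (f, name), s in scores.items() if f == face_id]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def faces_for_name(scores, name):
    """Name-to-face query: candidate faces for a name, best first."""
    ranked = [(f, s) for (f, n), s in scores.items() if n == name]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

The same score table serves both query directions described above: `names_for_face` returns name candidates for a given face, and `faces_for_name` returns candidate faces for a given name. A fuller system would presumably also weight each pair by the confidence of the face identification, name extraction, and caption recognition steps.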