Video Indexing and Understanding
Name-It Project
We have been developing Name-It, a system that associates faces and names
in news videos. The system is given news videos, which include image sequences
and transcripts obtained from audio tracks or closed caption texts. The
system can then either infer the name of a given face and output the name
candidates, or locate a face in news videos by name. To accomplish this
task, the system takes a multi-modal video analysis approach:
- face sequence extraction/identification from videos,
- name extraction from transcripts, and
- video caption recognition.
Each method combines several advanced image and natural language processing
techniques: face tracking; face identification; intelligent name extraction
using a dictionary, a thesaurus, and a parser; text region detection; image
enhancement; and character recognition. The system then integrates the results
of these techniques. The success of our experiments demonstrates the benefits
of the multi-modal approach for video analysis.
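To make the two retrieval directions concrete, here is a minimal sketch of how faces and names might be associated once face sequences and name mentions have been extracted. It assumes a simple temporal co-occurrence score, which is an illustrative stand-in and not necessarily the scoring used by Name-It. The function names, the `window` parameter, and the sample data are all hypothetical.

```python
from collections import defaultdict


def cooccurrence_scores(face_intervals, name_mentions, window=5.0):
    """Score (face, name) pairs by how closely they co-occur in time.

    face_intervals: dict mapping a face id to a list of (start_sec, end_sec)
    name_mentions:  dict mapping a name to a list of mention times in seconds
    window:         hypothetical tolerance (seconds) around a face interval
    """
    scores = defaultdict(float)
    for face_id, intervals in face_intervals.items():
        for name, times in name_mentions.items():
            for start, end in intervals:
                for t in times:
                    # Gap is 0 when the mention falls inside the interval.
                    gap = max(start - t, 0.0, t - end)
                    if gap <= window:
                        # Closer mentions contribute more to the score.
                        scores[(face_id, name)] += 1.0 / (1.0 + gap)
    return dict(scores)


def names_for_face(scores, face_id):
    """Face-to-name retrieval: name candidates, best first."""
    hits = [(name, s) for (f, name), s in scores.items() if f == face_id]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def faces_for_name(scores, name):
    """Name-to-face retrieval: face candidates, best first."""
    hits = [(f, s) for (f, n), s in scores.items() if n == name]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the experiments.
    faces = {"face_A": [(10.0, 14.0)], "face_B": [(100.0, 104.0)]}
    names = {"CLINTON": [101.0], "CHRISTOPHER": [12.0, 16.0]}
    table = cooccurrence_scores(faces, names)
    print(names_for_face(table, "face_A"))
    print(faces_for_name(table, "CLINTON"))
```

Both directions read from the same score table, which mirrors the description above: the system can either list name candidates for a face or locate faces for a name.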
Experimental Results
Given: 5 hours of CNN Headline News video
Figure 1. Name-to-Face Retrieval (given "CLINTON")
- WARREN 0.177633
- CHRISTOPHER 0.032785
- BEGINNING 0.0232368
- CONGRESS 0.0220912
Figure 2. Face-to-Name Retrieval (given the face of Warren Christopher)
Publications
- Name-It: Naming and Detecting Faces in News Videos,
  Shin'ichi Satoh, Yuichi Nakamura, and Takeo Kanade,
  IEEE MultiMedia, Vol. 6, No. 1, January-March, pp. 22-35, 1999.
- Name-It: Association of Face and Name in Video,
  Shin'ichi Satoh and Takeo Kanade,
  Proc. of CVPR'97, pp. 368-373, 1997.
  (longer version: School of Computer Science, Carnegie Mellon University,
  CMU-CS-96-205, December, 1996.)
- Name-It: Naming and Detecting Faces in Video
  by the Integration of Image and Natural Language Processing,
  Shin'ichi Satoh, Yuichi Nakamura, and Takeo Kanade,
  Proc. of IJCAI-97, pp. 1488-1493, 1997.
Last modified: Wed Mar 17 1999.
Shin'ichi Satoh
satoh@rd.nacsis.ac.jp