NTCIR Project
NTCIR standard document format


The format uses the following tag set.

Mandatory tags

The tag for each document
Document identifier
Language code: CH, EN, JA, KR
Title of this news article
Issue date
Text of news article
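
As a rough illustration of the mandatory part of the format, the sketch below models one document as a record and checks the language code against the four values listed above. The class name and field names (doc_id, lang, headline, date, text) are hypothetical and are not the tag names of the format; the issue date is kept as a plain string because its layout is not specified here.

```python
from dataclasses import dataclass

# Language codes listed in the format description.
LANGUAGE_CODES = {"CH", "EN", "JA", "KR"}


@dataclass
class NewsDocument:
    """One document; the record itself plays the role of the per-document tag.

    All field names are hypothetical and chosen for illustration only.
    """
    doc_id: str    # document identifier
    lang: str      # language code, one of LANGUAGE_CODES
    headline: str  # title of the news article
    date: str      # issue date, kept as an uninterpreted string
    text: str      # text of the news article

    def __post_init__(self) -> None:
        # Reject any language code outside the four listed values.
        if self.lang not in LANGUAGE_CODES:
            raise ValueError(f"unknown language code: {self.lang!r}")


# Placeholder values only; they are not taken from any real collection.
doc = NewsDocument(
    doc_id="example-0001",
    lang="EN",
    headline="Example headline",
    date="example-date",
    text="Example body text.",
)
```

Validating in __post_init__ means a record with an unlisted language code cannot be constructed at all, which keeps the check in one place.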

Optional tags

Whether the document contains figures
Location, date or news service of the report, and tags for news editors (for New York Times) *
Document type, one of four: "story", "multi", "advis", "other" (for New York Times) *
Paragraph marker
Section identifier in the original newspaper
Number of words in 2 bytes (for Mainichi Newspaper)
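
The document type is the one optional entry with a closed set of values, so it is easy to check. The following is a minimal sketch, assuming the value arrives as an optional string; the function name is hypothetical, and trimming whitespace and lowercasing before the comparison are assumptions of this sketch, not something the format states.

```python
from typing import Optional

# The four document types named in the format description.
DOCUMENT_TYPES = ("story", "multi", "advis", "other")


def normalize_document_type(value: Optional[str]) -> Optional[str]:
    """Return the cleaned document type, or None when the entry is absent.

    Hypothetical helper: stripping and lowercasing are assumptions.
    """
    if value is None:
        # The entry is optional, so a missing value is not an error.
        return None
    cleaned = value.strip().lower()
    if cleaned not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document type: {value!r}")
    return cleaned
```

Returning None for a missing value keeps the optional nature of the entry visible to callers instead of substituting a default type.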

* For more details, see 0readme.txt for the English Gigaword Third Edition: http://www.ldc.upenn.edu/Catalog/docs/LDC2007T07/0readme.txt