The tag set is as follows.
* For more details, see The English Gigaword Third Edition: https://catalog.ldc.upenn.edu/LDC2007T07
Mandatory tags
<DOC> </DOC>            The tag for each document
<DOCNO> </DOCNO>        Document identifier
<LANG> </LANG>          Language code: CH, EN, JA, KR
<HEADLINE> </HEADLINE>  Title of this news article
<DATE> </DATE>          Issue date
<TEXT> </TEXT>          Text of news article
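
As an illustration, a record that uses only the mandatory tags might be read as sketched below. This is a minimal Python sketch, not an official NTCIR tool: the sample record, its field values (including the date format), and the helper name parse_mandatory are invented for the example. Regular expressions are used instead of an XML parser on the assumption that records may not be well-formed XML.

```python
import re

# Hypothetical sample record; the identifier, date format, and text are invented.
SAMPLE = """<DOC>
<DOCNO>SAMPLE-EN-0001</DOCNO>
<LANG>EN</LANG>
<HEADLINE>Example headline</HEADLINE>
<DATE>1998-01-01</DATE>
<TEXT>
Body of the article.
</TEXT>
</DOC>"""

# Mandatory tags that carry a value inside each <DOC> element.
MANDATORY = ("DOCNO", "LANG", "HEADLINE", "DATE", "TEXT")


def parse_mandatory(record: str) -> dict:
    """Return the mandatory fields of one <DOC> record; raise if any is missing."""
    fields = {}
    for tag in MANDATORY:
        # DOTALL lets a field such as <TEXT> span several lines.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", record, flags=re.DOTALL)
        if match is None:
            raise ValueError(f"missing mandatory tag <{tag}>")
        fields[tag] = match.group(1).strip()
    return fields


doc = parse_mandatory(SAMPLE)
print(doc["DOCNO"], doc["LANG"], doc["HEADLINE"])
```

Raising an error on a missing tag is a design choice for the sketch; a corpus loader might prefer to log and skip such records.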
Optional tags
<AE> </AE>              Whether the article contains figures
<DATELINE> </DATELINE>  Location, date, or news service of the report, and tags for news editors (for NewYorkTimes) *
<DOCTYPE> </DOCTYPE>    Categorization of documents into four distinct types: "story", "multi", "advis", "other" (for NewYorkTimes) *
<P> </P>                Paragraph marker
<SECTION> </SECTION>    Section identifier in the original newspaper
<WORDS> </WORDS>        Number of words in 2 bytes (for Mainichi Newspaper)
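
Because the optional tags may be absent from a given record, code that reads them should tolerate missing fields. The following Python sketch shows one way to do that. The sample record and the helper names optional_field and paragraphs are hypothetical, and the sketch assumes that <P> elements are nested inside <TEXT>, as in English Gigaword.

```python
import re
from typing import List, Optional

# Hypothetical sample record; all values are invented for illustration.
SAMPLE = """<DOC>
<DOCNO>SAMPLE-EN-0002</DOCNO>
<LANG>EN</LANG>
<HEADLINE>Example headline</HEADLINE>
<DATE>1998-01-01</DATE>
<DOCTYPE>story</DOCTYPE>
<TEXT>
<P>
First paragraph.
</P>
<P>
Second paragraph.
</P>
</TEXT>
</DOC>"""


def optional_field(record: str, tag: str) -> Optional[str]:
    """Return the stripped content of an optional tag, or None if it is absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", record, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def paragraphs(record: str) -> List[str]:
    """Return the content of every <P> element in document order."""
    return [p.strip() for p in re.findall(r"<P>(.*?)</P>", record, flags=re.DOTALL)]


print(optional_field(SAMPLE, "DOCTYPE"))  # present in the sample
print(optional_field(SAMPLE, "SECTION"))  # absent, so None
print(paragraphs(SAMPLE))
```

Returning None for an absent tag keeps the caller in control of defaults, which seems appropriate since several optional tags apply only to particular sources.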