NTCIR Project
Tools
NTCIR Document Format




The tags used in the format are listed below.

Mandatory tags

<DOC>

</DOC>

Encloses each document

<DOCNO>

</DOCNO>

Document identifier

<LANG>

</LANG>

Language code: CH (Chinese), EN (English), JA (Japanese), KR (Korean)

<HEADLINE>

</HEADLINE>

Title (headline) of the news article

<DATE>

</DATE>

Issue date

<TEXT>

</TEXT>

Body text of the news article
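As a rough illustration of how the mandatory tags fit together, the following Python sketch extracts them from a string of NTCIR-format documents and rejects documents that lack one of them. It assumes each tag appears without attributes, exactly as listed above, and that the raw file has already been decoded to text. The function names and the sample document (its DOCNO, date format and text) are hypothetical and not part of any official NTCIR tool.

```python
import re
from typing import Dict, List, Optional

# Mandatory tags from the list above; LANG values are limited to these codes.
MANDATORY_TAGS = ("DOCNO", "LANG", "HEADLINE", "DATE", "TEXT")
LANG_CODES = {"CH", "EN", "JA", "KR"}

# Non-greedy match so each <DOC>...</DOC> pair is captured separately.
# "<DOC>" does not match "<DOCNO>" or "<DOCTYPE>" because ">" must follow "DOC".
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)


def field(doc_body: str, tag: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag> pair, or None."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", doc_body, re.DOTALL)
    return m.group(1).strip() if m else None


def parse_documents(raw: str) -> List[Dict[str, str]]:
    """Extract the mandatory fields of every document, failing on missing tags."""
    docs = []
    for body in DOC_RE.findall(raw):
        record = {}
        for tag in MANDATORY_TAGS:
            value = field(body, tag)
            if value is None:
                raise ValueError(f"missing mandatory <{tag}> tag")
            record[tag] = value
        if record["LANG"] not in LANG_CODES:
            raise ValueError(f"unexpected language code {record['LANG']!r}")
        docs.append(record)
    return docs


# Hypothetical sample document; DOCNO, DATE format and text are made up.
sample = """<DOC>
<DOCNO>EXAMPLE-0001</DOCNO>
<LANG>EN</LANG>
<HEADLINE>Example headline</HEADLINE>
<DATE>1998-01-01</DATE>
<TEXT>
<P>First paragraph.</P>
</TEXT>
</DOC>"""

for doc in parse_documents(sample):
    print(doc["DOCNO"], doc["LANG"], doc["HEADLINE"])
```

Regular expressions keep the sketch short; a real collection may call for a more tolerant SGML-style parser, for example if article text contains stray `<` characters.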

Optional tags

<AE>

</AE>

Whether the article contains figures

<DATELINE>

</DATELINE>

Location, date, or news service of the report, and tags for news editors (for the New York Times) *

<DOCTYPE>

</DOCTYPE>

Document category, one of four distinct types: "story", "multi", "advis", or "other" (for the New York Times) *
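Because DOCTYPE is optional and only appears in New York Times documents, a consumer that wants, say, only "story" documents has to treat a missing tag differently from an unexpected value. The Python sketch below assumes the value is stored as bare text between the tags (the exact raw form should be checked against the Gigaword documentation); `doctype` and `keep_stories` are hypothetical helper names.

```python
import re
from typing import List, Optional

# The four DOCTYPE categories listed above (New York Times documents only).
DOCTYPES = {"story", "multi", "advis", "other"}


def doctype(doc_body: str) -> Optional[str]:
    """Return the normalized <DOCTYPE> value, or None if the tag is absent."""
    m = re.search(r"<DOCTYPE>(.*?)</DOCTYPE>", doc_body, re.DOTALL)
    if m is None:
        return None  # optional tag: documents from other sources omit it
    # Assumption: lowercasing is harmless since the listed values are lowercase.
    value = m.group(1).strip().lower()
    if value not in DOCTYPES:
        raise ValueError(f"unknown DOCTYPE {value!r}")
    return value


def keep_stories(doc_bodies: List[str]) -> List[str]:
    """Keep "story" documents plus documents that carry no DOCTYPE at all."""
    return [body for body in doc_bodies if doctype(body) in (None, "story")]
```

Keeping documents without a DOCTYPE is a design choice for mixed collections: it filters New York Times documents by category while leaving other sources untouched.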

<P>

</P>

Paragraph marker
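Since <P> is optional, a paragraph splitter for the <TEXT> content should fall back gracefully when no markers are present. The following Python sketch assumes <P> tags are properly paired and not nested; `paragraphs` is a hypothetical helper name.

```python
import re
from typing import List

# Non-greedy so consecutive <P>...</P> pairs are split apart.
P_RE = re.compile(r"<P>(.*?)</P>", re.DOTALL)


def paragraphs(text_body: str) -> List[str]:
    """Split the content of <TEXT> into paragraphs using <P> markers.

    <P> is optional, so when no markers are found the whole text is
    returned as one paragraph. When markers are present, text outside
    the <P>...</P> pairs is ignored and empty paragraphs are dropped.
    """
    found = [p.strip() for p in P_RE.findall(text_body)]
    if found:
        return [p for p in found if p]
    stripped = text_body.strip()
    return [stripped] if stripped else []


print(paragraphs("<P>One.</P>\n<P>Two.</P>"))
```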

<SECTION>

</SECTION>

Identifier of the section in the original newspaper

<WORDS>

</WORDS>

Number of words, counted in 2-byte units (for the Mainichi Newspaper)

* For details, see 0readme.txt for the English Gigaword Third Edition: http://www.ldc.upenn.edu/Catalog/docs/LDC2007T07/0readme.txt