NTCIR-4 Test Collections: Documents
The following document collections are used for the 4th NTCIR Workshop. They are available free of charge to participating research groups for task participation and system evaluation within the 4th NTCIR Workshop. To obtain the data, signed user agreement forms must be submitted to the NTCIR Project Office at the NII. Please note that the Xinhua collection in the NTCIR-4 CLIR test collection requires a different procedure and a separate user agreement form.
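task | test collection | genre | language | file name | number of documents (size) | year |
CLIR | NTCIR-4 CLIR | news articles | Chinese (traditional) | CIRB020 | 249,508 | 1998-1999 |
CLIR | NTCIR-4 CLIR | news articles | Japanese | Mainichi | 220,078 | 1998-1999 |
CLIR | NTCIR-4 CLIR | news articles | Korean | Hankookilbo<*> | 149,498 | 1998-1999 |
CLIR | NTCIR-4 CLIR | news articles | English | EIRB010 | 10,204 | 1998-1999 |
CLIR | NTCIR-4 CLIR | news articles | English | Mainichi Daily News | 12,723 | 1998-1999 |
CLIR | NTCIR-4 CLIR | news articles | English | Korea Times<*> | 21,377 | 1998-1999 |
CLIR | NTCIR-4 CLIR | news articles | English | Hong Kong Standard<*> | ca. 60K | 1998-1999 |
CLIR | NTCIR-4 CLIR | news articles | English | Xinhua<*> | 208,168 | 1998-1999 |
PATENT | NTCIR-4 PATENT | patent full | Japanese | | ca. 3500K | 1993-2002 |
PATENT | NTCIR-4 PATENT | patent abstract | English | Patent Abstracts of Japan (PAJ)<*> | ca. 3500K | 1993-2002 |
QAC | NTCIR-4 QA | news articles | Japanese | Mainichi | 220,078 | 1998-1999 |
QAC | NTCIR-4 QA | news articles | Japanese | Yomiuri<*> | ca. 260K | 1998-1999 |
TSC | NTCIR-4 SUMM | news articles | Japanese | Mainichi | 220,078 | 1998-1999 |
TSC | NTCIR-4 SUMM | news articles | Japanese | Yomiuri<*> | ca. 260K | 1998-1999 |
WEB | NTCIR-4 WEB | Web | multiple languages <4> | NW100G-01 | 11,038,720 (100 GB) | crawled in 2001 |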
1: For details of the task data (topics and relevance judgments, questions and answers, summaries, etc.), please consult the CFP of each task.
2: New data (additions to the NTCIR-3 test collections) are indicated by <*>.
3: Please note that the document collections shall be used only for accomplishing the tasks set out in NTCIR Workshop 4 and for research related to those tasks. The documents may not be used for "commercial purpose" or "information purpose".
4: Mostly Japanese or English (some in other languages).