Test Collections - DATA


[User Agreement] [NTCIR home]

NTCIR Test collections : IR & QA

| collection | task | genre | filename | lang | year | # of docs | size | topic/question lang | # of topics/questions | relevance judgment |
|---|---|---|---|---|---|---|---|---|---|---|
| NTCIR-1 | IR | sci. abstract | ntc1-je | JE | 1988-1997 | 339,483 | 577MB | J | 83 | 3 grades |
| | | | ntc1-j | J | | 332,918 | 312MB | | | |
| | | | ntc1-e | E | | 187,080 | 218MB | | 60 | |
| | Term extraction/ role analysis | | ntc1-tmrc | J | | 2,000 | | | - | - |
| | IR | news | CIRB010 | Ct | 1998-1999 | 132,220 | 132MB | CtE | 50 | 4 grades |
| NTCIR-2 | IR | sci. abstract | ntc2-j | J | 1986-1999** | 400,248 | 600MB | JE | 49 | 4 grades |
| | | | ntc2-e | E | | 134,978 | 200MB | | | |
| NTCIR-3 CLIR | IR | news | KEIB010 | K | 1994 | 66,146 | 74MB | CtKJE | 30 | 4 grades |
| | | | CIRB011 | Ct | 1998-1999 | 132,173 | 870MB | CtKJE | 50 | 4 grades |
| | | | CIRB020 | | | 249,508 | | | | |
| | | | Mainichi | J | | 220,078 | | | | |
| | | | EIRB010 | E | | 10,204 | | | | |
| | | | Mainichi Daily | | | 12,723 | | | | |
| NTCIR-3 PATENT | IR | patent full | kkh *3 | J | 1998-1999 | 697,262 | 18GB | CtCsKJE | 31 | 3 grades |
| | | abstract | jsh *3 | | 1995-1999 | 1,706,154 | 1,883MB | | | |
| | | | paj *3 | E | | 1,701,339 | 2,711MB | | | |
| QAC : NTCIR-3 QA | QA | news | Mainichi | J | 1998-1999 | 220,078 | 282MB | J* | 1200 | exact answer |
| NTCIR-3 WEB | IR | Web (html/text) | NW100G-01 | multiple*4 | crawled in 2001 | 11,038,720 | 100GB | J* | 47 | 4 grades |
| | | | NW10G-01 | | | 1,445,466 | 10GB | | | |
| NTCIR-4 PATENT | IR | patent full-text | kkh *3 | J | 1993-2002 | 3,496,252 | 94.5GB | CtCsKJE | 101 | 4 grades |
| | | patent abstract | paj *3 | E | 1993-2002 | 3,496,252 | 5,482MB | | | |
| NTCIR-4 WEB | IR | Web (html/text) | NW100G-01 | multiple*4 | crawled in 2001 | 11,038,720 | 100GB | J* | 47 | 4 grades |

J:Japanese, E:English, C:Chinese (Ct:Traditional Chinese, Cs: Simplified Chinese), K:Korean;
+ indicates the document collection newly added for NTCIR-4
* English translation is available
** gakkai subfiles: 1997-1999, kaken subfiles: 1986-1997
*3: kkh : Publication of unexamined patent application, jsh: Japanese abstract, paj: English translation of jsh
*4: mostly Japanese or English (some in other languages)

NTCIR Test collections : Summarization

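| collection | task | genre | filename | lang | year | # of docs | summary types | analysts | total # of summaries |
|---|---|---|---|---|---|---|---|---|---|
| NTCIR-2 SUMM | single doc | news | Mainichi | J | 1994, 1995, 1998 | 180 docs | 7 | 3 | 3780 |
| NTCIR-2 TAO | | | | | 1998 | 1000 docs | 2 | 1 | 2000 |
| TSC:NTCIR-3 SUMM | | | | | 1998-1999 | 60 docs | 7 | 3 | 1260 |
| | multi doc | | | | | 50 sets | 2 | 3 | 300 |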

-- data is available from NII
-- data is available to NTCIR Workshop participants only
-- data is available from the newspaper companies (Mainichi, Yomiuri)

Last modified : 2005-05-18