NTCIR Project: Test Collections - DATA
NTCIR Test Collections: IR & QA
|
Class | Collection | Task | Documents | Task data | |||||||||||||
Genre | Filename | Lang. |
Year | # of doc | Size: uncompressed (compressed) |
Topic/ Question | Relevance judge |
||||||||||
lang | # | ||||||||||||||||
ACLIA | Advanced Cross-Lingual Information Access (ACLIA) combines the Complex Cross-Lingual Question Answering (CCLQA) task and the Information Retrieval for QA (IR for QA) task. For further details, please consult the columns of 'CLIR on News' and 'QA'. |
||||||||||||||||
CCLQA | For further details about Complex Cross-Lingual Question Answering, please consult the columns of 'QA'. | ||||||||||||||||
CLIR on Scientific | NTCIR-1 | IR | sci. abstract | ntc1-je (A) | JE | 1988- 1997 |
339,483 | 577MB | J | 83 | 3 grades |
||||||
ntc1-j (A) | J | 332,918 | 312MB | ||||||||||||||
ntc1-e (A) | E | 187,080 | 218MB | 60 | |||||||||||||
TE *5 | ntc1-tmrc (A) | J | 2,000 | - | - | - | |||||||||||
NTCIR-2 | IR | sci. abstract | ntc2-j (A) | J | 1986- 1999 *2 |
400,248 | 600MB | E J |
49 | 4 grades |
|||||||
ntc2-e (A) | E | 134,978 | 200MB | ||||||||||||||
CLIR on News |
CIRB010 | IR | News | CIRB010 (C) | Ct | 1998- 1999 |
132,220 | 132MB | Ct E |
50 *11 |
4 grades |
||||||
NTCIR-3 CLIR | IR | News | KEIB010(C) | K | 1994 | 66,146 | 74MB | Ct E J K |
30 *11 |
4 grades |
|||||||
CIRB011(C) | Ct | 1998- 1999 |
132,173 | 870MB | - | Ct E J K |
50 *11 |
4 grades |
|||||||||
CIRB020(A) | 249,508 | (246MB) | |||||||||||||||
EIRB010(C) | E | 10,204 | - | ||||||||||||||
Mainichi Daily(A) | 12,723 | (12.9MB) | |||||||||||||||
Mainichi(B) | J | 220,078 | - | ||||||||||||||
NTCIR-4 CLIR | IR | News | CIRB011(C) | Ct | 1998- 1999 |
132,173 | ca.3GB | - | Ct E J K |
60 *11 |
4 grades |
||||||
CIRB020(A) | 249,203 | (246MB) | |||||||||||||||
EIRB010(C) | E | 10,204 | - | ||||||||||||||
Mainichi Daily(A) | 12,723 | (12.9 MB) | |||||||||||||||
Korea Times(A) | 19,599 | (20.4 MB) | |||||||||||||||
Hong Kong Standard(A) | 96,683 | - | |||||||||||||||
Xinhua(B) | 208,167 | - | |||||||||||||||
Mainichi(B) | J | 220,078 | - | ||||||||||||||
Yomiuri(B) | 373,558 | - | |||||||||||||||
Hankookilbo(A) | K | 149,921 | (93.5 MB) | ||||||||||||||
Chosenilbo(A) | 104,517 | (75.4 MB) | |||||||||||||||
NTCIR-5 CLIR | IR | News | CIRB040r(A) | Ct | 2000- 2001 |
901,446 | 582 MB (581.7MB) |
Ct E J K |
50 *11 |
4 grades |
|||||||
Mainichi Daily(A) | E | 12,155 | 9.9MB (9.9MB) |
||||||||||||||
Korea Times(A) | 30,530 | 25.3MB (25.3MB) |
|||||||||||||||
Daily Yomiuri(B) | 17,741 | 22.9MB | |||||||||||||||
Xinhua(B) | 198,624 | - | |||||||||||||||
Mainichi(B) | J | 199,681 | 118.8MB | ||||||||||||||
Yomiuri(B) | 658,719 | 343.3MB | |||||||||||||||
Hankookilbo(A) | K | 85,250 | 52.1MB (52.1MB) |
||||||||||||||
Chosenilbo(A) | 135,124 | 88.7MB (88.7MB) |
|||||||||||||||
NTCIR-6 CLIR | IR | News | CIRB040r(A) | Ct | 2000- 2001 |
901,446 | 582 MB (581.7MB) |
Ct E J K |
50 (selected from NTCIR-3,4) *11 |
4 grades |
|||||||
Mainichi(B) | J | 199,681 | 118.8MB | ||||||||||||||
Yomiuri(B) | 658,719 | 343.3MB | |||||||||||||||
Hankookilbo(A) | K | 85,250 | 52.1MB (52.1MB) |
||||||||||||||
Chosenilbo(A) | 135,124 | 88.7MB (88.7MB) |
|||||||||||||||
NTCIR-7 ACLIA (IR for QA) |
IR | News | Lianhe Zaobao (A) | Cs | 1998- 2001 |
249,287 | 411 MB (229.8MB) |
C E J |
CS-CS: 97 CT-CT: 95 EN-CS: 97 EN-CT: 95 EN-JA: 98 JA-JA: 98 |
3 grades |
|||||||
Xinhua Chinese(B) | 295,875 | 511 MB | |||||||||||||||
CIRB020(A) | Ct | 1998- 1999 |
249,508 | 320 MB (246MB) |
|||||||||||||
CIRB040r(A) | 2000- 2001 |
901,446 | 582 MB (581.7MB) |
||||||||||||||
Mainichi(B) | J | 1998- 2001 |
419,759 | 544 MB | |||||||||||||
NTCIR-8 ACLIA (IR for QA) |
IR | News | Xinhua Chinese (B) | Cs | 2002- 2005 |
308,845 | 516MB (210MB) |
C E J |
100 for each language pair *11 |
3 grades |
|||||||
UDN (A) | Ct | 1,663,517 | 1999MB (1035MB) |
||||||||||||||
Mainichi (B) | J | 377,941 | 678MB (244MB) |
||||||||||||||
CLQA | For further details about Cross-Lingual Question Answering, please consult the columns of 'QA'. | ||||||||||||||||
CQA | NTCIR-8 CQA | answer quality ranking | QA site on Web | Yahoo!Q&A corpus (Chiebukuro) (A) |
J | Apr. 2004 to Oct. 2005 |
Questions resolved: 3,116,009 | ca. 916MB | J | Questions: 1500 | 2 graded or 4 graded |
||||||
Best answers: 3,116,008 | ca. 935MB | Answers: 7443 | Best answers: 1500 |
||||||||||||||
Other answers: 10,361,777 | ca. 2.3GB | Normal answers: 5943 | |||||||||||||||
GeoTime | NTCIR-8 GeoTime |
IE/ analysis |
News | New York Times (B) | E | 2002- 2005 |
315,417 | 1570MB | J E |
25 | - | ||||||
Mainichi (B) | J | 377,941 | 678MB (244MB) |
- | |||||||||||||
IR4QA | For further details about Information Retrieval for QA, please consult the columns of 'CLIR on News'. | ||||||||||||||||
MOAT | For further details about Multilingual Opinion Analysis, please consult the columns of 'OPINION'. | ||||||||||||||||
MuST (Trend Information) |
NTCIR-6 MuST |
IE/ analysis |
News | Mainichi (B) | J | 1998- 1999 |
220,078 | 260MB | J | 27 | 581 *9 |
- | |||||
NTCIR-7 MuST |
IE/ analysis |
News | Mainichi (B) | J | 1998- 2001 |
419,759 | 535MB | J | 25 (8 topics) |
701 *9 |
- | ||||||
OPINION | NTCIR-6 OPINION | IE/ analysis |
News | CIRB020(A) | Ct | 1998- 1999 |
249,508 | 788MB | (246MB) | Ct E J |
32 (selected from NTCIR-3, -4, -5 CLIR) |
843 *8 |
2 types, 3 metrics |
||||
CIRB040r(A) | 2000- 2001 |
901,446 | (581.7MB) | ||||||||||||||
Daily Yomiuri(B) | E | 2000- 2001 |
17,741 | 471.5MB | - | 439 *8 |
|||||||||||
Mainichi Daily(A) | 1998- 2001 |
24,878 | (22.8MB) | ||||||||||||||
Korea Times(A) | 2000- 2001 |
30,530 | (45.7MB) | ||||||||||||||
Hong Kong Standard(A) | 1998- 1999 |
96,856 | - | ||||||||||||||
Xinhua(B) | 1998- 2001 |
406,791 | 299MB | ||||||||||||||
Mainichi(B) | J | 1998- 2001 |
419,759 | 766MB | 490 *8 |
||||||||||||
Yomiuri(B) | 1,034,699 | ||||||||||||||||
NTCIR-7 MOAT |
IE/ analysis |
News | Xinhua Chinese(B) | Cs | 1998- 2001 |
295,875 | 511 MB | Cs | 16 | 271 *10 |
2 types, 3 metrics |
||||||
Lianhe Zaobao(A) | 249,287 | 230MB (229.8MB) |
|||||||||||||||
CIRB020(A) | Ct | 1998- 1999 |
249,508 | 320 MB (246MB) |
Ct | 17 | 246 *10 |
||||||||||
CIRB040r(A) | 2000- 2001 |
901,446 | 582 MB (581.7MB) |
||||||||||||||
Mainichi Daily(A) | E | 1998- 2001 |
24,878 | 22.8MB (22.8MB) |
E | 17 | 167 *10 |
||||||||||
Korea Times(A) | 50,129 | 45.7MB (45.7MB) |
|||||||||||||||
Hong Kong Standard(A) | 1998- 1999 |
96,683 | 252MB | ||||||||||||||
Xinhua(B) | 1998- 2001 |
406,791 | 229MB | ||||||||||||||
Straits Times(A) | - | 250MB (249.8MB) |
|||||||||||||||
Mainichi(B) | J | 419,759 | 544 MB | J | 22 | 287 *12 |
|||||||||||
NTCIR-8 MOAT |
IE/ analysis |
News | Xinhua Chinese (B) | Cs | 2002- 2005 |
308,845 | 516MB (210MB) |
Cs | 19 | 385 *12 |
2 types, 3 metrics |
||||||
UDN (A) | Ct | 1,663,517 | 1999MB (1035MB) |
Ct | 20 | 775 *12 |
|||||||||||
New York Times (B) | E | 315,417 | 1570MB | E | 20 | 138 *12 |
|||||||||||
Mainichi(B) | J | 377,941 | 678MB (244MB) |
J | 20 | 170 *12 |
|||||||||||
Patent | NTCIR-3 PATENT | IR | patent full | kkh (A) *3 | J | 1998- 1999 |
697,262 | 18GB | Ct Cs K J E |
31 | 3 grades |
||||||
abstract | jsh (A) *3 | 1995- 1999 |
1,706,154 | 1,883MB | |||||||||||||
paj (A)*3 | E | 1,701,339 | 2,711MB | ||||||||||||||
NTCIR-4 PATENT | IR | patent full | Publication of unexamined patent application (A) | J | 1993- 1997 |
ca. 1,700,000 |
ca.45GB | E | Main:34, Add:69 |
3 grades |
|||||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 1997 |
ca. 1,700,000 |
ca.2.2GB | ||||||||||||
NTCIR-5 PATENT | IR/ classification |
patent full | Publication of unexamined patent application (A) | J | 1993- 2002 |
3,496,252 | 94.5GB | J E |
34+1189 in NTCIR-5; 349+1681 added in NTCIR-6 |
3 grades |
|||||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 2002 |
3,496,252 | ca.5GB | ||||||||||||
NTCIR-6 PATENT | IR/ classification |
patent full | Patent grant data published by USPTO (A) | E | 1993- 2002 |
1,315,470 |
52.6GB | E | 3221 | 3 grades |
|||||||
patent full | Publication of unexamined patent application (A) | J | 1993- 2002 |
3,496,252 | 94.5GB | J |
Japanese Retrieval Classification |
4 grades |
|||||||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 2002 |
3,496,252 | ca.5GB | E | 1 grade |
||||||||||
Patent Mining | NTCIR-7 PATMN |
Mining | patent full | Patent grant data published by USPTO (A) | E | 1993- 2002 |
1,315,470 | 52.6GB |
E J |
English/Cross-lingual (J2E): 976 | 2 grades |
||||||
patent full | Publication of unexamined patent application (A) | J | 1993- 2002 |
3,496,252 | 94.5GB | ||||||||||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 2002 |
3,496,252 | ca.5GB | ||||||||||||
sci. abstract | ntc1-je (A) | JE | 1988- 1997 |
339,483 | 577MB | Japanese/Cross-lingual (E2J): 976 | |||||||||||
ntc1-j (A) | J | 332,918 | 312MB | ||||||||||||||
ntc1-e (A) | E | 187,080 | 218MB | ||||||||||||||
ntc2-j (A) | J | 1986- 1999 *2 |
400,248 | 600MB | |||||||||||||
ntc2-e (A) | E | 134,978 | 200MB | ||||||||||||||
NTCIR-8 PATMN |
Mining | patent full | Patent grant data published by USPTO (A) | E | 1993- 2002 |
1,315,470 | 52.6GB | J E |
Subtask of Research Paper Classification: E: 624; Cross-lingual (J2E): 644; J: 644; Cross-lingual (E2J): 624 |
1 | |||||||
Publication of unexamined patent application (A) | J | 1993- 2002 |
3,496,252 | 94.5GB | |||||||||||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 2002 |
3,496,252 | ca.5GB | ||||||||||||
sci. abstract | ntc1-je (A) | JE | 1988- 1997 |
339,483 | 577MB | J E |
Subtask of technical trend map creation: |
1 | |||||||||
ntc1-j (A) | J | 332,918 | 312MB | ||||||||||||||
ntc1-e (A) | E | 187,080 | 218MB | ||||||||||||||
ntc2-j (A) | J | 1986- 1999 *2 |
400,248 | 600MB | |||||||||||||
ntc2-e (A) | E | 134,978 | 200MB | ||||||||||||||
QA |
NTCIR-3 QA | QA | News | Mainichi (B) | J | 1998- 1999 |
220,078 | 260MB | J *1 | 1200 | exact answer | ||||||
NTCIR-4 QA | QA | News | Mainichi (B) | J | 1998- 1999 |
220,078 | ca. 776MB |
J *1 | 197 | exact answer | |||||||
199 | |||||||||||||||||
Yomiuri (B) | 373,558 | 251 | |||||||||||||||
NTCIR-5 CLQA | QA | News | CIRB040r(A) | C | 2000- 2001 |
901,446 | 581.7MB (581.7MB) |
C E J |
smpl:300, test:200*6 | 3 grades *7 |
|||||||
Daily Yomiuri(B) | E | 17,741 | 22.9MB | ||||||||||||||
Yomiuri(B) | J | 658,719 | 343.3MB | ||||||||||||||
NTCIR-5 QA | QA | News | Mainichi (B) | J | 2000- 2001 |
199,681 | 260MB | J *1 | 50 series (360Q) |
graded | |||||||
NTCIR-6 CLQA |
QA | News | CIRB020(A) | Ct | 1998- 1999 |
249,203 | 320MB (246MB) |
C E J |
C-E/C-C/E-C/E-E: 150 J-E/J-J/E-J: 200 |
3 grades *7 |
|||||||
EIRB010(C) | E | 10,204 | 24.5MB | ||||||||||||||
Mainichi Daily(A) | 12,723 | 33.3MB (12.9MB) |
|||||||||||||||
Korea Times(A) | 19,599 | 55.8MB (20.4MB) |
|||||||||||||||
Hong Kong Standard(A) | 96,683 | 252MB | |||||||||||||||
Mainichi(B) | J | 220,078 | 282MB | ||||||||||||||
NTCIR-6 QA | QA | News | Mainichi (B) | J | 1998- 2001 |
419,759 | 535MB | J | 100Q (any kind of Q) |
graded (3 types, 4 levels) |
|||||||
NTCIR-7 ACLIA (CCLQA) |
QA | News | Lianhe Zaobao (A) | Cs | 1998- 2001 |
249,287 | 411 MB (229.8MB) |
C J E |
CS-CS: 100 CT-CT: 100 EN-CS: 100 EN-CT: 100 EN-JA: 100 JA-JA: 100 |
Binary decision (system response conceptually containing the nugget or not) |
|||||||
Xinhua Chinese(B) | 295,875 | 511 MB | |||||||||||||||
CIRB020(A) | Ct | 1998- 1999 |
249,508 | 320 MB (246MB) |
|||||||||||||
CIRB040r(A) | 2000- 2001 |
901,446 | 582 MB (581.7MB) |
||||||||||||||
Mainichi(B) | J | 1998- 2001 |
419,759 | 544 MB | |||||||||||||
NTCIR-8 ACLIA (CCLQA) |
QA | News | Xinhua Chinese (B) | Cs | 2002- 2005 |
308,845 | 516MB (210MB) |
C J E |
100 for each language pair | Binary pyramid nugget matching | |||||||
UDN (A) | Ct | 1,663,517 | 1999MB (1035MB) |
||||||||||||||
Mainichi (B) | J | 377,941 | 678MB (244MB) |
||||||||||||||
WEB | NTCIR-3 WEB | IR | Web (html/ text) |
NW100G-01 (A) | m*4 | crawled in 2001 |
11,038,720 | 100GB | J *1 | 47 | 4 grades + relative |
||||||
NW10G-01 (A) | 1,445,466 | 10GB | |||||||||||||||
NTCIR-4 WEB | IR | Web (html/ text) |
NW100G-01 (A) | m*4 | crawled in 2001 |
11,038,720 | 100GB | J *1 | - | 3 grades |
|||||||
NTCIR-5 WEB | IR | Web (html/ text) |
NW1000G-04 (A) | m*4 | crawled in 2004 |
98,870,352 | 1.36TB | J *1 | 269+847 | 3 grades |
Collection | Task | Documents | Task data | ||||||||||||
Genre | Filename | Lang. |
Year | # of doc | Size | Test Data | Training Data | Relevance judge |
|||||||
lang | # | lang | # | ||||||||||||
NTCIR-7 PATMT |
MT | patent full | Patent grant data published by USPTO (A) | E | 1993- 2002 |
1,315,470 | 52.6GB | E | Intrinsic 1381 sents. *1 |
J E |
1,798,571 sent pairs |
- | |||
J | Intrinsic 1381 sents *2 |
- | |||||||||||||
Publication of unexamined patent application(A) | J | 1993- 2002 |
3,496,252 | 94.5GB | |||||||||||
E | Extrinsic 124 claims | 3 levels |
|||||||||||||
NTCIR-8 PATMT |
MT: Translation Subtask |
patent full | Publication of unexamined patent application (A) | J | 1993- 2007 |
5,253,613 | 165.0GB | E | Intrinsic 1119 sents. *3 |
J E |
3,186,284 sent pairs |
- | |||
J | Intrinsic 1251 sents. *4 |
- | |||||||||||||
Patent grant data published by USPTO (A) | E | 1993- 2007 |
2,124,370 | 120.6GB | |||||||||||
E | Extrinsic 91 claims | 3 levels |
|||||||||||||
AE (Evaluation Subtask) |
- | - | - | - | - | - | J E |
Source Data (J): 100 sents. Reference Translation Data (E): 100 sents. Machine Translation Data (E): 100 sents. * 12 systems. Human Evaluation Data (adequacy): 100 sents. * 12 systems * 3 raters. Human Evaluation Data (fluency): 100 sents. * 12 systems * 3 raters *5 |
J E |
Source Data (J): 100 sents. Reference Translation Data (E): 100 sents. Machine Translation Data (E): 100 sents. * 11 systems. Human Evaluation Data (adequacy): 100 sents. * 11 systems * 3 raters. Human Evaluation Data (fluency): 100 sents. * 11 systems * 3 raters |
- |
collection | task | genre | filename | lang | year | # of doc | types | analysts | total# |
NTCIR-2 SUMM | single doc | news | Mainichi(B) | J | 1994, 1995, 1998 | 180 docs | 7 | 3 | 3780 |
NTCIR-2 TAO*1 | single doc | news | Mainichi(B) | J | 1998 | 1000 docs | 2 | 1 | 2000 |
NTCIR-3 SUMM | single doc | news | Mainichi(B) | J | 1998-1999 | 60 docs | 7 | 3 | 1260 |
NTCIR-3 SUMM | multi doc | news | Mainichi(B) | J | 1998-1999 | 50 sets | 2 | 3 | 300 |
*17: Distribution of NTCIR-2 SUMM TAO (Text Summarization) is currently unavailable. We will announce through the NTCIR mailing list once it becomes available again.
(A) | Document collections available from NII for research purposes |
(B) | Document collections available free of charge to task participants, and available for a fee from another party for research use other than NTCIR participation |
(C) | Document collections available to task participants only |