NTCIR Project
|
Class | Collection | Task | Documents | Task data | ||||||||||||
Genre | Filename | Lang. |
Year | # of doc | Size | Topic/ Question | Relevance judge |
|||||||||
lang | # | |||||||||||||||
ACLIA | NTCIR-7 ACLIA (CCLQA/ IR for QA) |
Advanced Cross-Lingual Information Access (ACLIA) combines the Complex Cross-Lingual Question Answering (CCLQA) task and the Information Retrieval for QA (IR for QA) task. For further details, see the 'CLIR on News' and 'QA' entries. |
||||||||||||||
NTCIR-8 ACLIA (CCLQA/ IR for QA) |
||||||||||||||||
CLIR on Scientific | NTCIR-1 | IR | sci. abstract | ntc1-je (A) | JE | 1988- 1997 |
339,483 | 577MB | J | 83 | 3 grades |
|||||
ntc1-j (A) | J | 332,918 | 312MB | |||||||||||||
ntc1-e (A) | E | 187,080 | 218MB | 60 | ||||||||||||
TE *5 | ntc1-tmrc (A) | J | 2,000 | - | - | - | ||||||||||
NTCIR-2 | IR | sci. abstract | ntc2-j (A) | J | 1986- 1999 *2 |
400,248 | 600MB | J E |
49 | 4 grades |
||||||
ntc2-e (A) | E | 134,978 | 200MB | |||||||||||||
CLIR on News |
CIRB010 | IR | News | CIRB010 (C) | Ct | 1998- 1999 |
132,220 | 132MB | Ct E |
50 | 4 grades |
|||||
NTCIR-3 CLIR | IR | News | KEIB010 (C) | K | 1994 | 66,146 | 74MB | Ct K J E |
30 | 4 grades |
||||||
CIRB011 (C) | Ct | 1998- 1999 |
132,173 | 870MB | Ct K J |
50 | 4 grades |
|||||||||
CIRB020 (A) | 249,508 | |||||||||||||||
Mainichi (B) | J | 220,078 | ||||||||||||||
EIRB010 (C) | E | 10,204 | ||||||||||||||
Mainichi Daily (A) | 12,723 | |||||||||||||||
NTCIR-4 CLIR | IR | News | CIRB011 (C) | Ct | 1998- 1999 |
132,173 | ca.3GB | Ct K J E |
60 | 4 grades |
||||||
CIRB020 (A) | 249,203 | |||||||||||||||
Hankookilbo (A) | K | 149,921 | ||||||||||||||
Chosenilbo (A) | 104,517 | |||||||||||||||
Mainichi (B) | J | 220,078 | ||||||||||||||
Yomiuri (B) | 373,558 | |||||||||||||||
EIRB010 (C) | E | 10,204 | ||||||||||||||
Mainichi Daily (A) | 12,723 | |||||||||||||||
Korea Times (A) | 19,599 | |||||||||||||||
Hong Kong Standard (A) | 96,683 | |||||||||||||||
Xinhua (B) | 208,167 | |||||||||||||||
NTCIR-5 CLIR | IR | News | CIRB040r (A) | Ct | 2000- 2001 |
901,446 | 581.7MB | Ct K J E |
50 | 4 grades |
||||||
Hankookilbo (A) | K | 85,250 | 52.1MB | |||||||||||||
Chosenilbo (A) | 135,124 | 88.7MB | ||||||||||||||
Mainichi (B) | J | 199,681 | 118.8MB | |||||||||||||
Yomiuri (B) | 658,719 | 343.3MB | ||||||||||||||
Mainichi Daily (A) | E | 12,155 | 9.9MB | |||||||||||||
Korea Times (A) | 30,530 | 25.3MB | ||||||||||||||
Daily Yomiuri (B) | 17,741 | 22.9MB | ||||||||||||||
Xinhua (B) | 198,624 | - | ||||||||||||||
NTCIR-6 CLIR | IR | News | CIRB040r (A) | Ct | 2000- 2001 |
901,446 | 581.7MB | Ct K J E |
50 (selected from NTCIR-3,4) |
4 grades |
||||||
Hankookilbo (A) | K | 85,250 | 52.1MB | |||||||||||||
Chosenilbo (A) | 135,124 | 88.7MB | ||||||||||||||
Mainichi (B) | J | 199,681 | 118.8MB | |||||||||||||
Yomiuri (B) | 658,719 | 343.3MB | ||||||||||||||
NTCIR-7 ACLIA (IR for QA) |
IR | News | CIRB020 (A) | Ct | 1998- 1999 |
249,508 | 320 MB | C J E |
EN-JA: 98 JA-JA: 98 EN-CS: 97 CS-CS: 97 EN-CT: 95 CT-CT: 95 |
3 grades |
||||||
CIRB040r (A) | 2000- 2001 |
901,446 | 582 MB | |||||||||||||
Lianhe Zaobao (A) | Cs | 1998- 2001 |
249,287 | 411 MB | ||||||||||||
Xinhua Chinese (B) | 295,875 | 511 MB | ||||||||||||||
Mainichi (B) | J | 419,759 | 544 MB | |||||||||||||
NTCIR-8 ACLIA (IR for QA) |
IR | News | Xinhua Chinese (B) | Cs | 2002- 2005 |
308,845 | - | C J E |
100* for each language pair (* A few IR4QA topics for which only a very small number of relevant documents were returned were removed from the formal run) |
3 grades |
||||||
UDN (A) | Ct | 1,663,517 | - | |||||||||||||
Mainichi (B) | J | 377,941 | - | |||||||||||||
CLQA | NTCIR-5 CLQA | For further details about Cross-Lingual Question Answering, see the 'QA' entries. | ||||||||||||||
NTCIR-6 CLQA | ||||||||||||||||
CQA | NTCIR-8 CQA | QA | QA site on Web | Yahoo!Q&A corpus (Chiebukuro) (A) |
J | Apr. 2004 to Oct. 2005 |
- | - | - | - | - | - | ||||
GeoTime | NTCIR-8 GeoTime |
IR | News | New York Times (B) | E | 2002- 2005 |
315,417 | - | J E |
25 | - | |||||
Mainichi (B) | J | 377,941 | - | - | ||||||||||||
OPINION | NTCIR-6 OPINION | IE/ analysis |
News | CIRB020 (A) | Ct | 1998- 1999 |
249,508 | 788MB | Ct J E |
32 (selected from NTCIR-3, -4, -5 CLIR) |
843 *8 |
2 types, 3 metrics |
||||
CIRB040r (A) | 2000- 2001 |
901,446 | ||||||||||||||
Mainichi (B) | J | 1998- 2001 |
419,759 | 766MB | 490 *8 |
|||||||||||
Yomiuri (B) | 1998- 2001 |
1,034,699 | ||||||||||||||
Daily Yomiuri (B) | E | 2000- 2001 |
17,741 | 471.5MB | 439 *8 |
|||||||||||
Mainichi Daily (A) | 1998- 2001 |
24,878 | ||||||||||||||
Korea Times (A) | 2000- 2001 |
30,530 | ||||||||||||||
Hong Kong Standard (A) | 1998- 1999 |
96,856 | ||||||||||||||
Xinhua (B) | 1998- 2001 |
409,792 | 299MB | |||||||||||||
NTCIR-7 MOAT |
IE/ analysis |
News | CIRB020 (A) | Ct | 1998- 1999 |
249,508 | 320 MB | Ct | 17 | 246 *10 |
2 types, 3 metrics |
|||||
CIRB040r (A) | 2000- 2001 |
901,446 | 581.7MB | |||||||||||||
Xinhua Chinese (B) | Cs | 1998- 2001 |
295,875 | 511 MB | Cs | 16 | 271 *10 |
|||||||||
Lianhe Zaobao (A) | 249,287 | 230MB | ||||||||||||||
Mainichi (B) | J | 419,759 | 544 MB | J | 22 | 287 *10 |
||||||||||
Mainichi Daily (A) | E | 24,878 | 22.8MB | E | 17 | 167 *10 |
||||||||||
Korea Times (A) | 50,129 | 45.7MB | ||||||||||||||
Hong Kong Standard (A) | 1998- 1999 |
96,683 | 252MB | |||||||||||||
Xinhua (B) | 1998- 2001 |
406,791 | 229MB | |||||||||||||
Straits Times (A) | - | 250MB | ||||||||||||||
NTCIR-8 MOAT |
IE/ analysis |
News | Xinhua Chinese (B) | Cs | 2002- 2005 |
308,845 | - | Cs | - | - | - | |||||
UDN (A) | Ct | 1,663,517 | - | Ct | - | - | - | |||||||||
New York Times (B) | E | 315,417 | - | E | - | - | - | |||||||||
Mainichi (B) | J | 377,941 | - | J | - | - | - | |||||||||
Patent | NTCIR-3 PATENT | IR | patent full | kkh (A) *3 | J | 1998- 1999 |
697,262 | 18GB | Ct Cs K J E |
31 | 3 grades |
|||||
abstract | jsh (A) *3 | 1995- 1999 |
1,706,154 | 1,883MB | ||||||||||||
paj (A) *3 | E | 1,701,339 | 2,711MB | |||||||||||||
NTCIR-4 PATENT | IR | patent full | Publication of unexamined patent application (A) | J | 1993- 1997 |
ca. 1,700,000 |
ca.45GB | E | Main:34, Add:69 |
3 grades |
||||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 1997 |
ca. 1,700,000 |
ca.2.2GB | |||||||||||
NTCIR-5 PATENT | IR/ classification |
patent full | Publication of unexamined patent application (A) | J | 1993- 2002 |
3,496,252 | 94.5GB | J E |
34+1189 in NTCIR-5; 349+1681 added in NTCIR-6 |
3 grades |
||||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 2002 |
3,496,252 | ca.5GB | |||||||||||
NTCIR-6 PATENT | IR/ classification |
patent full | Patent grant data published from USPTO (A) | E | 1993- 2002 |
1,315,470 |
52.6GB | E | 3221 | 3 grades |
||||||
patent full | Publication of unexamined patent application (A) | J | 1993- 2002 |
3,496,252 | 94.5GB | J |
Japanese Retrieval Classification |
4 grades |
||||||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 2002 |
3,496,252 | ca.5GB | E | 1 grade |
|||||||||
Patent Mining | NTCIR-7 PATMN |
Mining | patent full | Publication of unexamined patent application (A) | J | 1993- 2002 |
3,496,252 | 94.5GB | J E |
Japanese/ Cross-lingual (E2J) 976 |
2 | |||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 2002 |
3,496,252 | ca.5GB | |||||||||||
patent full | Patent grant data published from USPTO (A) | E | 1993- 2002 |
1,315,470 | 52.6GB | |||||||||||
sci. abstract | ntc1-je (A) | JE | 1988- 1997 |
339,483 | 577MB | English/ Cross-lingual (J2E) 976 |
2 | |||||||||
ntc1-j (A) | J | 332,918 | 312MB | |||||||||||||
ntc1-e (A) | E | 187,080 | 218MB | |||||||||||||
ntc2-j (A) | J | 1986- 1999 *2 |
400,248 | 600MB | ||||||||||||
ntc2-e (A) | E | 134,978 | 200MB | |||||||||||||
NTCIR-8 PATMN |
Mining | patent full | Publication of unexamined patent application (A) | J | 1993- 2002 |
3,496,252 | 94.5GB | J |
(1) Subtask of research paper classification (2) Subtask of technical trend map creation |
1 | ||||||
abstract | Patent Abstracts of Japan(PAJ) (A) | E | 1993- 2002 |
3,496,252 | ca.5GB | |||||||||||
patent full | Patent grant data published from USPTO (A) | E | 1993- 2002 |
1,315,470 | 52.6GB | |||||||||||
E |
(1) Subtask of research paper classification (2) Subtask of technical trend map creation |
1 | ||||||||||||||
sci. abstract | ntc1-je (A) | JE | 1988- 1997 |
339,483 | 577MB | |||||||||||
ntc1-j (A) | J | 332,918 | 312MB | |||||||||||||
ntc1-e (A) | E | 187,080 | 218MB | |||||||||||||
ntc2-j (A) | J | 1986- 1999 *2 |
400,248 | 600MB | ||||||||||||
ntc2-e (A) | E | 134,978 | 200MB | |||||||||||||
Patent Translation |
NTCIR-7 PATMT |
MT | patent full | Publication of unexamined patent application (A) | J | 1993- 2002 |
3,496,252 | 94.5GB | J | Test Data (J): Intrinsic 1381 sent. Reference translation (E): 1381 sent. + 300 sent. * 2 humans |
J E |
Training data: 1,798,571 sent pairs |
- | |||
E | Test Data (E): Intrinsic 1381 sent. Reference translation (J): 1381 sent. |
|||||||||||||||
Patent grant data published from USPTO (A) | E | 1993- 2002 |
1,315,470 | 52.6GB | - | |||||||||||
E | Test Data (E): Extrinsic 124 claims |
2 levels |
||||||||||||||
NTCIR-8 PATMT |
MT | patent full | Publication of unexamined patent application (A) | J | 1993- 2007 |
5,253,613 | 165.0GB | J | Test Data (J): Intrinsic 1251 sent. Reference translation (E): 1251 sent. + 300 sent. * 3 humans |
J E |
Training data: 3,186,284 sent pairs |
- | ||||
E | Test Data (E): Intrinsic 1119 sent. Reference translation (J): 1119 sent. |
- | ||||||||||||||
Patent grant data published from USPTO (A) | E | 1993- 2007 |
2,124,370 | 120.6GB | ||||||||||||
E | Extrinsic 91 claims |
1 level |
||||||||||||||
QA |
NTCIR-3 QA | QA | News | Mainichi (B) | J | 1998- 1999 |
220,078 | 260MB | J *1 | 1200 | exact answer | |||||
NTCIR-4 QA | QA | News | Mainichi (B) | J | 1998- 1999 |
220,078 | ca. 776MB |
J *1 | 197 | exact answer | ||||||
199 | ||||||||||||||||
Yomiuri (B) | 373,558 | 251 | ||||||||||||||
NTCIR-5 CLQA | QA | News | CIRB040r (A) | C | 2000- 2001 |
901,446 | 581.7MB | C J E |
smpl:300, test:200*6 | 3 grades *7 |
||||||
Yomiuri (B) | J | 658,719 | 343.3MB | |||||||||||||
Daily Yomiuri (B) | E | 17,741 | 22.9MB | |||||||||||||
NTCIR-5 QA | QA | News | Mainichi (B) | J | 2000- 2001 |
199,681 | 260MB | J *1 | 50 series (360Q) |
graded | ||||||
NTCIR-6 CLQA |
QA | News | CIRB020 (A) | Ct | 1998- 1999 |
249,203 | 320MB | C J E |
J-E/J-J/E-J: 200, C-E/C-C/E-C/E-E: 150 |
3 grades *7 |
||||||
Mainichi (B) | J | 220,078 | 282MB | |||||||||||||
EIRB010 (C) | E | 10,204 | 24.5MB | |||||||||||||
Mainichi Daily (A) | 12,723 | 33.3MB | ||||||||||||||
Korea Times (A) | 19,599 | 55.8MB | ||||||||||||||
Hong Kong Standard (A) | 96,683 | 252MB | ||||||||||||||
NTCIR-6 QA | QA | News | Mainichi (B) | J | 1998- 2001 |
419,759 | 535MB | J | 100Q (any kind of Q) |
graded (3 types, 4 levels) |
||||||
NTCIR-7 ACLIA (CCLQA) |
QA | News | CIRB020 (A) | Ct | 1998- 1999 |
249,508 | 320 MB | C J E |
EN-JA: 100 JA-JA: 100 EN-CS: 100 CS-CS: 100 EN-CT: 100 CT-CT: 100 |
Binary decision (system response conceptually containing the nugget or not) |
||||||
CIRB040r (A) | 2000- 2001 |
901,446 | 582 MB | |||||||||||||
Lianhe Zaobao (A) | Cs | 1998- 2001 |
249,287 | 411 MB | ||||||||||||
Xinhua Chinese (B) | 295,875 | 511 MB | ||||||||||||||
Mainichi (B) | J | 419,759 | 544 MB | |||||||||||||
NTCIR-8 ACLIA (CCLQA) |
QA | News | Xinhua Chinese (B) | Cs | 2002- 2005 |
308,845 | - | C J E |
100 for each language pair | Binary pyramid nugget matching | ||||||
UDN (A) | Ct | 1,663,517 | - | |||||||||||||
Mainichi (B) | J | 377,941 | - | |||||||||||||
WEB | NTCIR-3 WEB | IR | Web (html/ text) |
NW100G-01 (A) | m*4 | crawled in 2001 |
11,038,720 | 100GB | J *1 | 47 | 4 grades + relative |
|||||
NW10G-01 (A) | 1,445,466 | 10GB | ||||||||||||||
NTCIR-4 WEB | IR | Web (html/ text) |
NW100G-01 (A) | m*4 | crawled in 2001 |
11,038,720 | 100GB | J *1 | - | 3 grades |
||||||
NTCIR-5 WEB | IR | Web (html/ text) |
NW1000G-04 (A) | m*4 | crawled in 2004 |
98,870,352 | 1.36TB | J *1 | 269+847 | 3 grades |
||||||
MuST (Trend Information) |
NTCIR-6 MuST |
IE/ analysis |
News | Mainichi (B) | J | 1998- 1999 |
220,078 | 260MB | J | 27 | 581 *9 |
- | ||||
NTCIR-7 MuST |
IE/ analysis |
News | Mainichi (B) | J | 1998- 2001 |
419,759 | 535MB | J | 25 (8 topics) |
701 *9 |
- |
collection | task | documents | summaries | ||||||
genre | filename | lang | year | # of doc | types | analysts | total# | ||
NTCIR-2 SUMM | single doc | news | Mainichi (B) | J | 1994, 1995, 1998 | 180 docs | 7 | 3 | 3780 |
NTCIR-2 TAO *10 | Mainichi (B) | 1998 | 1000 docs | 2 | 1 | 2000 | |||
NTCIR-3 SUMM | Mainichi (B) | 1998-1999 | 60 docs | 7 | 3 | 1260 | |||
multi doc | 50 sets | 2 | 3 | 300 |
*10: NTCIR-2 SUMM TAO (Text Summarization) is currently unavailable for
distribution. An announcement will be made on the NTCIR mailing list once it
becomes available again.
(A) | Document collections available from NII for research purposes | |||||
(B) | Document collections available free of charge to task participants, and available from a third party for a fee for research use outside NTCIR participation |
|||||
(C) | Document collections available to task participants only |