NTCIR
Test Collections - DATA
[User Agreement] [NTCIR home]
NTCIR Test collections : IR & QA
| collection | task | documents | | | | | | Task data | | |
| | | genre | filename | lang | year | # of doc | size | topic/question lang | topic/question # | relevance judgment |
| NTCIR-1 | IR | sci. abstract | ntc1-je | JE | 1988-1997 | 339,483 | 577MB | J | 83 | 3 grades |
| | | | ntc1-j | J | | 332,918 | 312MB | | | |
| | | | ntc1-e | E | | 187,080 | 218MB | | 60 | |
| | Term extraction/role analysis | | ntc1-tmrc | J | | 2,000 | - | | - | - |
| | IR | news | CIRB010 | Ct | 1998-1999 | 132,220 | 132MB | CtE | 50 | 4 grades |
| NTCIR-2 | IR | sci. abstract | ntc2-j | J | 1986-1999** | 400,248 | 600MB | JE | 49 | 4 grades |
| | | | ntc2-e | E | | 134,978 | 200MB | | | |
| NTCIR-3 CLIR | IR | news | KEIB010 | K | 1994 | 66,146 | 74MB | CtKJE | 30 | 4 grades |
| | | | CIRB011 | Ct | 1998-1999 | 132,173 | 870MB | CtKJE | 50 | 4 grades |
| | | | CIRB020 | | | 249,508 | | | | |
| | | | Mainichi | J | | 220,078 | | | | |
| | | | EIRB010 | E | | 10,204 | | | | |
| | | | Mainichi Daily | | | 12,723 | | | | |
| NTCIR-3 PATENT | IR | patent full | kkh *3 | J | 1998-1999 | 697,262 | 18GB | CtCsKJE | 31 | 3 grades |
| | | abstract | jsh *3 | | 1995-1999 | 1,706,154 | 1,883MB | | | |
| | | | paj *3 | E | | 1,701,339 | 2,711MB | | | |
| QAC : NTCIR-3 QA | QA | news | Mainichi | J | 1998-1999 | 220,078 | 282MB | J* | 1200 | exact answer |
| NTCIR-3 WEB | IR | Web (html/text) | NW100G-01 | multiple*4 | crawled in 2001 | 11,038,720 | 100GB | J* | 47 | 4 grades |
| | | | NW10G-01 | | | 1,445,466 | 10GB | | | |
| NTCIR-4 PATENT | IR | patent full-text | kkh *3 | J | 1993-2002 | 3,496,252 | 94.5GB | CtCsKJE | 101 | 4 grades |
| | | patent abstract | paj *3 | E | 1993-2002 | 3,496,252 | 5,482MB | | | |
| NTCIR-4 WEB | IR | Web (html/text) | NW100G-01 | multiple*4 | crawled in 2001 | 11,038,720 | 100GB | J* | 47 | 4 grades |
| | | | NW10G-01 | | | | | | | |
J: Japanese, E: English, C: Chinese (Ct: Traditional Chinese, Cs: Simplified Chinese), K: Korean
+ indicates the document collection newly added for NTCIR-4
* English translation is available
** gakkai subfiles: 1997-1999, kaken subfiles: 1986-1997
*3: kkh : Publication of unexamined patent application, jsh: Japanese abstract,
paj: English translation of jsh
*4: mostly Japanese or English (some in other languages)
NTCIR Test collections : Summarization
| collection | task | documents | | | | | summaries | | |
| | | genre | filename | lang | year | # of doc | types | analysts | total# |
| NTCIR-2 SUMM | single doc | news | Mainichi | J | 1994, 1995, 1998 | 180 docs | 7 | 3 | 3780 |
| NTCIR-2 TAO | | | | | 1998 | 1000 docs | 2 | 1 | 2000 |
| TSC:NTCIR-3 SUMM | | | | | 1998-1999 | 60 docs | 7 | 3 | 1260 |
| | multi doc | | | | | 50 sets | 2 | 3 | 300 |
J: Japanese
| -- | data is available from NII |
| -- | data is available to NTCIR Workshop participants only |
| -- | data is available from the newspaper companies (Mainichi, Yomiuri) |