The 7th NTCIR Workshop
DATA
NTCIR-7 has ended. For information on the data, see the NTCIR data page.
The following document collections are used for the 7th NTCIR Workshop.
They are available free of charge to participating research groups
for task participation and system evaluation within the 7th NTCIR Workshop.
To obtain the data, signed user agreement forms must be submitted to the NTCIR Project Office at NII.
cluster | task | test collection | data genre/task | language | file name | distribution of data | number of documents (size) | year
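Advanced CLIA | CCLQA/IR for QA | Document Data | news articles | Ct | CIRB040r (United Daily News, United Express, Min Sheng News, Economic Daily News) | done | 901,446 | 00-01
Advanced CLIA | CCLQA/IR for QA | Document Data | news articles | Cs | Lianhe Zaobao | done | - | 98-01
Advanced CLIA | CCLQA/IR for QA | Document Data | news articles | Cs | Xinhua Chinese**a | done | - | 98-01
Advanced CLIA | CCLQA/IR for QA | Document Data | news articles | J | Mainichi*B | done | 419,759 | 98-01
Advanced CLIA | CCLQA/IR for QA | Task Data | QA | CtCsJE | NTCIR-7 ACLIA QA data | - | - | -
Advanced CLIA | CCLQA/IR for QA | Task Data | IR | CtCsJE | NTCIR-7 ACLIA IR data | - | - | -
Advanced CLIA | CCLQA/IR for QA | Document Data for system training purposes | news articles | Ct | CIRB011 (China Times, Commercial Times, China Times Express, Central Daily News, China Daily News) | done | 132,173 | 98-99
Advanced CLIA | CCLQA/IR for QA | Document Data for system training purposes | news articles | Ct | CIRB020 (United Daily News, Economic Daily News, Min Sheng Daily, United Evening News, Star News) | done | 249,508 | 98-99
Advanced CLIA | CCLQA/IR for QA | Task Data for system training purposes | QA | CJE | NTCIR-5/6 CLQA data | 5CLQA: done | - | -
Advanced CLIA | CCLQA/IR for QA | Task Data for system training purposes | QA | J | NTCIR-3/4/5/6 QA data | 3/4/5QA: done | - | -
Advanced CLIA | CCLQA/IR for QA | Task Data for system training purposes | IR | CtKJE | NTCIR-3/4/5/6 CLIR data | done | - | -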
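User Generated Contents | MOAT | Document Data | news articles | J | Mainichi*B | done | 419,759 | 98-01
User Generated Contents | MOAT | Document Data | news articles | E | Mainichi Daily | | 24,878 | 98-01
User Generated Contents | MOAT | Document Data | news articles | E | Korea Times | | 50,129 | 98-01
User Generated Contents | MOAT | Document Data | news articles | E | Hong Kong Standard | | 96,683 | 98-99
User Generated Contents | MOAT | Document Data | news articles | E | Xinhua**a | | 406,791 | 98-01
User Generated Contents | MOAT | Document Data | news articles | E | Straits Times | | - | 98-01
User Generated Contents | MOAT | Document Data | news articles | Ct | CIRB011 (China Times, Commercial Times, China Times Express, Central Daily News, China Daily News) | done | 132,173 | 98-99
User Generated Contents | MOAT | Document Data | news articles | Ct | CIRB020 (United Daily News, Economic Daily News, Min Sheng Daily, United Evening News, Star News) | done | 249,508 | 98-99
User Generated Contents | MOAT | Document Data | news articles | Cs | Xinhua Chinese**a | done | - | 98-01
User Generated Contents | MOAT | Document Data | news articles | Cs | Lianhe Zaobao | done | - | 98-01
User Generated Contents | MOAT | Task Data | IE/analysis | J | NTCIR-7 MOAT Japanese Annotation Data (Mainichi*B 1998-2001) | 9/1 | - | 98-01
User Generated Contents | MOAT | Task Data | IE/analysis | E | NTCIR-7 MOAT English Annotation Data (Mainichi Daily 1998-2001, Korea Times 2000-2001, Hong Kong Standard 1998-1999, Xinhua 1998-2001, Straits Times 1998-2001) | 9/1 | - | 98-01
User Generated Contents | MOAT | Task Data | IE/analysis | Ct | NTCIR-7 MOAT Chinese (traditional) Annotation Data (CIRB020 1998-1999, CIRB040 2000-2001) | 9/1 | - | 98-01
User Generated Contents | MOAT | Task Data | IE/analysis | Cs | NTCIR-7 MOAT Chinese (simplified) Annotation Data (Xinhua Chinese 1998-2001, Lianhe Zaobao 1998-2001) | 9/1 | - | 98-01
User Generated Contents | MOAT | Task Data for system training purposes | IE/analysis | J | NTCIR-6 OAT Japanese Annotation Data: Part A (Mainichi*B 1998-2001) | done | 490 | 98-01
User Generated Contents | MOAT | Task Data for system training purposes | IE/analysis | E | NTCIR-6 OAT English Annotation Data: Part A (Mainichi Daily 1998-2001, Korea Times 2000-2001, Hong Kong Standard 1998-1999) | done | 439 | 98-01
User Generated Contents | MOAT | Task Data for system training purposes | IE/analysis | E | NTCIR-6 OAT English Annotation Data: Part B (Xinhua 1998-2001)**a | done | | 98-01
User Generated Contents | MOAT | Task Data for system training purposes | IE/analysis | Ct | NTCIR-6 OAT Chinese (traditional) Annotation Data (CIRB020 1998-1999, CIRB040 2000-2001) | done | 843 | 98-01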
Focused Domains | Patent Translation | Document Data | - | JE | Patent Parallel Corpus | done | - | 93-02
Focused Domains | Patent Translation | Document Data | patent full | J | Publication of unexamined patent applications | done | 3,496,252 (94.5 GB) | 93-02
Focused Domains | Patent Translation | Document Data | patent abstract | E | Patent Abstracts of Japan (PAJ) | done | 3,496,252 (ca. 5 GB) | 93-02
Focused Domains | Patent Translation | Document Data | patent full | E | Patent grant data published from USPTO | done | 981,948 | 93-02
Focused Domains | Patent Mining | Document Data | patent full | J | Publication of unexamined patent applications | done | 3,496,252 (94.5 GB) | 93-02
Focused Domains | Patent Mining | Document Data | patent abstract | E | Patent Abstracts of Japan (PAJ) | done | 3,496,252 (ca. 5 GB) | 93-02
Focused Domains | Patent Mining | Document Data | patent full | E | Patent grant data published from USPTO | done | 981,948 | 93-02
Focused Domains | Patent Mining | Document Data | sci. abstract | JE | NTCIR-1 | done | 861,481 | 88-97
Focused Domains | Patent Mining | Document Data | sci. abstract | JE | NTCIR-2 | done | 535,226 | 86-99*C
Cluster Independent | MuST | Document Data | news articles | J | Mainichi*B | done | - | 98-01
Cluster Independent | MuST | Task Data | IE/analysis | J | MuST Dataset | - | - | -
*B: The data defined as 'Mainichi Newspaper Full-text Article Database CD-ROMs'
in the memorandum will be delivered by e-mail with instructions for downloading it,
not on CD/DVD-ROMs.
*C: gakkai subfiles: 1997-1999; kaken subfiles: 1986-1997.
1: For details of the task data (topics and relevance judgments, questions
and answers, summaries, etc.), please visit the web page of each task.
2: For data marked **, a specific procedure for obtaining the data applies.
**a: The data will be delivered by LDC to Workshop participants who submit
an additional user agreement form to LDC.
3: Please note that the document collections shall be used for the purpose
of accomplishing the tasks set out in the NTCIR Workshop and for research
related to those tasks. The documents cannot be used for "information
purposes".
Last Modified:2008.08.27