NTCIR (NII Test Collection for IR Systems) Project

The 8th NTCIR Workshop


NTCIR-8 Test Collections: Documents

The following document collections are used for the 8th NTCIR Workshop. They are available free of charge to participating research groups for task participation and system evaluation within the 8th NTCIR Workshop (*1). To obtain the data, signed user agreement forms must be submitted to the NTCIR Project Office at the NII.

*1: A nominal shipping cost (about US$50) may be required for data provided by the LDC.

Task | Test collection data | Genre/task | Language | File name | Distribution date | Number of documents (size) | Year
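ACLIA | Document Data | news articles | Ct | UDN2002-2005 | 01-07-09 | 1,663,517 | 02-05
ACLIA | Document Data | news articles | Cs | Xinhua Chinese **a | 01-07-09 | 308,845 | 02-05
ACLIA | Document Data | news articles | J | Mainichi | 01-07-09 | - | 02-05
ACLIA | Task Data | QA | Ct, Cs, J, E | NTCIR-8 ACLIA QA data | - | - | -
ACLIA | Task Data | IR | Ct, Cs, J, E | NTCIR-8 ACLIA IR data | - | - | -
ACLIA | Document Data for system training purposes | news articles | Ct | CIRB011 (China Times, Commercial Times, China Times Express, Central Daily News, China Daily News) | 01-07-09 | 132,173 | 98-99
ACLIA | Document Data for system training purposes | news articles | Ct | CIRB020 (United Daily News, Economic Daily News, Min Sheng Daily, United Evening News, Star News) | 01-07-09 | 249,508 | 98-99
ACLIA | Document Data for system training purposes | news articles | Ct | CIRB040r (United Daily News, United Express, Ming Hseng News, Economic Daily News) | 01-07-09 | 901,446 | 00-01
ACLIA | Document Data for system training purposes | news articles | Cs | Lianhe Zaobao | 01-07-09 | 249,287 | 98-01
ACLIA | Document Data for system training purposes | news articles | Cs | Xinhua Chinese **a | 01-07-09 | 295,875 |
ACLIA | Document Data for system training purposes | news articles | J | Mainichi | 01-07-09 | 419,759 | 98-01
ACLIA | Task Data for system training purposes | QA | C, J, E | NTCIR-5/6 CLQA data | 01-07-09 | - | -
ACLIA | Task Data for system training purposes | QA | J | NTCIR-3/4/5/6 QA data | 01-07-09 | |
ACLIA | Task Data for system training purposes | IR | Ct, K, J, E | NTCIR-3/4/5/6 CLIR data | 01-07-09 | |
ACLIA | Task Data for system training purposes | QA/IR | Ct, Cs, J, E | NTCIR-7 ACLIA CCLQA/IR4QA data | 01-07-09 | - | -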
GeoTime | Document Data | news articles | J | Mainichi | 01-07-09 | - | 02-05
GeoTime | Document Data | news articles | E | New York Times *B **a | 01-07-09 | 315,417 | 02-05
GeoTime | Task Data | IR | J, E | - | - | - | -
MOAT | Document Data | news articles | J | Mainichi | 01-07-09 | - | 02-05
MOAT | Document Data | news articles | E | New York Times *B **a | 01-07-09 | 315,417 | 02-05
MOAT | Document Data | news articles | Ct | UDN2002-2005 (United Daily News, United Express, Ming Hseng News, Economic Daily News, Star News) | 01-07-09 | 1,663,517 | 02-05
MOAT | Document Data | news articles | Cs | Xinhua Chinese **a | 01-07-09 | 308,845 | 02-05
MOAT | Task Data | IE/analysis | J | NTCIR-8 MOAT Japanese Annotation Data (Mainichi 2002-2005) | 01-10-09 | - | 02-05
MOAT | Task Data | IE/analysis | E | NTCIR-8 MOAT English Annotation Data (New York Times 2002-2005) **a | 01-10-09 | - | 02-05
MOAT | Task Data | IE/analysis | Ct | NTCIR-8 MOAT Chinese (traditional) Annotation Data (UDN2002-2005) | 01-10-09 | - | 02-05
MOAT | Task Data | IE/analysis | Cs | NTCIR-8 MOAT Chinese (simplified) Annotation Data (Xinhua Chinese 2002-2005) **a | 01-10-09 | - | 02-05
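MOAT | Task Data for system training purposes | IE/analysis | J | NTCIR-6 OAT Japanese Annotation Data: Part A (Mainichi 1998-2001) | 01-07-09 | 490 | 98-01
MOAT | Task Data for system training purposes | IE/analysis | J | NTCIR-7 MOAT Japanese Annotation Data (Mainichi 1998-2001) | 01-07-09 | 287 | 98-01
MOAT | Task Data for system training purposes | IE/analysis | E | NTCIR-6 OAT English Annotation Data: Part A (Mainichi Daily 1998-2001, Korea Times 2000-2001, Hong Kong Standard 1998-1999) | 01-07-09 | 439 | 98-01
MOAT | Task Data for system training purposes | IE/analysis | E | NTCIR-6 OAT English Annotation Data: Part B (Xinhua 1998-2001) **a | 01-07-09 | | 98-01
MOAT | Task Data for system training purposes | IE/analysis | E | NTCIR-7 MOAT English Annotation Data: Part A (Mainichi Daily 1998-2001, Korea Times 2000-2001, Hong Kong Standard 1998-1999, Straits Times 98-01) | 01-07-09 | 167 | 98-01
MOAT | Task Data for system training purposes | IE/analysis | E | NTCIR-7 MOAT English Annotation Data: Part B (Xinhua English 98-01) **a | 01-07-09 | | 98-01
MOAT | Task Data for system training purposes | IE/analysis | Ct | NTCIR-6 OAT Chinese (traditional) Annotation Data (CIRB020 1998-1999, CIRB040 2000-2001) | 01-07-09 | 843 | 98-01
MOAT | Task Data for system training purposes | IE/analysis | Ct | NTCIR-7 MOAT Chinese (traditional) Annotation Data (CIRB020 1998-1999, CIRB040 2000-2001) | 01-07-09 | 246 | 98-01
MOAT | Task Data for system training purposes | IE/analysis | Cs | NTCIR-7 MOAT Chinese (simplified) Annotation Data: Part A (Lianhe Zaobao 98-01) | 01-07-09 | 271 | 98-01
MOAT | Task Data for system training purposes | IE/analysis | Cs | NTCIR-7 MOAT Chinese (simplified) Annotation Data: Part B (Xinhua Chinese 98-01) **a | 01-07-09 | | 98-01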
Patent Mining | Document Data | patent full | J | Publication of unexamined patent applications | 01-07-09 | 3,496,252 (94.5GB) | 93-02
Patent Mining | Document Data | patent abstract | E | Patent Abstracts of Japan (PAJ) | 01-07-09 | 3,496,252 (ca. 5GB) | 93-02
Patent Mining | Document Data | patent full | E | Patent grant data published from USPTO | 01-07-09 | 981,948 | 93-02
Patent Mining | Document Data | sci. abstract | J, E | NTCIR-1 | 01-07-09 | 861,481 | 88-97
Patent Mining | Document Data | sci. abstract | J, E | NTCIR-2 | 01-07-09 | 535,226 | 86-99 *A
Patent Mining | Task Data | Mining | J, E | NTCIR-8 Patent Mining Task Data: Research Papers Classification subtask and Technical Trend Map subtask | 01-12-09 | - | -
Patent Translation | Document Data | patent full | J | Publication of unexamined patent applications | 01-07-09 | - | 93-07
Patent Translation | Document Data | patent full | E | Patent grant data published from USPTO | 01-07-09 | - | 93-07
Patent Translation | Task Data | MT | J, E | NTCIR-8 Patent Translation Task Data: Translation subtask, Cross-Lingual Information Retrieval subtask and Evaluation subtask | - | - | -


*A: gakkai subfiles: 1997-1999; kaken subfiles: 1986-1997.
*B: There are fewer documents from Feb. 2003 to May 2004, and there is no data from June 2004.

1: For details of the task data (topics and relevance judgments, questions and answers, summaries, etc.), please visit the web pages of each task.

2: For data marked with **, a specific procedure for obtaining the data applies.
**a: The data will be delivered by the LDC to Workshop participants who submit an additional user agreement form to the LDC.

3: Please note that the document collections shall be used for the purpose of accomplishing the tasks set out in the NTCIR Workshop and for research related to those tasks. The documents cannot be used for "information purposes".


Last Modified: 2009.11.24