NTCIR (NII Test Collection for IR Systems) Project

The 7th NTCIR Workshop

DATA



NTCIR-7 is over. For information on data see the NTCIR data page.

The following document collections are used for the 7th NTCIR Workshop. They are available free of charge to participating research groups for task participation and system evaluation within the 7th NTCIR Workshop. To obtain the data, signed user agreement forms must be submitted to the NTCIR Project Office at the NII.

cluster | task | test collection data | genre/task | language | file name | Distribution Data | number of documents (size) | year
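Advanced CLIA | CCLQA / IR for QA | Document Data | news articles | Ct | CIRB040r (United Daily News, United Express, Ming Hseng News, Economic Daily News) | done | 901,446 | 00-01
Advanced CLIA | CCLQA / IR for QA | Document Data | news articles | Cs | Lianhe Zaobao | done | - | 98-01
Advanced CLIA | CCLQA / IR for QA | Document Data | news articles | Cs | Xinhua Chinese**a | done | - |
Advanced CLIA | CCLQA / IR for QA | Document Data | news articles | J | Mainichi*B | done | 419,759 | 98-01
Advanced CLIA | CCLQA / IR for QA | Task Data | QA | CtCs JE | NTCIR-7 ACLIA QA data | - | - | -
Advanced CLIA | CCLQA / IR for QA | Task Data | IR | CtCs JE | NTCIR-7 ACLIA IR data | - | - | -
Advanced CLIA | CCLQA / IR for QA | Document Data for system training purposes | news articles | Ct | CIRB011 (China Times, Commercial Times, China Times Express, Central Daily News, China Daily News) | done | 132,173 | 98-99
Advanced CLIA | CCLQA / IR for QA | Document Data for system training purposes | news articles | Ct | CIRB020 (United Daily News, Economic Daily News, Min Sheng Daily, United Evening News, Star News) | done | 249,508 | 98-99
Advanced CLIA | CCLQA / IR for QA | Task Data for system training purposes | QA | CJE | NTCIR-5/6 CLQA data 5CLQA | done | - | -
Advanced CLIA | CCLQA / IR for QA | Task Data for system training purposes | QA | J | NTCIR-3/4/5/6 QA data 3/4/5QA | done | |
Advanced CLIA | CCLQA / IR for QA | Task Data for system training purposes | IR | CtK JE | NTCIR-3/4/5/6 CLIR data | done | |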
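User Generated Contents | MOAT | Document Data | news articles | J | Mainichi*B | done | 419,759 | 98-01
User Generated Contents | MOAT | Document Data | news articles | E | Mainichi Daily | | 24,878 | 98-01
User Generated Contents | MOAT | Document Data | news articles | E | Korea Times | | 50,129 | 98-01
User Generated Contents | MOAT | Document Data | news articles | E | Hong Kong Standard | | 96,683 | 98-99
User Generated Contents | MOAT | Document Data | news articles | E | Xinhua**a | | 406,791 | 98-01
User Generated Contents | MOAT | Document Data | news articles | E | Straits Times | | - | 98-01
User Generated Contents | MOAT | Document Data | news articles | Ct | CIRB011 (China Times, Commercial Times, China Times Express, Central Daily News, China Daily News) | done | 132,173 | 98-99
User Generated Contents | MOAT | Document Data | news articles | Ct | CIRB020 (United Daily News, Economic Daily News, Min Sheng Daily, United Evening News, Star News) | done | 249,508 | 98-99
User Generated Contents | MOAT | Document Data | news articles | Cs | Xinhua Chinese**a | done | - | 98-01
User Generated Contents | MOAT | Document Data | news articles | Cs | Lianhe Zaobao | done | - | 98-01
User Generated Contents | MOAT | MOAT: Task Data | IE/analysis | J | NTCIR-7 MOAT Japanese Annotation Data (Mainichi*B 1998-2001) | 9/1 | - | 98-01
User Generated Contents | MOAT | MOAT: Task Data | IE/analysis | E | NTCIR-7 MOAT English Annotation Data (Mainichi Daily 1998-2001, Korea Times 2000-2001, Hong Kong Standard 1998-1999, Xinhua 98-01, Straits Times 98-01) | 9/1 | - | 98-01
User Generated Contents | MOAT | MOAT: Task Data | IE/analysis | Ct | NTCIR-7 MOAT Chinese (traditional) Annotation Data (CIRB020 1998-1999, CIRB040 2000-2001) | 9/1 | - | 98-01
User Generated Contents | MOAT | MOAT: Task Data | IE/analysis | Cs | NTCIR-7 MOAT Chinese (simplified) Annotation Data (Xinhua Chinese 98-01, Lianhe Zaobao 98-01) | 9/1 | - | 98-01
User Generated Contents | MOAT | MOAT: Task Data for system training purposes | IE/analysis | J | NTCIR-6 OAT Japanese Annotation Data: Part A (Mainichi*B 1998-2001) | done | 490 | 98-01
User Generated Contents | MOAT | MOAT: Task Data for system training purposes | IE/analysis | E | NTCIR-6 OAT English Annotation Data: Part A (Mainichi Daily 1998-2001, Korea Times 2000-2001, Hong Kong Standard 1998-1999) | done | 439 | 98-01
User Generated Contents | MOAT | MOAT: Task Data for system training purposes | IE/analysis | E | NTCIR-6 OAT English Annotation Data: Part B (Xinhua 1998-2001)**a | done | | 98-01
User Generated Contents | MOAT | MOAT: Task Data for system training purposes | IE/analysis | Ct | NTCIR-6 OAT Chinese (traditional) Annotation Data (CIRB020 1998-1999, CIRB040 2000-2001) | done | 843 | 98-01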
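Focused Domains | Patent Translation | Document Data | - | JE | Patent Parallel Corpus | done | - | 93-02
Focused Domains | Patent Translation | Document Data | patent full | J | Publication of unexamined patent applications | done | 3,496,252 (94.5GB) | 93-02
Focused Domains | Patent Translation | Document Data | patent abstract | E | Patent Abstracts of Japan (PAJ) | done | 3,496,252 (ca. 5GB) | 93-02
Focused Domains | Patent Translation | Document Data | patent full | E | Patent grant data published from USPTO | done | 981,948 | 93-02
Focused Domains | Patent Mining | Document Data | patent full | J | Publication of unexamined patent applications | done | 3,496,252 (94.5GB) | 93-02
Focused Domains | Patent Mining | Document Data | patent abstract | E | Patent Abstracts of Japan (PAJ) | done | 3,496,252 (ca. 5GB) | 93-02
Focused Domains | Patent Mining | Document Data | patent full | E | Patent grant data published from USPTO | done | 981,948 | 93-02
Focused Domains | Patent Mining | Document Data | sci. abstract | JE | NTCIR-1 | done | 861,481 | 88-97
Focused Domains | Patent Mining | Document Data | sci. abstract | JE | NTCIR-2 | done | 535,226 | 86-99*C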
Cluster Independent | MuST | Document Data | news articles | J | Mainichi*B | done | - | 98-01
Cluster Independent | MuST | Task Data | IE/analysis | J | MuST Dataset | - | - | -


*B: The data defined as 'Mainichi Newspaper Full-text Article Database CD-ROMs' in the memorandum will be delivered via an e-mail describing how to download it, not by sending CD/DVD-ROMs.
*C: gakkai subfiles: 1997-1999, kaken subfiles: 1986-1997

1: For details of the task data (topics and relevance judgments, questions and answers, summaries, etc.), please visit the webpage of each task.

2: For data marked with **, a specific procedure for obtaining the data applies.
**a: The data will be delivered by LDC to Workshop participants who submit an additional user agreement form to LDC.

3: Please note that the document collections shall be used only for accomplishing the tasks set out in the NTCIR Workshop and for research related to those tasks. The documents cannot be used for "information purpose".


Last Modified: 2008.08.27