NTCIR Project
user agreement
Research Purpose Use of NTCIR Test Collections
The test collections listed below have been constructed and used
for NTCIR. They may be used for research purposes only.
The document collections included in the test collections were provided
to NII for use in NTCIR, either free of charge or for a fee. The providers of
the document data kindly understand the importance of test collections
in research on information access technologies and have therefore granted the
use of the data for research purposes. Please remember that the document
data in the NTCIR test collections is copyrighted and has commercial value
as data. To maintain a continued, trustworthy relationship
with the data producers/providers, we researchers must behave as
reliable partners: use the data only for research purposes under the
user agreement, and handle it carefully so as not to violate any rights in it.
To obtain the NTCIR Collection
The procedures for obtaining the test collections are as follows. The test
collections and data available from NII are free of charge.
- The application form for the test collection must be filled out and sent by e-mail to ntc-secretariat.
- Depending on the type of data set, either a user agreement (memorandum) or a formal application is required. Please refer to the list below for the required documents.
- User Agreement (memorandum on Permission to Use Test Collection)
- The user agreement form for each test collection that you would like to obtain must be filled out and sent by postal mail or courier to the address below.
- Please download the form and make two double-sided printed copies.
- Signatures are needed on both copies of the agreement form.
- After being countersigned by NII, one copy of the form will be sent to
you and one copy will be kept by NII.
Formal Application
- You can apply for several datasets with one application. One copy of the
formal application must be downloaded, filled out and sent by postal mail or courier to the address below.
- After review by NII, the permission to use the data will be sent
to the applicant.
Address
NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat
Mailing List
The release of new test collections and correction information will
be announced through the ntcir@nii.ac.jp mailing list.
(For the use of the NII Open Laboratory, please consult "NII Open Laboratory".
The Data Description and User Agreement Forms for the NTCIR Workshop participants are here.)
All Kinds of Data for Research Purpose Use
NTCIR-1
- NTCIR-1(IR and Term Extraction/Role Analysis Test Collections) [Test Collection list ]
- The IR Test Collection includes (1) Document data (author abstracts from
the Academic Conference Paper Database (1988-1997), i.e. author abstracts of
papers presented at academic conferences hosted by any of 65 academic
societies in Japan; about 330,000 documents, more than half of which are English-Japanese
pairs), (2) 83 Search topics (Japanese), and (3) Relevance Judgements.
The collection can be used for Japanese text retrieval experiments
and for CLIR experiments searching either English documents or Japanese-English documents
with Japanese topics. The Term Extraction Test Collection includes a tagged
corpus built from 2,000 Japanese documents selected from the above IR test
collection. The whole test collection is available for research purpose
use from NII.
- Application Form [ txt ]
- User agreement form [ PDF ]
- Readme for the CD-ROM [ txt ]
NTCIR-2
- NTCIR-2 (IR Test Collection) [ Test Collection list ]
- The collection includes (1) Document data (author abstracts from the Academic
Conference Paper Database (1997-1999) and Grant Reports (1988-1997); about
400,000 Japanese and 130,000 English documents), (2) 49 Search topics (Japanese
and English), and (3) Relevance Judgements. The whole test collection is
available for research purpose use from NII. For experiments, the document
data must be used together with that of NTCIR-1, because relevance judgments were made
on the merged database of NTCIR-1 and NTCIR-2. To merge the document collections,
the document IDs in NTCIR-1 must be converted using the script included
on the NTCIR-2 CD-ROM. At the Second NTCIR Workshop, segmented data were also prepared, in
which the whole document data were segmented into terms (short units as
well as longer units) using the standard segmentation software of the
year 2000. Those who are interested in the segmented data should contact
ntc-secretariat@nii.ac.jp.
- NTCIR-2 SUMM (Text Summarization Test Collection) [ Test Collection list ]
- The collection includes (1) Document data (Japanese newspaper articles from
Mainichi Newspaper (1994, 1995, 1998)), and (2) Model Summaries (for each
of 180 documents, 7 types of single-document summaries of different
lengths, produced by different strategies, were prepared by 3 analysts). The Summaries
are available from NII. The document data is available from Mainichi Newspaper Co.
- Topics and Relevance judgments
- Application Form [txt]
- Formal Application [PDF]
- Readme for the data set [ txt ]
- NTCIR-2 SUMM TAO (Text Summarization) [ Test Collection list ]
- Distribution of NTCIR-2 SUMM TAO (Text Summarization) is currently unavailable.
We will announce it through the ntcir@nii.ac.jp mailing list once it becomes available again.
NTCIR-3
- NTCIR-3 CLIR: IR/CLIR Test Collection [ Test Collection list ]
- The collection includes (1) Document data (Mainichi Newspaper 1998-1999
(Japanese), CIRB011+CIRB020 (Chinese news articles published in Taiwan in
1998-1999), Mainichi Daily 1998-1999 (English newspaper published in Japan),
EIRB010 (English news articles published in Taiwan in 1998-1999), and Korean
Economic Daily 1994 (Korean newspaper)), (2) 50 Search topics for the 1998-1999
collections and 30 topics for the 1994 collection (Chinese, Korean, Japanese
and English), and (3) Relevance Judgements. The Topics and Relevance Judgments
are available from NII. The document data is re-used in NTCIR-4 CLIR as well. The Japanese document data is available from Mainichi Newspaper Co. Mainichi Daily is available from NII. Other document data is available
to NTCIR Workshop participants only. Please note that the topics and relevance
judgments usable for retrieval experiments vary according to the document
data set to be retrieved. For details, please consult the README.
- Topics and Relevance Judgments
- Mainichi Daily 1998-1999
- application form
- user agreement
- NTCIR-3 PATENT (IR Test Collection) [ Test Collection list ]
- The collection includes (1) Document data (Japanese Patent Application
full text 1998-1999, JAPIO Japanese abstracts (1995-1999) and PAJ English
abstracts (1995-1999)), (2) 30 Search topics (Japanese, with translations into
Traditional Chinese, Simplified Chinese, Korean and English), and (3) Relevance
Judgements. The JAPIO abstracts and PAJ abstracts are exactly translated pairs.
Document sizes are 18GB for the full text and 4GB for the abstracts. NTCIR-4 PATENT
used Patent Application full text 1993-2002 and PAJ 1993-2002, but it includes
a small amount of inconsistent document data. Each topic for NTCIR-3 PATENT
includes a related newspaper article, so the collection is usable for
cross-genre experiments in which patents are retrieved by a newspaper
clipping, as well as for ordinary ad hoc retrieval of patents by topics. For CLIR
experiments, it is strongly recommended to use only the JAPIO abstracts and
PAJ abstracts of 1995-1997 to extract translation knowledge. The whole test
collection is available for research purpose use from NII.
- retrieval task test collection
- Application Form [txt]
- User agreement form [PDF]
- README for NTCIR-3 PATENT [PDF]
- QAC : NTCIR-3 QA: Question Answering Test Collection [ Test Collection list ]
- The collection includes (1) Document data (Mainichi Newspaper 1998-1999
(Japanese)), (2) about 1200 questions (Japanese and English translation),
and (3) Answers. The Questions and Answers are available from NII. The
document data is re-used in NTCIR-4 QAC as well. The document data is available from Mainichi Newspaper Co.
- Questions and Answers Data:
- Application Form [txt]
- Formal Application [PDF]
- The terms of use [PDF]
- Notices : [PDF]
- README for NTCIR-3 QA [txt]
- TSC : NTCIR-3 SUMM: (Text Summarization Test Collection) [Test Collection list ]
- The collection includes (1) Document data (Japanese newspaper articles from
Mainichi Newspaper (1998-1999)), and (2) Model Summaries. The summary data consists
of (2i) Single-document summaries (for each of 60 documents, 7 types of
single-document summaries of different lengths, produced by different strategies,
were prepared by 3 analysts) and (2ii) Multi-document summaries (for each of
50 document collections, summaries of 2 different lengths were prepared by
3 analysts; the topics of the document collections were given). The Summaries
are available from NII. The document data is available from Mainichi Newspaper Co.
- NTCIR-3 WEB (Web Retrieval Test Collection) [Test Collection list ]
- The collection includes (1) Document data (HTML and plain-text files mainly
crawled from the ".jp" domain; most of them are written in Japanese
or English, but some are in other languages; the size is 100GB), (2) 47
search topics (Japanese and English translation), and (3) Relevance judgements
(the "One-click distance model" or the "Page-unit document model"
is used in the relevance judgements; relevance judgments on a 10GB document
sub-collection are also available). The whole test collection is available
for research purpose use from the National Institute of Informatics (NII).
Separate applications are needed for the document data ("NW100G-01")
and for the topics and relevance judgments.
- (The former restriction, under which the users were permitted to access
and process the Document data only in the "Open Laboratory",
has been abolished.)
- Document data ("NW100G-01")
- Application Form [txt]
- User agreement form [PDF]
- Data contents : Refer to Section 3 of this paper.
- Topics and Relevance judgments
- Application Form [txt]
- Formal Application [PDF]
- The terms of use [PDF]
- README for NTCIR-3 WEB Topics and Relevance judgments data of main tasks
[txt] and of Speech-Driven Retrieval Sub-task [txt]
NTCIR-4 [ Test Collection list ]
Last updated : 2006-07-07
2004-07-14
Updated on: 2003-09-09
ntc-admin