NTCIR Project
Research Purpose Use of NTCIR Test Collections or Data Archive/ User Agreement
Listed below are the test collections that have been constructed and used for NTCIR. They may be used for research purposes only.
The document collections included in the test collections were provided
to NII for use in NTCIR, either free of charge or for a fee. The providers of
the document data kindly understand the importance of test collections
in research on information access technologies and have granted the
use of the data for research purposes. Please remember that the document
data in the NTCIR test collections is copyrighted and has commercial value
as data. To maintain a good and reliable relationship with the data
producers/providers, we researchers must behave as reliable partners,
use the data only for research purposes under the user agreement, and
take care not to violate any rights attached to the data.
"Research Activities Report" and "Publication Report" should be submitted by the users of NTCIR Test Collections.
"Research Activities Report"
The Research Activities Report form must be filled out and sent by E-mail to ntc-report
"Publication Report related to NTCIR" --> please refer to the page "To Publication Report related to NTCIR" and send the report by E-mail to ntc-bib
To obtain the NTCIR Collection
The procedure for obtaining the test collections is as follows. The test collections and data available from NII are free of charge.
- The application form of the test collection must be filled out and sent by E-mail to ntc-secretariat. -->instructions
- Thereafter we will send the required documents.
- Depending on the type of data set, either a user agreement (memorandum) or a formal application is required. Please refer to the list below for the required documents.
- User Agreement (memorandum on Permission to Use Test Collection)
- The user agreement form for each test collection that you would like to obtain must be filled out and sent by postal mail or courier to the address below.
- Please download the form and print two double-sided copies.
- Signatures are needed on both agreement forms.
- After NII has countersigned the forms, one copy will be returned to
you and the other will be kept by NII. -->instructions
Formal Application
- You can apply for several datasets with a single application. One copy of the
formal application must be downloaded, filled out and sent by postal mail or courier to the address below.
- After review by NII, permission to use the data will be sent to the applicant.
- Some of the task data and documents data are available from "NII/IDR":
http://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html
Terminate the Use
If you wish to terminate your use of the data, please notify the NTCIR Project
office by E-mail at ntc-secretariat. All the data, and any secondary data derived from them, must then be deleted. One copy of the proof of deletion form must be downloaded, filled out and sent by postal mail or courier to the address below. --> `Cancellation of the Licensing of Data and Deletion of Data'
Address
NTCIR project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
101-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat
Mailing List
New test collection releases and corrections will be announced
through the NTCIR mailing list: ntcir
To subscribe to the NTCIR mailing list, please refer to this page:
http://research.nii.ac.jp/ntcir/ml-en.html
(For information about the ongoing NTCIR, please refer to this page:
http://research.nii.ac.jp/ntcir/workshop/index.html)
All Data Available for Research Purpose Use
NTCIR-1| NTCIR-2| NTCIR-3| NTCIR-4| NTCIR-5| NTCIR-6| NTCIR-7| NTCIR-8| NTCIR-9|
NTCIR-1 [ Detailed Table of Test Collections]
NTCIR-1(IR and Term Extraction/Role Analysis Test Collections)
NTCIR-2 [ Detailed Table of Test Collections]
NTCIR-2 (IR Test Collection)
NTCIR-2 SUMM TAO (Text Summarization)
NTCIR-3 [ Detailed Table of Test Collections]
NTCIR-3 CLIR: IR/CLIR Test Collection
NTCIR-3 PATENT (IR Test Collection)
NTCIR-3 QA: Question Answering Test Collection
NTCIR-3 SUMM: (Text Summarization Test Collection)
NTCIR-3 WEB (Web Retrieval Test Collection)
NTCIR-4 [ Detailed Table of Test Collections]
NTCIR-4 CLIR: Cross-Lingual Information Retrieval test collection
NTCIR-4 PATENT Retrieval test collection
NTCIR-4 QAC: Question Answering test collection
NTCIR-4 WEB Retrieval Test Collection
NTCIR-5 [ Detailed Table of Test Collections]
NTCIR-5 CLIR: Cross-Lingual Information Retrieval test collection
NTCIR-5 CLQA: Cross-Lingual Question Answering test collection
NTCIR-5 PATENT Retrieval test collection
NTCIR-5 QAC: Question Answering test collection
NTCIR-5 WEB Retrieval Test collection
NTCIR-6 [ Detailed Table of Test Collections]
NTCIR-6 CLIR: Cross-Lingual Information Retrieval test collection
NTCIR-6 CLQA: Cross-Lingual Question Answering test collection
NTCIR-6 OPINION Analysis test collection
NTCIR-6 PATENT Retrieval test collection
NTCIR-6 QAC: Question Answering test collection
NTCIR-6 MuST: Multimodal Summarization for Trend Information test collection
NTCIR-7 [ Detailed Table of Test Collections]
NTCIR-7 ACLIA: Advanced Cross-Lingual Information Retrieval and Question
Answering Test Collection
NTCIR-7 MOAT: Multilingual Opinion Analysis Task test collection
NTCIR-7 PATMN: Patent Mining test collection
NTCIR-7 PATMT: Patent Translation test collection
NTCIR-7 MuST: Multimodal Summarization for Trend Information test collection
NTCIR-8 [ Detailed Table of Test Collections]
NTCIR-8 ACLIA: Advanced Cross-Lingual Information Retrieval and Question
Answering Test Collection
NTCIR-8 GeoTime: Geo-temporal Information Retrieval test collection
NTCIR-8 MOAT: Multilingual Opinion Analysis Task test collection
NTCIR-8 PATMN: Patent Mining test collection
NTCIR-8 PATMT test collection ( Translation Subtask/ Evaluation Subtask)
NTCIR-8 CQA: Community QA test collection
NTCIR-9 [ Detailed Table of Test Collections]
NTCIR-9 CrossLink: Cross-lingual Link Discovery test collection
NTCIR-9 GeoTime: Geotemporal Information Retrieval test collection
NTCIR-9 INTENT: Intent test collection
NTCIR-9 1CLICK: One Click Access test collection
NTCIR-9 PatentMT: Patent Machine Translation test collection
NTCIR-9 RITE: Recognizing Inference in TExt test collection
NTCIR-9 SpokenDoc: IR for Spoken Documents test collection
NTCIR-1
- NTCIR-1(IR and Term Extraction/Role Analysis Test Collections)
- [Detailed Table of Test Collections]
- The IR Test Collection includes (1) Document data (author abstracts from
the Academic Conference Paper Database (1988-1997), i.e. author abstracts of
papers presented at academic conferences hosted by any of 65 academic
societies in Japan; about 330,000 documents, more than half of which are
English-Japanese pairs), (2) 83 Search topics (Japanese), and (3) Relevance Judgements.
The collection can be used for experiments in Japanese text retrieval
and in CLIR, searching either English documents or Japanese-English documents
with Japanese topics. The Term Extraction Test Collection includes a tagged
corpus built from 2000 Japanese documents selected from the above IR test
collection. The whole test collection is available for research purpose
use from NII.
- Application Form [txt]
- User agreement form (sent by email)
- Readme for the CD-ROM [txt]
NTCIR-2
- NTCIR-2 (IR Test Collection)
- [Detailed Table of Test Collections]
- The collection includes (1) Document data (author abstracts from the Academic
Conference Paper Database (1997-1999) and Grant Reports (1988-1997); about
400,000 Japanese and 130,000 English documents), (2) 49 Search topics (Japanese
and English), and (3) Relevance Judgements. The whole test collection is
available for research purpose use from NII. For experiments, the document
data must be used together with that of NTCIR-1, because relevance judgments
were made on the merged database of NTCIR-1 and NTCIR-2. To merge the document
collections, the document IDs in NTCIR-1 must be converted using the script
included on the NTCIR-2 CD-ROM. At the Second NTCIR Workshop, segmented data
were also made available, in which the whole document data were segmented into
terms (short units as well as longer units) using the standard segmentation
software of the year 2000. Those who are interested in the segmented data
should contact ntc-secretariat@nii.ac.jp.
- Application Form [txt]
- User agreement form (sent by email)
- Readme for the CD-ROM [txt]
- NOTE: To display and print the English manual (in PDF form) of NTCIR-2 CD-ROM, you need to download and install Acrobat Reader 4.0 Asian Font Pack (Japanese) from
http://www.adobe.com/products/acrobat/cjkfontpack.html.
- NTCIR-2 SUMM (Text Summarization Test Collection)
- [Detailed Table of Test Collections]
- The collection includes (1) Document data (Japanese newspaper articles
from Mainichi Newspaper (1994, 1995, 1998)), and (2) Model Summaries (for each
of 180 documents, 7 types of single-document summaries of different
lengths and strategies were prepared by 3 analysts). The Summaries
are available from NII. The document data is available from Mainichi Newspaper
Co.
- Topics and Relevance judgments
- Application Form [txt]
- Formal Application [PDF]
- Readme for the data set [txt]
- NTCIR-2 SUMM TAO (Text Summarization)
NTCIR-3
- NTCIR-3 CLIR: IR/CLIR Test Collection
- [Detailed Table of Test Collections]
- The collection includes (1) Document data (Mainichi Newspaper 1998-1999
(Japanese), CIRB011+CIRB020 (Chinese news articles published in Taiwan in
1998-1999), Mainichi Daily 1998-1999 (English newspaper published in Japan),
EIRB010 (English news articles published in Taiwan in 1998-1999), and Korean
Economic Daily 1994 (Korean newspaper)), (2) 50 Search topics for the 1998-1999
collections and 30 topics for the 1994 collections (Chinese, Korean, Japanese
and English), and (3) Relevance Judgements. The Topics and Relevance Judgments,
Mainichi Daily (English) and CIRB020 (Chinese) are available from NII. The
document data is re-used in NTCIR-4 CLIR as well. The Japanese document data is available from Mainichi Newspaper Co. Other document data is available to NTCIR Workshop participants only. Please
note that the topics and relevance judgments usable for retrieval experiments
vary according to the document data set to be searched. For details, please
consult the README.
- Topics and Relevance Judgments are downloadable from NII/IDR.
- To obtain the Test Collection (Document Data and Topics/Relevance
Judgments):
- Application Form [txt]
- User agreement form (sent by email)
- The terms of use [PDF]
- README for NTCIR-3 CLIR [dry run] [formal run]
- NTCIR-3 PATENT (IR Test Collection)
- [Detailed Table of Test Collections]
- The collection includes (1) Document data (Japanese Patent Application
fulltext 1998-1999, JAPIO Japanese abstracts (1995-1999) and PAJ English
abstracts (1995-1999)), (2) 30 Search topics (Japanese, with translations into
Traditional Chinese, Simplified Chinese, Korean and English), and (3) Relevance
Judgements. The JAPIO abstracts and PAJ abstracts are exactly translated pairs.
Document sizes are 18GB for the fulltext and 4GB for the abstracts. NTCIR-4 PATENT
used Patent Application fulltext 1993-2002 and PAJ 1993-2002, but it includes
a small number of inconsistent document data. Each topic for NTCIR-3 PATENT
includes a related newspaper article, so the collection is usable for
Cross-Genre experiments, in which patents are retrieved using a newspaper
clipping, as well as for ordinary ad hoc retrieval of patents by topic. For CLIR
experiments, it is strongly recommended to use only the JAPIO and PAJ abstracts
of 1995-1997 to extract translation knowledge. The whole test
collection is available for research purpose use from NII.
- retrieval task test collection
- Application Form [txt]
- User agreement form (sent by email)
- README for NTCIR-3 PATENT [PDF]
- QAC : NTCIR-3 QA: Question Answering Test Collection
- [Detailed Table of Test Collections]
- The collection includes (1) Document data (Mainichi Newspaper 1998-1999
(Japanese)), (2) about 1200 questions (Japanese, with English translations),
and (3) Answers. The Questions and Answers are available from NII. The
document data is re-used in NTCIR-4 QAC as well. The document data is available from Mainichi Newspaper Co.
- Questions and Answers Data are downloadable from NII/IDR.
- The terms of use [PDF]
- Notices : [PDF]
- README for NTCIR-3 QA [txt]
- TSC : NTCIR-3 SUMM: (Text Summarization Test Collection)
- [Detailed Table of Test Collections]
- The collection includes (1) Document data (Japanese newspaper articles
from Mainichi Newspaper (1998-1999)), and (2) Model Summaries. The summary data
consists of (2i) Single-document summaries (for each of 60 documents, 7 types of
single-document summaries of different lengths and strategies
were prepared by 3 analysts) and (2ii) Multi-document summaries (for each of
50 document collections, summaries of 2 different lengths were prepared by
3 analysts; the topics of the document collections were given). The Summaries
are available from NII. The document data is available from Mainichi Newspaper
Co.
- Summaries
- NTCIR-3 WEB (Web Retrieval Test Collection)
- [Detailed Table of Test Collections ]
- The collection includes (1) Document data (HTML and plain-text files crawled
mainly from the ".jp" domain; most are written in Japanese
or English, but some are in other languages; the size is 100GB), (2) 47
search topics (Japanese, with English translations), and (3) Relevance judgements
(the "One-click distance model" or the "Page-unit document model"
is used in the relevance judgements). The whole test collection is available
for research purpose use from the National Institute of Informatics (NII).
Separate applications are required for the document data ("NW100G-01")
and for the topics and relevance judgments.
- (The former restriction, under which the users were permitted to access
and process the Document data only in the "Open Laboratory,"
has been abolished.)
- The NTCIR-3 Web Retrieval Test Collection is available from NII/IDR.
Document data ("NW100G-01")
Topics and Relevance judgments
- The terms of use [PDF]
- README for NTCIR-3 WEB Topics and Relevance judgments data of main tasks [txt] and of Speech-Driven Retrieval Sub-task [txt]
NTCIR-4 [Detailed Table of Test Collections]
NTCIR-5 [Detailed Table of Test Collections]
If you need more information, please visit our website.
NTCIR-5 CLIR test collection
NTCIR-5 CLQA test collection
NTCIR-5 PATENT test collection
NTCIR-5 QAC test collection
NTCIR-5 WEB Test collection
NTCIR-6 [Detailed Table of Test Collections]
If you need more information, please visit our website.
NTCIR-6 CLIR test collection
NTCIR-6 CLQA test collection
NTCIR-6 OPINION test collection
NTCIR-6 PATENT test collection
NTCIR-6 QAC test collection
NTCIR-6 MuST test collection
NTCIR-7 [Detailed Table of Test Collections]
If you need more information, please visit our website.
NTCIR-7 ACLIA test collection
NTCIR-7 MOAT test collection
NTCIR-7 PATMN test collection
NTCIR-7 PATMT test collection
NTCIR-7 MuST test collection
NTCIR-8 [Detailed Table of Test Collections]
If you need more information, please visit our website.
NTCIR-8 ACLIA test Collection
NTCIR-8 GeoTime test collection
NTCIR-8 MOAT test collection
NTCIR-8 PATMN test collection
NTCIR-8 PATMT test collection (Translation Subtask/Evaluation Subtask)
NTCIR-8 CQA test collection
NTCIR-9 [Detailed Table of Test Collections]
If you need more information, please visit our website.
NTCIR-9 CrossLink: Cross-lingual Link Discovery test collection
NTCIR-9 GeoTime: Geotemporal Information Retrieval test collection
NTCIR-9 INTENT: Intent test collection
NTCIR-9 1CLICK: One Click Access test collection
NTCIR-9 PatentMT: Patent Machine Translation test collection
NTCIR-9 RITE: Recognizing Inference in TExt test collection
NTCIR-9 SpokenDoc: IR for Spoken Documents test collection
Updated: 2015-12-07
ntc-admin