NTCIR (NII Test Collection for IR Systems) Project

NTCIR Project

Research Purpose Use of NTCIR Test Collections or Data Archive/ User Agreement

The test collections listed below were constructed and used for NTCIR. They may be used for research purposes only.
The document collections included in the test collections were provided to NII for use in NTCIR, either free of charge or for a fee. The providers of the document data understand the importance of test collections to research on information access technologies and have therefore granted use of the data for research purposes. Please remember that the document data in the NTCIR test collections is copyrighted and has commercial value as data. To maintain a reliable and good relationship with the data producers and providers, we researchers must behave as reliable partners: use the data only for research purposes under the user agreement, and take care not to violate any rights in the data.

Users of NTCIR test collections should submit a "Research Activities Report" and a "Publication Report".

"Research Activities Report"
The Research Activities Report form must be filled out and sent by e-mail to ntc-report.

"Publication Report related to NTCIR"
Please refer to the page "To Publication Report related to NTCIR" and send the report by e-mail to ntc-bib.

To obtain the NTCIR Collection

The procedure for obtaining the test collections is as follows. The test collections and data available from NII are free of charge.
  • The application form for the test collection must be filled out and sent by e-mail to ntc-secretariat (see the instructions).
  • We will then send the required documents.
  • Depending on the type of data set, either a user agreement (memorandum) or a formal application is required. Please refer to the list below for the required documents.
    User Agreement (memorandum on Permission to Use Test Collection)
    The user agreement form for each test collection that you would like to obtain must be filled out and sent by postal mail or courier to the address below.
    Please download the form and make two copies, printed double-sided.
    Signatures are needed on both copies.
    After NII has countersigned them, one copy will be sent to you and one will be kept by NII (see the instructions).
    Formal Application
    You can apply for several datasets with one application. One copy of the formal application must be downloaded, filled out and sent by postal mail or courier to the address below.
    After review at NII, the permission to use the data will be sent to the applicant.

  • Some of the task data and documents data are available from "NII/IDR":
    http://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html

Terminating Use of the Data
If you terminate your use of the data, please notify the NTCIR Project office by e-mail at ntc-secretariat. All the data, and any secondary data derived from it, must then be deleted. One copy of the proof-of-deletion form, "Cancellation of the Licensing of Data and Deletion of Data", must be downloaded, filled out and sent by postal mail or courier to the address below.

Address
    NTCIR project (Rm.1309)
    National Institute of Informatics
    2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
    101-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat


Mailing List
The release of new test collections and correction information will be announced through the NTCIR mailing list: ntcir
To subscribe to the NTCIR mailing list, please refer to this page:
http://research.nii.ac.jp/ntcir/ml-en.html

(For the ongoing NTCIR, please refer to this page:
http://research.nii.ac.jp/ntcir/workshop/index.html)

All Data Available for Research Purpose Use


o NTCIR-1 [Detailed Table of Test Collections]

NTCIR-1 (IR and Term Extraction/Role Analysis Test Collections)

o NTCIR-2 [Detailed Table of Test Collections]

NTCIR-2 (IR Test Collection)
NTCIR-2 SUMM TAO (Text Summarization)

o NTCIR-3 [Detailed Table of Test Collections]

NTCIR-3 CLIR: IR/CLIR Test Collection
NTCIR-3 PATENT (IR Test Collection)
NTCIR-3 QA: Question Answering Test Collection
NTCIR-3 SUMM: (Text Summarization Test Collection)
NTCIR-3 WEB (Web Retrieval Test Collection)

o NTCIR-4 [Detailed Table of Test Collections]

NTCIR-4 CLIR: Cross-Lingual Information Retrieval test collection
NTCIR-4 PATENT Retrieval test collection
NTCIR-4 QAC: Question Answering test collection
NTCIR-4 WEB Retrieval Test Collection

o NTCIR-5 [Detailed Table of Test Collections]

NTCIR-5 CLIR: Cross-Lingual Information Retrieval test collection
NTCIR-5 CLQA: Cross-Lingual Question Answering test collection
NTCIR-5 PATENT Retrieval test collection
NTCIR-5 QAC: Question Answering test collection
NTCIR-5 WEB Retrieval Test collection

o NTCIR-6 [Detailed Table of Test Collections]

NTCIR-6 CLIR: Cross-Lingual Information Retrieval test collection
NTCIR-6 CLQA: Cross-Lingual Question Answering test collection
NTCIR-6 OPINION Analysis test collection
NTCIR-6 PATENT Retrieval test collection
NTCIR-6 QAC: Question Answering test collection
NTCIR-6 MuST: Multimodal Summarization for Trend Information test collection

o NTCIR-7 [Detailed Table of Test Collections]

NTCIR-7 ACLIA: Advanced Cross-Lingual Information Retrieval and Question Answering Test Collection
NTCIR-7 MOAT: Multilingual Opinion Analysis Task test collection
NTCIR-7 PATMN: Patent Mining test collection
NTCIR-7 PATMT: Patent Translation test collection
NTCIR-7 MuST: Multimodal Summarization for Trend Information test collection

o NTCIR-8 [Detailed Table of Test Collections]

NTCIR-8 ACLIA: Advanced Cross-Lingual Information Retrieval and Question Answering Test Collection
NTCIR-8 GeoTime: Geo-temporal Information Retrieval test collection
NTCIR-8 MOAT: Multilingual Opinion Analysis Task test collection
NTCIR-8 PATMN: Patent Mining test collection
NTCIR-8 PATMT test collection (Translation Subtask/Evaluation Subtask)
NTCIR-8 CQA: Community QA test collection

o NTCIR-9 [Detailed Table of Test Collections]

NTCIR-9 CrossLink: Cross-lingual Link Discovery test collection
NTCIR-9 GeoTime: Geotemporal Information Retrieval test collection
NTCIR-9 INTENT: Intent test collection
NTCIR-9 1CLICK: One Click Access test collection
NTCIR-9 PatentMT: Patent Machine Translation test collection
NTCIR-9 RITE: Recognizing Inference in TExt test collection
NTCIR-9 SpokenDoc: IR for Spoken Documents test collection


o NTCIR-1
  • o NTCIR-1 (IR and Term Extraction/Role Analysis Test Collections)
    • [Detailed Table of Test Collections]
      • The IR test collection includes (1) document data (author abstracts from the Academic Conference Paper Database (1988-1997), that is, author abstracts of papers presented at academic conferences hosted by any of 65 academic societies in Japan; about 330,000 documents, more than half of which are English-Japanese pairs), (2) 83 search topics (Japanese), and (3) relevance judgements. The collection can be used for Japanese text retrieval experiments and for CLIR experiments that search either English documents or Japanese-English documents with Japanese topics. The Term Extraction test collection includes a tagged corpus built from 2000 Japanese documents selected from the above IR test collection. The whole test collection is available for research purpose use from NII.
        • Application Form [txt]
        • User agreement form (sent by email)
        • Readme for the CD-ROM [txt]
o NTCIR-2
  • o NTCIR-2 (IR Test Collection)
    • [Detailed Table of Test Collections]
      • The collection includes (1) document data (author abstracts from the Academic Conference Paper Database (1997-1999) and Grant Reports (1988-1997); about 400,000 Japanese and 130,000 English documents), (2) 49 search topics (Japanese and English), and (3) relevance judgements. The whole test collection is available for research purpose use from NII. For experiments, the document data must be used together with that of NTCIR-1, because the relevance judgments were made on the merged database of NTCIR-1 and NTCIR-2. To merge the document collections, the document IDs in NTCIR-1 must be converted using the script included on the NTCIR-2 CD-ROM. Segmented data also exists from the Second NTCIR Workshop, in which the whole document data was segmented into terms (short units as well as longer units) using the standard segmentation software as of 2000. If you are interested in the segmented data, please contact ntc-secretariat@nii.ac.jp.
        • Application Form [txt]
        • User agreement form (sent by email)
        • Readme for the CD-ROM [txt]
        • NOTE: To display and print the English manual (in PDF form) of NTCIR-2 CD-ROM, you need to download and install Acrobat Reader 4.0 Asian Font Pack (Japanese) from
          http://www.adobe.com/products/acrobat/cjkfontpack.html.

  • o NTCIR-2 SUMM (Text Summarization Test Collection)
    • [Detailed Table of Test Collections]
      • The collection includes (1) document data (Japanese newspaper articles from the Mainichi Newspaper, 1994, 1995 and 1998) and (2) model summaries (for each of 180 documents, 7 types of single-document summary of different lengths, produced by different strategies, were prepared by 3 analysts). The summaries are available from NII. The document data is available from Mainichi Newspaper Co.
      • Topics and Relevance judgments
        • Application Form [txt]
        • Formal Application [PDF]
        • Readme for the data set [txt]

  • o NTCIR-2 SUMM TAO (Text Summarization)
o NTCIR-3
  • o NTCIR-3 CLIR: IR/CLIR Test Collection
    • [Detailed Table of Test Collections]
      • The collection includes (1) document data (Mainichi Newspaper 1998-1999 (Japanese); CIRB011+CIRB020 (Chinese news articles published in Taiwan in 1998-1999); Mainichi Daily 1998-1999 (English newspaper published in Japan); EIRB010 (English news articles published in Taiwan in 1998-1999); and Korean Economic Daily 1994 (Korean newspaper)), (2) 50 search topics for the 1998-1999 collections and 30 topics for the 1994 collection (Chinese, Korean, Japanese and English), and (3) relevance judgements. The topics and relevance judgments, Mainichi Daily (English) and CIRB020 (Chinese) are available from NII. The document data is re-used in NTCIR-4 CLIR as well. The Japanese document data is available from Mainichi Newspaper Co. Other document data is available to NTCIR Workshop participants only. Please note that the topics and relevance judgments usable for retrieval experiments vary according to the document data set to be retrieved. For details, please consult the README.

      • Topics and Relevance Judgments are downloadable from NII/IDR.
      • To obtain the test collection (document data and topics/relevance judgments):
        • Application Form [txt]
        • User agreement form (sent by email)
        • The terms of use [PDF]
        • README for NTCIR-3 CLIR [dry run] [formal run]

  • o NTCIR-3 PATENT (IR Test Collection)
    • [Detailed Table of Test Collections]
      • The collection includes (1) document data (Japanese patent application full text 1998-1999, JAPIO Japanese abstracts (1995-1999) and PAJ English abstracts (1995-1999)), (2) 30 search topics (Japanese, with translations into Traditional Chinese, Simplified Chinese, Korean and English), and (3) relevance judgements. The JAPIO abstracts and PAJ abstracts are exactly translated pairs. Document sizes are 18GB for the full text and 4GB for the abstracts. NTCIR-4 PATENT used patent application full text 1993-2002 and PAJ 1993-2002, but it includes a small number of inconsistent document data. Each topic for NTCIR-3 PATENT includes a related newspaper article, so the collection is usable for cross-genre experiments, in which patents are retrieved by a newspaper clip, as well as for ordinary ad hoc retrieval of patents by topic. For CLIR experiments, it is strongly recommended to use only the JAPIO abstracts and PAJ abstracts of 1995-1997 to extract translation knowledge. The whole test collection is available for research purpose use from NII.
      • retrieval task test collection
        • Application Form [txt]
        • User agreement form (sent by email)
        • README for NTCIR-3 PATENT [PDF]
  • o QAC: NTCIR-3 QA: Question Answering Test Collection
    • [Detailed Table of Test Collections]
      • The collection includes (1) document data (Mainichi Newspaper 1998-1999 (Japanese)), (2) about 1200 questions (Japanese with English translation), and (3) answers. The questions and answers are available from NII. The document data is re-used in NTCIR-4 QAC as well. The document data is available from Mainichi Newspaper Co.
      • Questions and Answers Data are downloadable from NII/IDR.
        • The terms of use [PDF]
        • Notices : [PDF]
        • README for NTCIR-3 QA [txt]
  • o TSC: NTCIR-3 SUMM: (Text Summarization Test Collection)
    • [Detailed Table of Test Collections]
      • The collection includes (1) document data (Japanese newspaper articles from the Mainichi Newspaper, 1998-1999) and (2) model summaries. The summary data consists of (2i) single-document summaries (for each of 60 documents, 7 types of single-document summary of different lengths, produced by different strategies, were prepared by 3 analysts) and (2ii) multi-document summaries (for each of 50 document collections, summaries of 2 different lengths were prepared by 3 analysts; the topics of the document collections were given). The summaries are available from NII. The document data is available from Mainichi Newspaper Co.
      • Summaries
  • o NTCIR-3 WEB (Web Retrieval Test Collection)
    • [Detailed Table of Test Collections]
      • The collection includes (1) document data (HTML and plain-text files crawled mainly from the ".jp" domain; most are written in Japanese or English, but some are in other languages; the size is 100GB), (2) 47 search topics (Japanese with English translation), and (3) relevance judgements (the "One-click distance model" or the "Page-unit document model" is used in the relevance judgements). The whole test collection is available for research purpose use from the National Institute of Informatics (NII). Separate applications are needed for the document data ("NW100G-01") and for the topics and relevance judgments.
      • (The former restriction, under which the users were permitted to access and process the Document data only in the "Open Laboratory," has been abolished.)

      • The NTCIR-3 Web Retrieval Test Collection is available from NII/IDR.

      • Document data ("NW100G-01"), topics and relevance judgments
        • The terms of use [PDF]
        • README for NTCIR-3 WEB Topics and Relevance judgments data of main tasks [txt] and of Speech-Driven Retrieval Sub-task [txt]

o NTCIR-4 [Detailed Table of Test Collections]

o NTCIR-5 [Detailed Table of Test Collections]

    If you need more information, please visit our website.
    NTCIR-5 CLIR test collection
    NTCIR-5 CLQA test collection
    NTCIR-5 PATENT test collection
    NTCIR-5 QAC test collection
    NTCIR-5 WEB Test collection


o NTCIR-6 [Detailed Table of Test Collections]

    If you need more information, please visit our website.
    NTCIR-6 CLIR test collection
    NTCIR-6 CLQA test collection
    NTCIR-6 OPINION test collection
    NTCIR-6 PATENT test collection
    NTCIR-6 QAC test collection
    NTCIR-6 MuST test collection


o NTCIR-7 [Detailed Table of Test Collections]

    If you need more information, please visit our website.
    NTCIR-7 ACLIA test collection
    NTCIR-7 MOAT test collection
    NTCIR-7 PATMN test collection
    NTCIR-7 PATMT test collection
    NTCIR-7 MuST test collection


o NTCIR-8 [Detailed Table of Test Collections]

    If you need more information, please visit our website.
    NTCIR-8 ACLIA test collection
    NTCIR-8 GeoTime test collection
    NTCIR-8 MOAT test collection
    NTCIR-8 PATMN test collection
    NTCIR-8 PATMT test collection (Translation Subtask/Evaluation Subtask)
    NTCIR-8 CQA test collection


o NTCIR-9 [Detailed Table of Test Collections]

    If you need more information, please visit our website.
    NTCIR-9 CrossLink: Cross-lingual Link Discovery test collection
    NTCIR-9 GeoTime: Geotemporal Information Retrieval test collection
    NTCIR-9 INTENT: Intent test collection
    NTCIR-9 1CLICK: One Click Access test collection
    NTCIR-9 PatentMT: Patent Machine Translation test collection
    NTCIR-9 RITE: Recognizing Inference in TExt test collection
    NTCIR-9 SpokenDoc: IR for Spoken Documents test collection



Updated: 2015-12-07
ntc-admin