NTCIR Project
Research Purpose Use of Test Collection




The NTCIR-10 INTENT (INTENT-2) Test Collection comprises:

(a) NTCIR-10 INTENT-2 Chinese Subtopic Mining Test Collection
(b) NTCIR-10 INTENT-2 Japanese Subtopic Mining Test Collection
(c) NTCIR-10 INTENT-2 English Subtopic Mining Test Collection
(d) NTCIR-10 INTENT-2 Chinese Document Ranking Test Collection
(e) NTCIR-10 INTENT-2 Japanese Document Ranking Test Collection

For evaluating Document Ranking, the document collections (SogouT for Chinese and ClueWeb09-JA for Japanese) must be obtained separately; they are not distributed by NII.
For Subtopic Mining, the document collections are not required.
SogouT (Version: 2012) --This document collection is available from the Tsinghua-Sohu Joint Laboratory on Search Technology. The collection contains about 130 million Chinese Web pages together with the corresponding link graph. Its size is roughly 5 TB uncompressed. The data was crawled and released in 2012.

Further information regarding this collection can be found on the page: http://www.sogou.com/labs/dl/t-e.html
You can also contact chenjing directly to obtain the data set.
ClueWeb09-JA --This document collection is available from the Language Technologies Institute at Carnegie Mellon University. The ClueWeb09-JA collection consists of all 67 million Japanese pages in the ClueWeb09 collection. We thank Prof. Jamie Callan and his team for providing the ClueWeb09-JA collection, which dramatically reduces costs for participants. The data was crawled during January and February 2009.

Further information regarding the collection can be found on the page: http://boston.lti.cs.cmu.edu/Data/clueweb09/

INTENT2ResearchPurposeData.gz contains the topics, intents, relevance assessments, and the official query suggestion data for (a)-(e).

For computing evaluation metrics such as intent recall and D-measures, the NTCIREVAL toolkit can be used.
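As a rough illustration of what these metrics measure, the following Python sketch computes intent recall (I-rec), D-nDCG, and their combination D#-nDCG = gamma * I-rec + (1 - gamma) * D-nDCG for a single topic. It is not NTCIREVAL and is not guaranteed to reproduce its output. The input format (a dict of intent probabilities P(i|q) and a dict of per-intent gain values for each judged document), the cutoff of 10, and gamma = 0.5 are assumptions made for illustration only.

```python
import math

def global_gain(doc, intent_probs, gains):
    # Global gain GG(d) = sum over intents i of P(i|q) * g_i(d); unjudged docs get 0.
    return sum(p * gains.get(doc, {}).get(i, 0.0) for i, p in intent_probs.items())

def intent_recall(ranking, intent_probs, gains, cutoff=10):
    # Fraction of intents covered by at least one relevant doc in the top `cutoff`.
    covered = set()
    for doc in ranking[:cutoff]:
        for i, g in gains.get(doc, {}).items():
            if g > 0 and i in intent_probs:
                covered.add(i)
    return len(covered) / len(intent_probs)

def _dcg(gain_list):
    # Discounted cumulative gain; the log base cancels out in the nDCG ratio.
    return sum(g / math.log2(r + 1) for r, g in enumerate(gain_list, start=1))

def d_ndcg(ranking, intent_probs, gains, cutoff=10):
    # nDCG computed over global gains; the ideal list sorts all judged docs by GG.
    run = [global_gain(d, intent_probs, gains) for d in ranking[:cutoff]]
    ideal = sorted((global_gain(d, intent_probs, gains) for d in gains),
                   reverse=True)[:cutoff]
    ideal_dcg = _dcg(ideal)
    return _dcg(run) / ideal_dcg if ideal_dcg > 0 else 0.0

def d_sharp_ndcg(ranking, intent_probs, gains, cutoff=10, gamma=0.5):
    # D#-nDCG: linear combination of intent recall and D-nDCG (gamma is assumed).
    return (gamma * intent_recall(ranking, intent_probs, gains, cutoff)
            + (1 - gamma) * d_ndcg(ranking, intent_probs, gains, cutoff))

# Hypothetical toy topic with two intents and three judged documents.
probs = {"i1": 0.6, "i2": 0.4}
judged = {"d1": {"i1": 1}, "d2": {"i2": 1}, "d3": {"i1": 1, "i2": 1}}
print(d_sharp_ndcg(["d3", "d1", "d2"], probs, judged))
```

In this toy example the run already lists documents in descending order of global gain and covers both intents, so both components equal 1. For official results, use NTCIREVAL with the relevance assessments in INTENT2ResearchPurposeData.gz.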

For more details, please refer to the README file and
the NTCIR-10 INTENT-2 overview paper available at the
NTCIR-10 online proceedings:


The test collection and data are available from NII free of charge.


Contact us: ntc-secretariat


The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project, either free of charge or for a fee. The providers of the document data understand the importance of such test collections in research on information access technologies and have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good relationship with the data producers and providers, we researchers must be reliable partners: we must use the data only for research purposes under the user agreement, and we must handle it carefully so as not to violate copyright.