The NTCIR-10 INTENT (INTENT-2) Test Collections comprise:
(a) NTCIR-10 INTENT-2 Chinese Subtopic Mining Test Collection
(b) NTCIR-10 INTENT-2 Japanese Subtopic Mining Test Collection
(c) NTCIR-10 INTENT-2 English Subtopic Mining Test Collection
(d) NTCIR-10 INTENT-2 Chinese Document Ranking Test Collection
(e) NTCIR-10 INTENT-2 Japanese Document Ranking Test Collection.
For evaluating Document Ranking, document collections need to be obtained separately:
This document collection is available from the Tsinghua-Sohu Joint Laboratory on Search Technology. The collection contains about 130 million Chinese Web pages together with the corresponding link graph. The size is roughly 5TB uncompressed. The data was crawled and released in 2012.
Further information regarding this collection can be found on the page: http://www.sogou.com/labs/dl/t-e.html.
You can also contact chenjing directly to obtain the data set.
ClueWeb09-JA: This document collection is available from the Language Technologies Institute at Carnegie Mellon University. The ClueWeb09-JA collection is composed of all the 67M Japanese pages in the ClueWeb09 collection. We thank Prof. Jamie Callan and his team for providing the ClueWeb09-JA collection, which dramatically reduces the cost for participants. The data was crawled during January and February 2009.
Further information regarding the collection can be found on the page: http://boston.lti.cs.cmu.edu/Data/clueweb09/
The test collections themselves contain the topics, intents, relevance assessments, and the official query suggestion data for (a)-(e).
For computing evaluation metrics such as intent recall and D-measures,
the NTCIREVAL toolkit can be used.
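To make the metrics named above concrete, the following is a minimal Python sketch of intent recall (I-rec), D-nDCG, and their combination D#-nDCG, following the commonly cited definitions. It is not NTCIREVAL's code and does not read the test collection's file formats. The input structures (`intent_probs`, `rels`), the function names, and the example gain values are assumptions made for illustration only.

```python
import math

def intent_recall(ranked, rels, intent_probs, cutoff):
    """Fraction of the topic's intents covered by the top `cutoff` documents."""
    covered = set()
    for doc in ranked[:cutoff]:
        for intent, gain in rels.get(doc, {}).items():
            if gain > 0:
                covered.add(intent)
    return len(covered) / len(intent_probs)

def global_gain(doc, rels, intent_probs):
    """Per-intent gains of one document, weighted by intent probability."""
    return sum(intent_probs[i] * g for i, g in rels.get(doc, {}).items())

def d_ndcg(ranked, rels, intent_probs, cutoff):
    """nDCG computed over global gains, with a log2(rank + 1) discount."""
    def dcg(gains):
        return sum(g / math.log2(r + 1)
                   for r, g in enumerate(gains[:cutoff], start=1))
    system = [global_gain(d, rels, intent_probs) for d in ranked]
    # The ideal list sorts all judged documents by global gain.
    ideal = sorted((global_gain(d, rels, intent_probs) for d in rels),
                   reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(system) / ideal_dcg if ideal_dcg > 0 else 0.0

def d_sharp_ndcg(ranked, rels, intent_probs, cutoff, gamma=0.5):
    """Linear combination of I-rec and D-nDCG (gamma = 0.5 assumed here)."""
    return (gamma * intent_recall(ranked, rels, intent_probs, cutoff)
            + (1 - gamma) * d_ndcg(ranked, rels, intent_probs, cutoff))

# Hypothetical topic with two intents and three judged documents.
intent_probs = {"i1": 0.6, "i2": 0.4}
rels = {"d1": {"i1": 2}, "d2": {"i2": 1}, "d3": {"i1": 1, "i2": 1}}
run = ["d3", "dX", "d1"]  # "dX" is unjudged and treated as non-relevant

print(intent_recall(run, rels, intent_probs, cutoff=2))
print(d_ndcg(run, rels, intent_probs, cutoff=2))
print(d_sharp_ndcg(run, rels, intent_probs, cutoff=2))
```

For official scores, the NTCIREVAL toolkit and the released relevance assessments should be used; the sketch only illustrates how the quantities relate to one another.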
For more details, please refer to the README file and
the NTCIR-10 INTENT-2 overview paper available at the
NTCIR-10 online proceedings:
The test collection and data are available from NII free of charge.
- NTCIR-10 INTENT-2 Task data are downloadable from NII/IDR at:
- Overview of the NTCIR-10 INTENT Task [PDF]
- NTCIR-10 INTENT (INTENT-2) website
Contact us: ntc-secretariat
The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project free of charge or for a fee. The providers of the document data understand the importance of such test collections in research on information access technologies and have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good relationship with the data producers/providers, we researchers must be reliable partners and use the data only for research purposes under the user agreement, and we must use the data carefully so as not to violate copyright.