The NTCIR-11 INTENT (IMine) Test Collections comprise:
(a) NTCIR-11 IMine Chinese Subtopic Mining Test Collection
(b) NTCIR-11 IMine Japanese Subtopic Mining Test Collection
(c) NTCIR-11 IMine English Subtopic Mining Test Collection
(d) NTCIR-11 IMine Chinese Document Ranking Test Collection
(e) NTCIR-11 IMine English Document Ranking Test Collection
(f) NTCIR-11 IMine Japanese Search Task Mining Test Collection
For evaluating Document Ranking, document collections need to be obtained separately:
This document collection is available from the Tsinghua-Sohu Joint Laboratory on Search Technology. The collection contains about 130 million Chinese Web pages together with the corresponding link graph. The size is roughly 5TB uncompressed. The data was crawled and released in 2012.
Further information regarding this collection can be found on the page: http://www.sogou.com/labs/dl/t-e.html.
You can also directly contact chenjing to obtain the data set.
ClueWeb12-B13: This document collection is available from the Language Technologies Institute at Carnegie Mellon University. The ClueWeb12-B13 collection is composed of about 52M English pages from the ClueWeb12 collection. We thank Prof. Jamie Callan and his team for providing the ClueWeb12-B13 collection, which dramatically reduces the cost for participants. The data was crawled between February and May 2012.
Further information regarding the collections can be found on the page: http://lemurproject.org/clueweb12/
The Subtopic Mining Test Collection comprises the following data:
(1) 50 topics (queries)
(2) Second-level hierarchical intents for each topic, obtained by manually clustering the subtopic strings submitted by the Subtopic Mining participants
(3) An intent probability for each intent, estimated through assessor voting
(4) Pooled subtopics that correspond to each intent
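The intent probabilities in item (3) can be illustrated with a minimal sketch. It assumes that each assessor vote names one intent and that a probability is simply an intent's share of all votes; the exact estimation procedure is not described on this page, and the function name, intent labels, and vote values below are hypothetical.

```python
from collections import Counter


def intent_probabilities(votes):
    """Normalize per-intent vote counts into probabilities that sum to 1.

    Assumption: each element of `votes` is the intent one assessor voted for.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes given")
    return {intent: n / total for intent, n in counts.items()}


# Hypothetical votes for one topic (not taken from the test collection).
example_votes = ["intent_a", "intent_a", "intent_b", "intent_a"]
example_probs = intent_probabilities(example_votes)
```

With these made-up votes, `intent_a` receives three of the four votes and `intent_b` receives one, so the two probabilities sum to 1.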
The Document Ranking Test Collection comprises the following data:
(1) 50 topics (the same as the subtopic mining subtask)
(2) Pooled and judged documents with graded relevance, from L0 (judged nonrelevant) to L4 (highly relevant).
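As a rough illustration of how graded labels such as L0 to L4 can be used, the sketch below maps each label to an integer gain and computes a generic nDCG for one ranked list. This is not necessarily the measure used in the official IMine evaluation; the gain mapping (gain equals the digit after `L`), the log2 discount, and all function names are assumptions made for the example.

```python
import math


def grade(label):
    """Convert a relevance label such as 'L3' to the integer grade 3."""
    if len(label) != 2 or label[0] != "L" or label[1] not in "01234":
        raise ValueError(f"unexpected relevance label: {label!r}")
    return int(label[1])


def dcg(labels):
    """Discounted cumulative gain: gain = grade, discount = log2(rank + 1)."""
    return sum(
        grade(label) / math.log2(rank + 1)
        for rank, label in enumerate(labels, start=1)
    )


def ndcg(labels):
    """DCG divided by the DCG of the same labels sorted best-first.

    Simplification: the ideal list is built from the ranked labels only,
    not from every judged document for the topic.
    """
    ideal = dcg(sorted(labels, key=grade, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0
```

For example, `ndcg(["L4", "L2", "L0"])` is 1.0 because the list is already in best-first order, while a list that ranks an L0 document above an L4 document scores below 1.0.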
The Search Task Mining Test Collection comprises the following data:
(1) 50 topics (queries)
(2) Gold standard task strings with their importance for each topic
(3) Pooled participant task strings, together with information on how they match the gold standard task strings
The test collection and data are available from NII free of charge.
- NTCIR-11 IMine Task data are downloadable from NII/IDR at:
Contact us : ntc-secretariat
The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project free of charge or for a fee. The providers of the document data understand the importance of such test collections in research on information access technologies and have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good relationship with the data producers and providers, we researchers must be reliable partners: we must use the data only for research purposes under the user agreement, and we must handle it carefully so as not to violate copyright.