[NTCIR Home] [NTCIR Data Home] [NTCIR Data Home (Japanese)]
The NTCIR Temporal Information Access (Temporalia) Task aims to foster research in temporal information access.
Because time plays a crucial role in estimating the relevance and validity of information, we believe that successful search engines must consider temporal aspects of information in greater detail.
At NTCIR-12, we built test collections for the Temporal Intent Disambiguation (TID) Subtask and the Temporally Diversified Retrieval (TDR) Subtask.
Task Data
Collection | Subtask | Type | Language | Volume | Year | File name
NTCIR-12 Temporalia (English) | TID | Query | English | 300 queries | 2015 | NTCIR12_Temporalia2_FormalRun_TID_En
NTCIR-12 Temporalia (English) | TDR | Document | English | 50 topics | 2015 | NTCIR12_Temporalia2_FormalRun_TDR_En
NTCIR-12 Temporalia (Chinese) | TID | Query | Chinese | 300 queries | 2015 | NTCIR12_Temporalia2_FormalRun_TID_Ch

Document Data
Collection | Corpus | Language | Volume | Year
NTCIR-12 Temporalia (English) | Living Knowledge Corpus 2011-2013 * | English | ca. 3.8M docs (ca. 20GB) | Collected between May 2011 and Feb 2013
*: Document Data is not distributed by NII. Users must obtain it themselves.
For details, please read http://ntcirtemporalia.github.io/NTCIR-12/collection.html
For each query, the following information is available:
The "id" is a unique query identifier in the collection.
The "query_string" is the query text, and the "query_issue_time" is the date when the query was submitted.
The "probabilities" field contains the probability of each of the four temporal classes of search intent.
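The query fields above can be pictured as a small record type. The following is only a minimal sketch: the class name `TidQuery`, the helper `dominant_class`, the date format, the temporal class labels used as dictionary keys, and the sample record are all assumptions made for illustration and are not taken from the distributed files.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TidQuery:
    """One query record, using the field names described above."""
    id: str
    query_string: str
    query_issue_time: str  # kept as a raw string; the date format is not assumed
    probabilities: Dict[str, float]  # temporal class label -> probability


def dominant_class(query: TidQuery) -> str:
    """Return the temporal class label with the highest probability."""
    return max(query.probabilities, key=query.probabilities.get)


# Hypothetical record for illustration only; not taken from the collection.
example = TidQuery(
    id="001",
    query_string="weather forecast tomorrow",
    query_issue_time="2013-05-01",
    probabilities={"past": 0.0, "recency": 0.2, "future": 0.7, "atemporal": 0.1},
)

print(dominant_class(example))
```

Keeping "query_issue_time" as a string avoids guessing at the date format; check the actual files before parsing it into a date.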
For each topic, the following information is available:
The "id" is a unique topic identifier in the collection.
The "title" is a brief description of the search intent.
The "description" gives the background and motivation of the search.
The "query_issue_time" is the date when the search was conducted.
The "subtopic" is a specific question to be answered by searching.
The "type" and "id" attributes of each "subtopic" tag give the temporal class and the identifier of that question.
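Since the subtopics are described as tags with "type" and "id" attributes, a topic can be read with a standard XML parser. The sketch below assumes a layout in which each field is a child element of a `topic` element; that nesting, the sample values, and the function name `parse_topic` are hypothetical, so the actual file should be checked before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic for illustration only; the real element nesting may differ.
SAMPLE = """
<topic>
  <id>001</id>
  <title>example title</title>
  <description>example description</description>
  <query_issue_time>2013-05-01</query_issue_time>
  <subtopic id="001a" type="atemporal">example atemporal question</subtopic>
  <subtopic id="001p" type="past">example past question</subtopic>
</topic>
"""


def parse_topic(xml_text):
    """Parse one topic into a dict, assuming the layout shown in SAMPLE."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.findtext("id"),
        "title": root.findtext("title"),
        "description": root.findtext("description"),
        "query_issue_time": root.findtext("query_issue_time"),
        "subtopics": [
            {
                "id": sub.get("id"),      # subtopic identifier attribute
                "type": sub.get("type"),  # temporal class attribute
                "question": (sub.text or "").strip(),
            }
            for sub in root.findall("subtopic")
        ],
    }


topic = parse_topic(SAMPLE)
for sub in topic["subtopics"]:
    print(sub["id"], sub["type"], sub["question"])
```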
For the relevance judgement of each document, the following information is available:
The "r" column is the identifier of each subtopic. The last letter represents a temporal class (a=atemporal, f=future, r=recency, p=past).
The "Document ID" column is the document identifier in the Living Knowledge Corpus.
The "Rel" column is the relevance assessment value for the topic (L0=Not Relevant, L1=Relevant, L2=Highly Relevant).
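The encoding of the judgement columns can be decoded mechanically. The sketch below assumes that each judgement is one whitespace-separated line in the order subtopic identifier, document identifier, relevance level; that line layout, the sample identifiers, and the function name `parse_judgement` are assumptions made for illustration, not the documented file format.

```python
# Mapping from the last letter of a subtopic identifier to its temporal class,
# as listed in the column description above.
TEMPORAL_CLASSES = {"a": "atemporal", "f": "future", "r": "recency", "p": "past"}


def parse_judgement(line):
    """Decode one judgement line, assuming three whitespace-separated columns."""
    subtopic_id, doc_id, rel = line.split()
    return {
        "subtopic_id": subtopic_id,
        "temporal_class": TEMPORAL_CLASSES[subtopic_id[-1]],
        "doc_id": doc_id,
        "relevance": int(rel.lstrip("L")),  # "L0" -> 0, "L1" -> 1, "L2" -> 2
    }


# Hypothetical line for illustration only; identifiers are not from the collection.
judgement = parse_judgement("001f doc-0001 L2")
print(judgement)
```

Converting the "Rel" value to an integer makes it straightforward to pass graded relevance to an evaluation script.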
The test collection and data are available from NII free of charge. The procedure to obtain the test collection is as follows.
- NTCIR-12 Temporalia query data, topic data, and relevance judgements can be downloaded from IDR/NII at:
http://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html
- Document Data is not distributed by NII. Users must obtain it themselves.
Please read http://ntcirtemporalia.github.io/NTCIR-12/collection.html.
Contact us: ntc-secretariat
Notice
The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project free of charge or for a fee.
The providers of the document data understand the importance of such test collections in research on information access technologies and
have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial
value as data. To maintain a good relationship with the data producers/providers, we researchers must be reliable partners and use the data only for research
purposes under the user agreement, and we must use the data carefully so as not to violate copyright.