NTCIR-12 Temporalia

NTCIR Project: Research Purpose Use of Test Collection


NTCIR-12 Temporalia (foster research in temporal information access)

The goal of the NTCIR Temporal Information Access (Temporalia) task is to foster research in temporal information access.
Since time plays a crucial role in estimating the relevance and validity of information, we believe that successful search engines must consider the temporal aspects of information in greater detail.
At NTCIR-12, we built test collections for two subtasks: Temporal Intent Disambiguation (TID) and Temporally Diversified Retrieval (TDR).

TID: The TID subtask asks participants to estimate, for a given query, a probability distribution over four temporal intent classes (Atemporal, Past, Recency, and Future).
TDR: The TDR subtask asks participants to retrieve, for a given topic description, a set of documents relevant to each of the four temporal intent classes.
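Because the TID subtask asks for a distribution rather than a single label, a system that produces raw per-class scores has to normalize them. The following is a minimal Python sketch, assuming non-negative scores; the helper name `to_distribution` and the uniform fallback for all-zero scores are hypothetical choices, not part of the task definition. Class names follow the element names used in the query data (`Atemporal`, `Past`, `Recency`, `Future`).

```python
# Hypothetical helper (not part of the Temporalia distribution): normalize per-class
# scores into the kind of four-class distribution the TID subtask asks for.
TEMPORAL_CLASSES = ("Atemporal", "Past", "Recency", "Future")


def to_distribution(scores: dict[str, float]) -> dict[str, float]:
    """Scale non-negative scores for the four temporal classes so they sum to 1."""
    missing = set(TEMPORAL_CLASSES) - set(scores)
    if missing:
        raise ValueError(f"missing classes: {sorted(missing)}")
    if any(scores[c] < 0 for c in TEMPORAL_CLASSES):
        raise ValueError("scores must be non-negative")
    total = sum(scores[c] for c in TEMPORAL_CLASSES)
    if total == 0:
        # No evidence for any class: fall back to uniform (an arbitrary choice).
        return {c: 1 / len(TEMPORAL_CLASSES) for c in TEMPORAL_CLASSES}
    return {c: scores[c] / total for c in TEMPORAL_CLASSES}


# Raw scores 1 and 4 become 0.2 and 0.8, the same shape as the sample TID query.
print(to_distribution({"Atemporal": 1, "Past": 0, "Recency": 4, "Future": 0}))
```

Dividing by the total keeps the relative strength of each class; any other calibration (e.g. a softmax over log-scores) would be an equally valid design choice.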

Task data:
Collection	Subtask	Type	Language	Volume	Year	File name
NTCIR-12 Temporalia (English)	TID	Query	English	300 queries	2015	NTCIR12_Temporalia2_FormalRun_TID_En
NTCIR-12 Temporalia (English)	TDR	Document	English	50 topics	2015	NTCIR12_Temporalia2_FormalRun_TDR_En
NTCIR-12 Temporalia (Chinese)	TID	Query	Chinese	300 queries	2015	NTCIR12_Temporalia2_FormalRun_TID_Ch

Document data:
Collection	Corpus	Language	Volume	Collection period
NTCIR-12 Temporalia (English)	Living Knowledge Corpus 2011-2013 *	English	ca. 3.8M docs (ca. 20 GB)	May 2011 to Feb 2013

*: The document data is not distributed by NII. Users are required to obtain it themselves; see the Collection Description below.

Collection Description

Please read http://ntcirtemporalia.github.io/NTCIR-12/collection.html

TID

For each query, the following information is available:

<query>
<id>028</id>
<query_string>quote of the day</query_string>
<query_issue_time>May 1, 2013 GMT+0</query_issue_time>
<probabilities>
<Past>0.0</Past>
<Recency>0.8</Recency>
<Future>0.0</Future>
<Atemporal>0.2</Atemporal>
</probabilities>
</query>

The "id" is a unique query identifier within the collection.
The "query_string" is the query text, and the "query_issue_time" is the date on which the query was submitted.
The "probabilities" element gives the probability of each of the four temporal intent classes for the query; together they form a distribution (in the example above, 0.8 + 0.2 = 1.0).
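The query fields described above can be read with Python's standard-library `xml.etree.ElementTree` module. This is a minimal sketch that relies only on the element names shown in the sample; the name of the root element wrapping the queries in the distributed file is not described on this page, so the sketch simply visits every `<query>` element. Treating `<probabilities>` as optional (for files that carry only the query) is an assumption, and `parse_query` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The sample query shown above; a real file would be read with ET.parse(path).getroot().
SAMPLE = """<query>
<id>028</id>
<query_string>quote of the day</query_string>
<query_issue_time>May 1, 2013 GMT+0</query_issue_time>
<probabilities>
<Past>0.0</Past>
<Recency>0.8</Recency>
<Future>0.0</Future>
<Atemporal>0.2</Atemporal>
</probabilities>
</query>"""


def parse_query(elem: ET.Element) -> dict:
    """Turn one <query> element into a plain dict."""
    probs_elem = elem.find("probabilities")
    # Assumption: some files may lack <probabilities>, so treat it as optional.
    probabilities = (
        {child.tag: float(child.text) for child in probs_elem}
        if probs_elem is not None
        else {}
    )
    return {
        "id": elem.findtext("id"),
        "query_string": elem.findtext("query_string"),
        "query_issue_time": elem.findtext("query_issue_time"),
        "probabilities": probabilities,
    }


root = ET.fromstring(SAMPLE)
# iter() also yields the root itself, so this works whether or not queries are wrapped.
queries = [parse_query(q) for q in root.iter("query")]
print(queries[0]["query_string"], queries[0]["probabilities"])
```

The issue time is kept as the raw string, since its exact format varies between the query and topic files shown on this page.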

TDR

For each topic, the following information is available:

<topic>
<id>001</id>
<title>Earthquakes</title>
<description>I suspect that these days the intensity of harsh weather conditions such as earthquakes is increased when compared to the past. In order to make sure I need to collect information on earthquake, their past occurrences, and future forecasts, etc..</description>
<query_issue_time>Mar 29, 2013 GMT+0:00</query_issue_time>
<subtopics>
<subtopic id="001a" type="atemporal">What is an earthquake and how severe it can be?</subtopic>
<subtopic id="001p" type="past">What past earthquakes were most deadly?</subtopic>
<subtopic id="001r" type="recency">What was the latest earthquake in Asia?</subtopic>
<subtopic id="001f" type="future">What are predictions regarding the occurrence of earthquakes in the near future?</subtopic>
</subtopics>
</topic>

The "id" is a unique topic identifier within the collection.
The "title" is a brief description of the search intent.
The "description" gives the background and motivation of the search.
The "query_issue_time" is the date on which the search was conducted.
Each "subtopic" is a specific question to be answered by searching.
The "type" and "id" attributes of each "subtopic" element give its temporal class and its identifier, respectively.
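A topic can be loaded the same way, keying its subtopics by the "type" attribute so that each temporal class maps to its question. This is a minimal Python sketch using `xml.etree.ElementTree`; `parse_topic` is a hypothetical helper name, and the sample description is shortened for brevity.

```python
import xml.etree.ElementTree as ET

# The sample topic shown above (description shortened for brevity).
SAMPLE_TOPIC = """<topic>
<id>001</id>
<title>Earthquakes</title>
<description>I suspect that these days the intensity of harsh weather conditions such as earthquakes is increased when compared to the past.</description>
<query_issue_time>Mar 29, 2013 GMT+0:00</query_issue_time>
<subtopics>
<subtopic id="001a" type="atemporal">What is an earthquake and how severe it can be?</subtopic>
<subtopic id="001p" type="past">What past earthquakes were most deadly?</subtopic>
<subtopic id="001r" type="recency">What was the latest earthquake in Asia?</subtopic>
<subtopic id="001f" type="future">What are predictions regarding the occurrence of earthquakes in the near future?</subtopic>
</subtopics>
</topic>"""


def parse_topic(elem: ET.Element) -> dict:
    """Turn one <topic> element into a dict, keying subtopics by temporal class."""
    subtopics = {
        st.get("type"): {"id": st.get("id"), "question": (st.text or "").strip()}
        for st in elem.iter("subtopic")
    }
    return {
        "id": elem.findtext("id"),
        "title": elem.findtext("title"),
        "description": elem.findtext("description"),
        "query_issue_time": elem.findtext("query_issue_time"),
        "subtopics": subtopics,
    }


topic = parse_topic(ET.fromstring(SAMPLE_TOPIC))
for temporal_class, st in topic["subtopics"].items():
    # In the sample, each subtopic id is the topic id plus the first letter of its class.
    print(temporal_class, st["id"], st["question"])
```

Keying by class makes it easy to run one retrieval per temporal class, which is what the TDR subtask asks for.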

Relevance Judgements

Relevance judgements are given one per line, in the following format:

r Document ID Rel
001a lk-20110830040101_530 L0
001a lk-20111001040102_2640 L1
001a lk-20111005040101_3110 L2

The "r" column is the subtopic identifier. Its last letter indicates the temporal class (a = atemporal, f = future, r = recency, p = past).
The "Document ID" column is the document identifier in the Living Knowledge Corpus.
The "Rel" column is the relevance grade of the document for that subtopic (L0 = not relevant, L1 = relevant, L2 = highly relevant).
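The three-column layout above is straightforward to load. The following is a minimal Python sketch, assuming the file is whitespace-separated as in the sample and may include the header row; mapping L0/L1/L2 to the integers 0/1/2 is an illustrative choice rather than something this page specifies, and `parse_qrels` and `temporal_class` are hypothetical helper names.

```python
from collections import defaultdict

# Mapping of the subtopic-id suffix to its temporal class, as described above.
CLASS_BY_SUFFIX = {"a": "atemporal", "p": "past", "r": "recency", "f": "future"}
# Assumption: use the level number as the graded gain (L0 -> 0, L1 -> 1, L2 -> 2).
GRADE = {"L0": 0, "L1": 1, "L2": 2}


def parse_qrels(lines):
    """Return {subtopic_id: {doc_id: grade}} from whitespace-separated judgement lines."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        # Skip the header row, blank lines, and anything not shaped like a judgement.
        if len(parts) != 3 or parts[2] not in GRADE:
            continue
        subtopic_id, doc_id, level = parts
        qrels[subtopic_id][doc_id] = GRADE[level]
    return dict(qrels)


def temporal_class(subtopic_id: str) -> str:
    """The last character of a subtopic id encodes its temporal class."""
    return CLASS_BY_SUFFIX[subtopic_id[-1]]


SAMPLE = """r Document ID Rel
001a lk-20110830040101_530 L0
001a lk-20111001040102_2640 L1
001a lk-20111005040101_3110 L2"""

qrels = parse_qrels(SAMPLE.splitlines())
for sid, judged in qrels.items():
    # Grades above 0 correspond to L1 (relevant) and L2 (highly relevant).
    relevant = [d for d, g in judged.items() if g > 0]
    print(sid, temporal_class(sid), "relevant docs:", relevant)
```

Because the header line "r Document ID Rel" splits into four fields, the length check skips it without special-casing.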

The test collection and associated data are available from NII free of charge. They can be obtained as follows.

The NTCIR-12 Temporalia query data, topic data, and relevance judgements can be downloaded from IDR/NII at:
http://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html

The document data is not distributed by NII. Users are required to obtain it themselves.
For instructions, please read http://ntcirtemporalia.github.io/NTCIR-12/collection.html.

Reference

The terms of use [PDF]
Overview of NTCIR-12 Temporal Information Access (Temporalia-2) Task

NTCIR-12 Temporalia Task website

NTCIR-12 Temporalia Task Evaluation Results

NTCIR-12 Conference Proceedings: Temporalia


Contact us: ntc-secretariat

Notice

The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project free of charge or for a fee. The providers of the document data understand the importance of such test collections in research on information access technologies and have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good relationship with the data producers and providers, we researchers must be reliable partners and use the data only for research purposes under the user agreement, and we must use the data carefully so as not to violate copyright.



Updated on: 2016-09-08

ntc-admin
