Task Overview and Call for Task Participation

The 16th NTCIR (2021 - 2022)
Evaluation of Information Access Technologies
Conference: June 2022, NII, Tokyo, Japan

Call for Participation in the NTCIR-16 Tasks:

Let's participate in a collaborative activity for enhancing Information Access technologies!

For the past 20 years, NTCIR has been building the infrastructure for evaluation and contributing to the development of information access technologies. As a result, NTCIR has become a major forum where researchers intensively discuss evaluation methodologies for emerging information access technologies.
The 16th NTCIR, NTCIR-16, now calls for task participation from anyone interested in research on information access technologies and their evaluation, such as retrieval from large document collections, question answering, and natural language processing. We welcome students, young researchers, professors who supervise students, researchers working for companies, and anyone else who is interested in informatics.

*"Online presentation" will be available at the NTCIR-16 Conference.

NTCIR Aims

Evaluation Tasks

The sixteenth NTCIR (NTCIR-16) Program Committee has selected the following six Core Tasks and four Pilot Tasks.
For details and the latest information, please see below and visit each task's homepage.

Data Search 2     DialEval-2     FinNum-3     Lifelog-4     QA Lab-PoliInfo-3     WWW-4
RCIR     Real-MedNLP     SS     ULTRE

CORE TASKS

Data Search 2 ("Data Search 2")

"Search and question answering for statistical data"

Abstract:
The Data Search 2 task focuses on the retrieval of a statistical data collection published by the Japanese government (e-Stat), and one published by the US government (Data.gov). In addition, we organize question answering and search interface subtasks in this round.

Website: https://ntcir.datasearch.jp/
Contact:



Dialogue Evaluation 2 ("DialEval-2")

"Estimating the quality of a customer-helpdesk dialogue; understanding the role of each dialogue turn towards problem solving"

Abstract:
This task is a sequel to the NTCIR-14 Short Text Conversation DQ (Dialogue Quality) and Nugget Detection (ND) subtasks and the NTCIR-15 DialEval-1 task. The DQ subtask requires the participating systems to estimate the distribution of dialogue quality scores for a given dialogue. The ND subtask requires them to estimate the distribution of gold labels over nugget types. A nugget is an utterance by a helpdesk or a customer that helps the customer transition from the initial problem-facing state to the problem-solved state.

Website: TBA
Contact:



Investor’s and Manager’s Fine-grained Claim Detection ("FinNum-3")

"Investor’s and Manager’s Fine-grained Claim Detection"

Abstract:
In FinNum-1 and FinNum-2, we focused on understanding the numerals in financial social media data, exploring the meaning of numerals (FinNum-1) and the numeral attachment problem (FinNum-2). In FinNum-3, we turn to formal documents (professional analysts' reports and earnings conference calls) and provide multilingual datasets (Chinese and English) for participants to explore a new numeral-related task, fine-grained claim detection.

Website: https://sites.google.com/nlg.csie.ntu.edu.tw/finnum3/
Contact:



Lifelog Access and Retrieval ("Lifelog-4")

"Evaluating approaches to information retrieval from multimodal lifelogs"

Abstract:
This core task aims to advance the state-of-the-art research in lifelogging as an application of information retrieval. The Lifelog Semantic Access Task (LSAT) is a known-item search task that can be undertaken in an interactive or automatic manner.

Website: http://ntcir-lifelog.computing.dcu.ie/
Contact:



Question Answering Lab for Political Information ("QA Lab-PoliInfo-3")

"QA Lab-PoliInfo-3 aims to solve four tasks (Question Answering, QA Alignment, Fact Verification and Budget Argument Mining) in political issues."

Abstract:
QA Lab-PoliInfo-3 aims to solve four tasks (Question Answering, QA Alignment, Fact Verification and Budget Argument Mining) in political issues. In the Question Answering subtask, we aim to generate a brief answer to a given question. In the QA Alignment subtask, we automatically align each question with its appropriate answer in the assembly minutes. In the Fact Verification subtask, we focus on (1) verifying whether a given speech summary, which may be fake, is true, and (2) if it is true, finding the utterances in the assembly minutes that correspond to the summary. In the Budget Argument Mining subtask, we aim to connect a budget item with the related discussion.

Website: https://poliinfo3.net
Contact:



We Want Web 4 with CENTRE ("WWW-4")

"Quantifying advances and reproducibility in web search"

Abstract:
This is an ad hoc English web search task that aims to monitor technological advances in web search and to study replicability/reproducibility issues in IR evaluation. CENTRE stands for CLEF/NTCIR/TREC Reproducibility; it was also a track/task at TREC 2018, CLEF 2018 and 2019, and NTCIR-14. CENTRE became part of the WWW task at NTCIR-15.

Website: TBA
Contact:




PILOT TASKS

Reading Comprehension for Information Retrieval ("RCIR")

"Text reading signals in Information Retrieval"

Abstract:
The NTCIR-16 RCIR pilot task aims to motivate the development of a first generation of personalised retrieval techniques that integrate reading comprehension measures from biosignals as a source of evidence when ranking text content. Participating researchers will develop and benchmark approaches that integrate multi-modal signals (e.g. eye tracking, EOG, and screenshots) into the retrieval process for two sub-tasks: a comprehension-evaluation task (CET) that aims to sort texts by comprehension level, and a comprehension-based retrieval task (CRT) that aims to rank texts (for a variety of topics) by integrating comprehension evidence into the IR process. Both sub-tasks are exploratory in nature, but are designed to facilitate initial experimentation on the topic by the community. A new dataset will be generated by the organisers and will consist of textual data extracted from Wikipedia (the content texts) as well as a range of preprocessed biometric signal data. Runs will be ranked in terms of appropriate evaluation measures.

Website: http://ntcir-rcir.computing.dcu.ie/
Contact:



Real document-based Medical Natural Language Processing ("Real-MedNLP")

"Medical natural language processing using real medical documents"

Abstract:
Recently, more and more medical records are written in electronic format instead of on paper, which increases the importance of information processing techniques in medical fields. However, the amount of privacy-free medical text data is still small in non-English languages, such as Japanese and Chinese. To address this, we previously proposed a series of four medical natural language processing (MedNLP) tasks: MedNLP-1, MedNLP-2, MedNLPDoc, and MedWeb. However, those tasks did not use real data and relied only on dummy data. Specifically, dummy medical reports created by medical doctors were used for MedNLP-1 and MedNLP-2; textual medical records from a medical textbook were used for MedNLPDoc; and dummy Twitter data were created and used for MedWeb. In this proposed pilot task, we redesign the scheme around two core resources for medical AI tasks: (1) a Case-Report dataset and (2) a Radiographic-Report dataset. More importantly, we prepare real data in Japanese and translate the original reports into English, enabling us to develop the first benchmark for multi-language medical NLP. This task will yield promising technologies for developing practical computational systems that support a wide range of medical services.

Website: https://sociocom.naist.jp/real-mednlp/
Contact:



Session Search ("SS")

"Session search tasks based on practical data"

Abstract:
We propose this new task, the NTCIR-16 Session Search (SS) task, to support intensive investigation of session search, or task-oriented search. Related efforts such as the TREC Session Tracks and Dynamic Domain (DD) Tracks were discontinued years ago. However, optimizing and evaluating whole-session system performance remains challenging. As the Session Tracks and DD Tracks have their limitations, we propose new settings that support (1) large-scale practical session datasets for model training and (2) both ad hoc and session-level evaluation. We believe the new task will facilitate the development of the IR community in this domain.

Website: TBA
Contact:



Unbiased Learning to Ranking Evaluation Task ("ULTRE")

"Evaluating unbiased learning-to-rank with user simulation"

Abstract:
Unbiased learning to rank (ULTR) with biased user behavior data has received considerable attention in the IR community. However, how to properly evaluate and compare different ULTR approaches has not been systematically investigated, and there is no shared task or benchmark specifically developed for ULTR. In this paper, we propose the Unbiased Learning to Ranking Evaluation Task (ULTRE) as a pilot task in NTCIR-16. In ULTRE, we plan to design a user-simulation-based evaluation protocol and implement an online benchmarking service for the training and evaluation of both offline and online ULTR models. We will also investigate questions of ULTR evaluation, particularly whether and how different user simulation models affect the evaluation results.

Website: TBA
Contact:




Last Modified: 2021-03-15