Task Overview and Call for Task Participation

The 18th NTCIR (2024 - 2025)
Evaluation of Information Access Technologies
Conference: June 10-13, 2025, NII, Tokyo, Japan

Call for Participation in the NTCIR-18 Tasks:

How to Participate / Online Registration

Let's participate in a collaborative activity for enhancing Information Access technologies!

For over 20 years, NTCIR has been building the infrastructure for evaluation and contributing to the development of information access technologies. Consequently, NTCIR has been a major forum for researchers to intensively discuss evaluation methodologies for emerging information access technologies.
The 18th NTCIR, NTCIR-18, now calls for task participation from anyone interested in research on information access technologies and their evaluation, such as retrieval from large document collections, question answering, and natural language processing. We welcome students, young researchers, professors who supervise students, researchers working at companies, and anyone who is interested in informatics.

Please visit: http://research.nii.ac.jp/ntcir/ntcir-18/howto.html

*"Online presentation" will be available at the NTCIR-18 Conference.

Evaluation Tasks

The eighteenth NTCIR (NTCIR-18) Program Committee has selected the following seven Core Tasks and three Pilot Tasks.
For details and latest information, please see below and visit each task’s homepage.

AEOLLM     FairWeb-2     FinArg-2     Lifelog-6     MedNLP-CHAT    
RadNLP     Transfer-2     HIDDEN-RAD     SUSHI     U4    

CORE TASKS

Automatic Evaluation of LLMs ("AEOLLM")

"AEOLLM concentrates on generative tasks and encourages participants to develop reference-free evaluation methods"

Abstract:
As LLMs grow popular in both academia and industry, how to effectively evaluate their capabilities has become an increasingly critical but still challenging issue. Existing methods can be divided into two types: manual evaluation, which is expensive, and automatic evaluation, which faces many limitations in task format (the majority are multiple-choice questions) and evaluation criteria (dominated by reference-based metrics). To advance innovation in automatic evaluation, we propose the Automatic Evaluation of LLMs (AEOLLM) task, which focuses on generative tasks and encourages reference-free methods. In addition, we set up diverse subtasks such as summary generation, non-factoid question answering, text expansion, and dialogue generation to comprehensively test different methods.

Website: https://aeollm.github.io
Kickoff slide[English]

Contact:


The Second Fair Web Task ("FairWeb-2")

"Return a group-fair and relevant SERP (or textual response) for a given search topic about researchers, movies, or YouTube clips!"

Abstract:
Web search: given a researcher/movie/YouTube topic, return a SERP (Search Engine Result Page) that is both relevant and group-fair. Conversational search: instead of a SERP, return a textual response.

Website: http://sakailab.com/fairweb2/
Kickoff slide[English]

Contact:


Temporal Inference of Financial Arguments ("FinArg-2")

"FinArg-2 focuses on the assessment of temporal information, which is a distinct phenomenon in financial opinions."

Abstract:
In FinArg-1, we explored three types of financial documents and proposed tasks that combine argument mining and sentiment analysis. In FinArg-2, we aim to introduce "Temporal Inference of Financial Arguments," focusing on the assessment of temporal information, which is a distinct phenomenon in financial opinions. In FinArg-2, we will continue utilizing the same resources as in FinArg-1, including analyst reports, earnings conference calls, and social media data. Furthermore, all annotations will be on the same documents, enabling participants to leverage features from FinArg-1 to enhance their performance.

Website: https://sites.google.com/nlg.csie.ntu.edu.tw/ntcir-18-finarg-2/finarg-2
Kickoff slide[English]

Contact:


Personal Lifelog Organisation & Retrieval Task ("Lifelog-6")

"Lifelog task aims to advance the state of the art in multimodal lifelog organisation, search and access"

Abstract:
The lifelog task continues the tasks of previous years, with the aim of advancing the state of the art in multimodal lifelog retrieval and analytics. We released the first lifelog datasets through this task, and we have attracted over 100 participating teams to address these challenges since 2015. The current lifelog task aims to improve community knowledge and expertise in asynchronous lifelog retrieval from multi-year archives, Q&A from lifelogs, and novel lifelog analytics. We expect to attract a wide range of interested participants.

Website: http://lifelogsearch.org/ntcir-lifelog/

Contact:


Medical Natural Language Processing for AI Chat ("MedNLP-CHAT")

"MedNLP-CHAT evaluates medical chatbots based on multiple viewpoints."

Abstract:
Medical chatbot services are a promising solution to human resource problems in medicine and healthcare. However, the risks of chatbots are not well understood. We created a testbed for assessing potential chatbot responses from various aspects: medical validation, legal viewpoints, ethical issues, etc.

Website: https://sociocom.naist.jp/mednlp-chat/
Kickoff slide[English]

Contact:


Natural Language Processing for Radiology ("RadNLP")

"RadNLP focuses on automated staging of lung cancer from radiology reports."

Abstract:
Lung cancer has different optimal treatments depending on its stage, or the degree of progression. However, much of the information regarding the stage is contained in unstructured free-text radiology reports, making it burdensome for humans to make decisions. In this task, we explore the potential of NLP to aid the workflow by automatically determining the stage of lung cancer. We extend the dataset from a monolingual one (NTCIR-17) to a bilingual one (NTCIR-18).

Website: https://sociocom.naist.jp/radnlp-2024/
Kickoff slide[English]

Contact:


Resource Transfer Based Dense Retrieval ("Transfer-2")

"Transfer aims to develop a suite of technology to transfer resources that were generated for one purpose to another in the context of dense retrieval."

Abstract:
The Resource Transfer Based Dense Retrieval (Transfer) task aims to bring together researchers from Information Retrieval, Machine Learning, and Natural Language Processing to develop a suite of technology for transferring resources generated for one purpose to another in the context of dense retrieval on Japanese texts. The NTCIR-18 Transfer task is currently considering providing three subtasks: Dense Cross-Language Retrieval (DCLR), Dense Multimodal Retrieval (DMR), and Retrieval Augmented Generation (RAG).

Website: https://github.com/ntcirtransfer/transfer2/discussions
Kickoff slide[English]

Contact:


PILOT TASKS

Hidden Causality Inclusion in Radiology Report Generation ("HIDDEN-RAD")

TBA

Website: https://sites.google.com/view/ntcir-18-hidden-rad/hidden-rad
Kickoff slide[English]

Contact:


Searching Unseen Sources for Historical Information ("SUSHI")

"SUSHI pilot task explores retrieval methods for undigitized documents maintained in archival repositories."

Abstract:
The Searching Unseen Sources for Historical Information (SUSHI) pilot task aims to develop search methods for documents that are not digitized by providing a testbed. The SUSHI pilot task welcomes both researchers interested in the technologies (e.g., information retrieval or machine learning) and practitioners (e.g., librarians or archivists), so that we can explore the needs for a search system for undigitized documents and ways of evaluating such systems.

Website: https://sites.google.com/view/ntcir-sushi-task/
Kickoff slide[English]

Contact:


Unifying, Understanding, and Utilizing Unstructured Data in Financial Reports ("U4")

"The U4 task is designed to develop techniques for extracting structured information from tabular data and documents, with a special focus on annual securities reports."

Abstract:
The U4 task is designed to develop techniques for extracting structured information from tabular data and documents, with a special focus on annual securities reports (ASRs). We will provide a dataset based on ASRs for training and testing, and collaboratively investigate appropriate evaluation metrics and methodologies with participants for information extraction from tabular data and documents. Our plan includes the following subtasks: Table Retrieval and Table Question Answering (Table QA). The Table Retrieval subtask aims to identify suitable tables from the ASRs, while the Table QA subtask focuses on providing precise answers from tables to users' questions.

Website: https://sites.google.com/view/ntcir18-u4/
Kickoff slide[English]

Contact:



Last Modified: 2024-05-02