The 18th NTCIR (2024 - 2025)
Evaluation of Information Access Technologies
Conference: June 10-13, 2025, NII, Tokyo, Japan
Call for Participation to the NTCIR-18 Tasks:
Call for Participation [Flyer]
How to Participate / Online Registration
Let's participate in a collaborative activity for enhancing Information
Access technologies!
For more than 20 years, NTCIR has been building the infrastructure for evaluation and contributing to the development of information access technologies. As a result, NTCIR has become a major forum where researchers intensively discuss evaluation methodologies for emerging information access technologies.
The 18th NTCIR, NTCIR-18, now calls for task participation of anyone
interested in research on information access technologies and their evaluation,
such as retrieval from large document collections, question
answering and natural language processing. We welcome students, young researchers,
professors who supervise students, researchers working for a company, and
anyone who is interested in informatics.
*"Online presentation" will be available at the NTCIR-18 Conference.
The eighteenth NTCIR (NTCIR-18) Program Committee has selected the
following seven Core Tasks and three Pilot Tasks.
For details and the latest information, please see below and visit each task’s homepage.
AEOLLM
FairWeb-2
FinArg-2
Lifelog-6
MedNLP-CHAT
RadNLP
Transfer-2
HIDDEN-RAD
SUSHI
U4
CORE TASKS
"AEOLLM concentrates on generative tasks and encourages participants to develop reference-free evaluation methods"
Abstract:
As LLMs grow popular in both academia and industry, how to effectively evaluate their capabilities has become an increasingly critical but still challenging issue. Existing methods can be divided into two types: manual evaluation, which is expensive, and automatic evaluation, which faces many limitations, including the task format (the majority are multiple-choice questions) and the evaluation criteria (dominated by reference-based metrics). To advance innovation in automatic evaluation, we propose the Automatic Evaluation of LLMs (AEOLLM) task, which focuses on generative tasks and encourages reference-free methods. In addition, we set up diverse subtasks such as summary generation, non-factoid question answering, text expansion, and dialogue generation to comprehensively test different methods.
Website: https://aeollm.github.io
Kickoff slide[English]
Contact:
"Return a group-fair and relevant SERP (or textual response) for a given search topic about researchers, movies, or youtube clips!"
Abstract:
Web search: given a researcher/movie/YouTube topic, return a SERP (Search Engine Result Page) that is both relevant and group-fair. Conversational search: instead of a SERP, return a textual response.
Website: http://sakailab.com/fairweb2/
Kickoff slide[English]
Contact:
"FinArg-2 focuses on the assessment of temporal information, which is a distinct phenomenon in financial opinions."
Abstract:
In FinArg-1, we explored three types of financial documents and proposed tasks that combine argument mining and sentiment analysis. In FinArg-2, we aim to introduce "Temporal Inference of Financial Arguments," focusing on the assessment of temporal information, which is a distinct phenomenon in financial opinions. In FinArg-2, we will continue utilizing the same resources as in FinArg-1, including analyst reports, earnings conference calls, and social media data. Furthermore, all annotations will be on the same documents, enabling participants to leverage features from FinArg-1 to enhance their performance.
Website: https://sites.google.com/nlg.csie.ntu.edu.tw/ntcir-18-finarg-2/finarg-2
Kickoff slide[English]
Contact:
"Lifelog task aims to advance the state of the art in multimodal lifelog organisation, search and access"
Abstract:
The lifelog task is a continuation of the tasks of previous years, with the aim of advancing the state of the art in multimodal lifelog retrieval and analytics. We released the first lifelog datasets through this task, and we have attracted over 100 participating teams to address these challenges since 2015. The current lifelog task aims to improve community knowledge and expertise in asynchronous lifelog retrieval from multi-year archives, Q&A from lifelogs, and novel lifelog analytics. We expect to attract a wide range of interested participants.
Website: http://lifelogsearch.org/ntcir-lifelog/
Contact:
"MedNLP-CHAT evaluates medical chatbots based on multiple viewpoints."
Abstract:
Medical chatbot services are a promising solution to the human resource shortage in medicine and healthcare. However, the risks of such chatbots are not well understood. We have created a testbed for assessing potential chatbot responses from various aspects: medical validity, legal viewpoints, ethical issues, etc.
Website: https://sociocom.naist.jp/mednlp-chat/
Kickoff slide[English]
Contact:
"RadNLP focuses on automated staging of lung cancer from radiology reports."
Abstract:
Lung cancer has different optimal treatments depending on its stage, or the degree of progression. However, much information regarding the stage is contained in unstructured free-text radiology reports, making it burdensome for humans to make decisions. In this task, we explore the potential of NLP to aid the workflow by automatically determining the stage of lung cancer. We extend the dataset from a monolingual one (NTCIR-17) to a bilingual one (NTCIR-18).
Website: https://sociocom.naist.jp/radnlp-2024/
Kickoff slide[English]
Contact:
"Transfer aims to develop a suite of technology to transfer resources that were generated for one purpose to another in the context of dense retrieval."
Abstract:
The Resource Transfer Based Dense Retrieval (Transfer) task aims to bring together researchers from Information Retrieval, Machine Learning, and Natural Language Processing to develop a suite of technology for transferring resources generated for one purpose to another in the context of dense retrieval on Japanese texts. The NTCIR-18 Transfer task is currently considering providing three subtasks: Dense Cross-Language Retrieval (DCLR), Dense Multimodal Retrieval (DMR), and Retrieval Augmented Generation (RAG).
Website: https://github.com/ntcirtransfer/transfer2/discussions
Kickoff slide[English]
Contact:
PILOT TASKS
TBA
Website: https://sites.google.com/view/ntcir-18-hidden-rad/hidden-rad
Kickoff slide[English]
Contact:
"SUSHI pilot task explores retrieval methods for undigitized documents maintained in archival repositories."
Abstract:
The Searching Unseen Sources for Historical Information (SUSHI) pilot task aims to develop search methods for documents that are not digitized by providing a testbed. The SUSHI pilot task welcomes both researchers interested in technologies (e.g., information retrieval or machine learning) and practitioners (e.g., librarians or archivists), so that we can explore the needs for a search system for undigitized documents and ways of evaluating such systems.
Website: https://sites.google.com/view/ntcir-sushi-task/
Kickoff slide[English]
Contact:
"The U4 task is designed to develop techniques for extracting structured information from tabular data and documents, with a special focus on annual securities reports."
Abstract:
The U4 task is designed to develop techniques for extracting structured information from tabular data and documents, with a special focus on annual securities reports (ASRs). We will provide a dataset based on ASRs for training and testing, and will work with participants to investigate appropriate evaluation metrics and methodologies for information extraction from tabular data and documents. Our plan includes the following subtasks: Table Retrieval and Table Question Answering (Table QA). The Table Retrieval subtask aims to identify suitable tables from the ASRs, while the Table QA subtask focuses on providing precise answers to users' questions from tables.
Website: https://sites.google.com/view/ntcir18-u4/
Kickoff slide[English]
Contact:
Last Modified: 2024-05-29