Task Overview and Call for Participation

The 18th NTCIR (2024 - 2025)
Evaluation of Information Access Technologies
Conference: June 10 (Tue) - 13 (Fri), 2025, National Center of Sciences, Tokyo


NTCIR-18 Task Participation Information:

Call for Participation [Flyer]
Task Participation Guide / Registration Form

Why not join a collaborative effort to advance information access technologies?

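NTCIR-18, the 18th NTCIR, is now recruiting teams to participate in tasks that conduct research using shared datasets.
Evaluating information access technologies depends on "test collections", which are created through the collaborative work of researchers. With the cooperation of many researchers, NTCIR has worked for more than 20 years to build this evaluation infrastructure and has contributed to the advancement of the technology. We continue this work while exploring evaluation methods for the new technologies that are developed every day.
Students and early-career researchers in the information access field, faculty members, researchers in industry, anyone interested in informatics, and research groups interested in retrieval, question answering, or natural language processing with large-scale test collections are all welcome. We look forward to your participation.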

For registration, please see: https://research.nii.ac.jp/ntcir/ntcir-18/howto-ja.html

*Online presentations are possible at NTCIR-18.

Evaluation Tasks

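The NTCIR-18 Program Committee has selected the following seven core tasks and three pilot tasks.
For details and the latest information on each task, please see the task overviews below and each task's website.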

AEOLLM     FairWeb-2     FinArg-2     Lifelog-6     MedNLP-CHAT    
RadNLP     Transfer-2     HIDDEN-RAD     SUSHI     U4    

Core Tasks

Automatic Evaluation of LLMs ("AEOLLM")

"AEOLLM concentrates on generative tasks and encourages participants to develop reference-free evaluation methods"

Abstract:
As LLMs grow popular in both academia and industry, how to effectively evaluate their capabilities has become an increasingly critical but still challenging issue. Existing methods fall into two types: manual evaluation, which is expensive, and automatic evaluation, which faces many limitations, including the task format (most tasks are multiple-choice questions) and the evaluation criteria (dominated by reference-based metrics). To advance innovation in automatic evaluation, we propose the Automatic Evaluation of LLMs (AEOLLM) task, which focuses on generative tasks and encourages reference-free methods. In addition, we set up diverse subtasks, such as summary generation, non-factoid question answering, text expansion, and dialogue generation, to comprehensively test different methods.

Website: https://aeollm.github.io
Kickoff slide[English]

Contact:



The Second Fair Web Task ("FairWeb-2")

"Return a group-fair and relevant SERP (or textual response) for a given search topic about researchers, movies, or YouTube clips!"

Abstract:
Web search: given a researcher, movie, or YouTube topic, return a SERP (Search Engine Result Page) that is both relevant and group-fair. Conversational search: instead of a SERP, return a textual response.

Website: http://sakailab.com/fairweb2/
Kickoff slide[English]

Contact:



Temporal Inference of Financial Arguments ("FinArg-2")

"FinArg-2 focuses on the assessment of temporal information, which is a distinct phenomenon in financial opinions."

Abstract:
In FinArg-1, we explored three types of financial documents and proposed tasks that combine argument mining and sentiment analysis. In FinArg-2, we aim to introduce "Temporal Inference of Financial Arguments," focusing on the assessment of temporal information, which is a distinct phenomenon in financial opinions. In FinArg-2, we will continue utilizing the same resources as in FinArg-1, including analyst reports, earnings conference calls, and social media data. Furthermore, all annotations will be on the same documents, enabling participants to leverage features from FinArg-1 to enhance their performance.

Website: https://sites.google.com/nlg.csie.ntu.edu.tw/ntcir-18-finarg-2/finarg-2
Kickoff slide[English]

Contact:



Personal Lifelog Organisation & Retrieval Task ("Lifelog-6")

"Lifelog task aims to advance the state of the art in multimodal lifelog organisation, search and access"

Abstract:
The lifelog task is a continuation of the tasks of previous years, with the aim of advancing the state of the art in multimodal lifelog retrieval and analytics. We released the first lifelog datasets with this task, and we have attracted over 100 participating teams to address these challenges since 2015. The current lifelog task aims to improve community knowledge and expertise in asynchronous lifelog retrieval from multi-year archives, Q&A from lifelogs, and novel lifelog analytics. We expect to attract a wide range of interested participants.

Website: http://lifelogsearch.org/ntcir-lifelog/

Contact:



Medical Natural Language Processing for AI Chat ("MedNLP-CHAT")

"MedNLP-CHAT evaluates medical chatbots based on multiple viewpoints."

Abstract:
Medical chatbot services are a promising solution to the human resource problem in medicine and healthcare. However, the risks of chatbots are not well understood. We created a testbed for examining potential chatbot responses from various aspects: medical validation, legal viewpoints, ethical issues, etc.

Website: https://sociocom.naist.jp/mednlp-chat/
Kickoff slide[English]

Contact:



Natural Language Processing for Radiology ("RadNLP")

"RadNLP focuses on automated staging of lung cancer from radiology reports."

Abstract:
Lung cancer has different optimal treatments depending on its stage, or the degree of progression. However, much of the information regarding the stage is contained in unstructured free-text radiology reports, making it burdensome for humans to make decisions. In this task, we explore the potential of NLP to aid the workflow by automatically determining the stage of lung cancer. We extend the dataset from a monolingual one (NTCIR-17) to a bilingual one (NTCIR-18).

Website: https://sociocom.naist.jp/radnlp-2024/
Kickoff slide[English]

Contact:



Resource Transfer Based Dense Retrieval ("Transfer-2")

"Transfer aims to develop a suite of technology to transfer resources that were generated for one purpose to another in the context of dense retrieval."

Abstract:
The Resource Transfer Based Dense Retrieval (Transfer) task aims to bring together researchers from Information Retrieval, Machine Learning, and Natural Language Processing to develop a suite of technology for transferring resources generated for one purpose to another in the context of dense retrieval on Japanese texts. The NTCIR-18 Transfer task is currently considering providing three subtasks: Dense Cross-Language Retrieval (DCLR), Dense Multimodal Retrieval (DMR), and Retrieval Augmented Generation (RAG).

Website: https://github.com/ntcirtransfer/transfer2/discussions
Kickoff slide[English]

Contact:



Pilot Tasks

Hidden Causality Inclusion in Radiology Report Generation ("HIDDEN-RAD")

TBA

Website: https://sites.google.com/view/ntcir-18-hidden-rad/hidden-rad
Kickoff slide[English]

Contact:



Searching Unseen Sources for Historical Information ("SUSHI")

"SUSHI pilot task explores retrieval methods for undigitized documents maintained in archival repositories."

Abstract:
The Searching Unseen Sources for Historical Information (SUSHI) pilot task aims to develop search methods for documents that are not digitized by providing a testbed. The SUSHI pilot task welcomes both researchers interested in technologies (e.g., information retrieval or machine learning) and practitioners (e.g., librarians or archivists), so that we can explore the needs for a search system for undigitized documents and ways of evaluating such systems.

Website: https://sites.google.com/view/ntcir-sushi-task/
Kickoff slide[English]

Contact:



Unifying, Understanding, and Utilizing Unstructured Data in Financial Reports ("U4")

"The U4 task is designed to develop techniques for extracting structured information from tabular data and documents, with a special focus on annual securities reports."

Abstract:
The U4 task is designed to develop techniques for extracting structured information from tabular data and documents, with a special focus on annual securities reports (ASRs). We will provide a dataset based on ASRs for training and testing, and will collaboratively investigate, together with participants, appropriate evaluation metrics and methodologies for information extraction from tabular data and documents. Our plan includes the following subtasks: Table Retrieval and Table Question Answering (Table QA). The Table Retrieval subtask aims to identify suitable tables from the ASRs, while the Table QA subtask focuses on providing precise answers from tables to users' questions.

Website: https://sites.google.com/view/ntcir18-u4/
Kickoff slide[English]

Contact:




Last modified: 2024-05-29