The 18th NTCIR (2024 - 2025)
Evaluation of Information Access Technologies
Conference: June 10 (Tue) - 13 (Fri), 2025, National Center of Sciences (学術総合センター), Tokyo
NTCIR-18 Call for Task Participation:
Call for Participation [Flyer]
Guide to Task Participation / Registration Form
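Why not join a collaborative effort to advance information access technologies?
NTCIR-18, the 18th round of NTCIR, is now recruiting teams to participate in tasks that conduct research on shared datasets.
Evaluating information access technologies depends on "test collections," which are built through the collaborative work of researchers. With the cooperation of many researchers, NTCIR has spent more than 20 years building this evaluation infrastructure and has contributed to the advancement of the technology. It continues this work while exploring evaluation methods for the new technologies that are developed every day.
Students and early-career researchers in the information access field, faculty members, researchers in industry, anyone interested in informatics, and research groups interested in retrieval, question answering, or natural language processing with large-scale test collections are all welcome.
We look forward to your participation.
For registration, please see: https://research.nii.ac.jp/ntcir/ntcir-18/howto-ja.html
*Online presentations are possible at NTCIR-18.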
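The NTCIR-18 Program Committee has selected the following seven core tasks and three pilot tasks.
For details and the latest information on each task, please see the task overviews below and each task's website.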
AEOLLM
FairWeb-2
FinArg-2
Lifelog-6
MedNLP-CHAT
RadNLP
Transfer-2
HIDDEN-RAD
SUSHI
U4
Core Tasks
"AEOLLM concentrates on generative tasks and encourages
participants to develop reference-free evaluation
methods"
Abstract:
As LLMs grow popular in both academia and industry, how to
effectively evaluate the capacity of LLMs becomes an
increasingly critical but still challenging issue. Existing
methods can be divided into two types: manual evaluation, which
is expensive, and automatic evaluation, which faces many
limitations, including the task format (most tasks are
multiple-choice questions) and the evaluation criteria (dominated
by reference-based metrics). To advance innovation in automatic
evaluation, we propose the Automatic Evaluation of LLMs
(AEOLLM) task, which focuses on generative tasks and encourages
reference-free methods. We also set up diverse subtasks, such
as summary generation, non-factoid question answering, text
expansion, and dialogue generation, to comprehensively test
different methods.
Website: https://aeollm.github.io
Kickoff slide[English]
Contact:
"Return a group-fair and relevant SERP (or textual
response) for a given search topic about researchers, movies, or
YouTube clips!"
Abstract:
Web search: given a researcher/movie/YouTube topic, return a
SERP (Search Engine Result Page) that is both relevant and
group-fair. Conversational search: instead of a SERP, return a
textual response.
Website: http://sakailab.com/fairweb2/
Kickoff slide[English]
Contact:
"FinArg-2 focuses on the assessment of temporal
information, which is a distinct phenomenon in financial
opinions."
Abstract:
In FinArg-1, we explored three types of financial documents and
proposed tasks that combine argument mining and sentiment
analysis. In FinArg-2, we aim to introduce "Temporal Inference
of Financial Arguments," focusing on the assessment of temporal
information, which is a distinct phenomenon in financial
opinions. In FinArg-2, we will continue utilizing the same
resources as in FinArg-1, including analyst reports, earnings
conference calls, and social media data. Furthermore, all
annotations will be on the same documents, enabling participants
to leverage features from FinArg-1 to enhance their performance.
Website: https://sites.google.com/nlg.csie.ntu.edu.tw/ntcir-18-finarg-2/finarg-2
Kickoff slide[English]
Contact:
"Lifelog task aims to advance the state of the art in
multimodal lifelog organisation, search and access"
Abstract:
The lifelog task is a continuation of the tasks of previous
years, with the aim of advancing the state of the art in multimodal
lifelog retrieval and analytics. We released the first
lifelog datasets through this task, and we have
attracted over 100 participating teams to address these
challenges since 2015. The current lifelog task aims to improve
community knowledge and expertise in asynchronous lifelog
retrieval from multi-year archives, Q&A from lifelogs, and novel
lifelog analytics. We expect to attract a wide range of
interested participants.
Website: http://lifelogsearch.org/ntcir-lifelog/
Contact:
"MedNLP-CHAT evaluates medical chatbots based on multiple
viewpoints."
Abstract:
Medical chatbot services are a promising solution to the
human resource problem in medicine and healthcare. However, the
risks of chatbots are not well understood. We created a testbed
for assessing potential chatbot responses from various aspects:
medical validation, legal viewpoints, ethical issues, etc.
Website: https://sociocom.naist.jp/mednlp-chat/
Kickoff slide[English]
Contact:
"RadNLP focuses on automated staging of lung cancer from
radiology reports."
Abstract:
Lung cancer has different optimal treatments depending on its
stage, or the degree of progression. However, much information
regarding the stage is contained in unstructured free-text
radiology reports, making it burdensome for humans to make
decisions. In this task, we explore the potential of NLP to aid
the workflow by automatically determining the stage of lung
cancer. We extend the dataset from a monolingual one (NTCIR-17)
to a bilingual one (NTCIR-18).
Website: https://sociocom.naist.jp/radnlp-2024/
Kickoff slide[English]
Contact:
"Transfer aims to develop a suite of technology to transfer
resources that were generated for one purpose to another in the
context of dense retrieval."
Abstract:
The Resource Transfer Based Dense Retrieval (Transfer) task aims
to bring together researchers from Information Retrieval,
Machine Learning, and Natural Language Processing to develop a
suite of technology for transferring resources generated for one
purpose to another in the context of dense retrieval on Japanese
texts. The NTCIR-18 Transfer task is currently considering
providing three subtasks: Dense Cross-Language Retrieval (DCLR),
Dense Multimodal Retrieval (DMR), and Retrieval Augmented
Generation (RAG).
Website: https://github.com/ntcirtransfer/transfer2/discussions
Kickoff slide[English]
Contact:
Pilot Tasks
TBA
Website: https://sites.google.com/view/ntcir-18-hidden-rad/hidden-rad
Kickoff slide[English]
Contact:
"SUSHI pilot task explores retrieval methods for
undigitized documents maintained in archival
repositories."
Abstract:
The Searching Unseen Sources for Historical Information (SUSHI)
pilot task aims to develop search methods for documents that are
not digitized by providing a testbed. The SUSHI pilot task
welcomes both researchers interested in technologies (e.g.,
Information Retrieval or Machine Learning) and practitioners
(e.g., librarians or archivists), so that we can explore the needs
for a search system for undigitized documents and ways to
evaluate such systems.
Website: https://sites.google.com/view/ntcir-sushi-task/
Kickoff slide[English]
Contact:
"The U4 task is designed to develop techniques for
extracting structured information from tabular data and
documents, with a special focus on annual securities
reports."
Abstract:
The U4 task is designed to develop techniques for extracting
structured information from tabular data and documents, with a
special focus on annual securities reports (ASRs). We will provide a
dataset based on ASRs for training and testing, and will work
with participants to investigate appropriate evaluation metrics and
methodologies for information extraction from
tabular data and documents. Our plan includes the following
subtasks: Table Retrieval and Table Question Answering (Table
QA). The Table Retrieval subtask aims to identify suitable
tables from the ASRs, while the Table QA subtask focuses on
providing precise answers from tables to users' questions.
Website: https://sites.google.com/view/ntcir18-u4/
Kickoff slide[English]
Contact:
Last modified: 2024-05-29