NTCIR-17 Conference

NTCIR-17 Conference Keynote 1

Date: DAY-2, Dec 13th (Wed), 2023 (Time: 10:45 - 11:45)

Title: Using Language Models for Relevance Labelling

Speaker: Paul Thomas (Microsoft)



Relevance labels – annotations that say whether a result is relevant to a given search – are key to evaluating the quality of a search engine. Standard practice to date has been to ask in-house or crowd workers to label results, but recently developed language models can produce labels at greatly reduced cost. At Bing we have been using GPT-4, with human oversight, for relevance labelling at web scale. We find that models produce better labels than third-party or even in-house workers, for a fraction of the cost, and that these labels let us train notably better rankers. In this talk I'll report on our experiences with GPT-4, including experiments with in-house data and with TREC-Robust. We see accuracy as good as that of human labellers and a similar ability to pick out "interesting" cases, as well as variation due to details of prompt wording. With accuracy this high, further improvement is hard, so I'll also discuss our work on high-quality "gold" labels and on metrics for the labels themselves.


Paul Thomas is a senior applied scientist at Microsoft, where he works on measurement for Bing. His research is in information retrieval: particularly in how people use web search systems and how we should evaluate these systems, as well as interfaces for search including search with different types of results, search on mobile devices, and search as conversation.


NTCIR-17 Conference Keynote 2

Date: DAY-4, Dec 15th (Fri), 2023 (Time: 11:00 - 12:00)

Title: The Principles of Depth, Breadth, and Precarity in Developing Metrics for Information Retrieval Evaluation

Speaker: Fernando Diaz (CMU, USA)



In this perspective presentation, I will review two core methods of improving the evaluation of information retrieval systems. Innovation in depth refers to deepening our empirical and theoretical understanding of effectiveness within a class of information retrieval problems. Innovation in breadth refers to understanding effectiveness in novel or underrepresented information access patterns. I will describe how different styles of evaluation research fit into these classes, connecting recent work in fairness and preference-based evaluation to a longer tradition of information retrieval research. I will close by reflecting on the value of seeing evaluation research as brittle and precarious.


Fernando Diaz is an Associate Professor in the Language Technologies Institute at CMU and a researcher at Google Research. Fernando designs and evaluates search engines and recommender systems, covering three themes: quantitative evaluation of AI systems, retrieval-enhanced AI, and assessment of the cultural impact of AI in domains like music and literature. He has previously held both individual contributor and senior research leadership roles at Microsoft Research, Spotify, and Yahoo Research. He is particularly interested in evaluating and optimizing deployment of real AI systems.


EVIA 2023 Keynote

Date: DAY-1, Dec 12th (Tue), 2023 (Time: 13:10 - 13:40)

Title: Evaluating Systems that Generate Content

Speaker: Ian Soboroff (NIST, USA)



The astounding emergence of ChatGPT and other AI systems that generate content, and their apparently incredible performance, are an inspiration to the research community. The performance of these LLMs is so impressive it is widely supposed that we can use them to measure their own effectiveness! We have had evaluation methods for generated content, including question answering, summarization, and translation, and in this talk I dust them off and present both a historical view and how we might approach those methods today. tl;dr, we have a lot of work to do.


Dr. Ian Soboroff is a computer scientist and leader of the Retrieval Group at the National Institute of Standards and Technology (NIST). The Retrieval Group organizes the Text REtrieval Conference (TREC), the Text Analysis Conference (TAC), and the TREC Video Retrieval Evaluation (TRECVID). These are all large, community-based research workshops that drive the state of the art in information retrieval, video search, web search, information extraction, text summarization, and other areas of information access. He has co-authored many publications in information retrieval evaluation, test collection building, text filtering, collaborative filtering, and intelligent software agents. His current research interests include complex information search, evaluating generative systems, and how large language models can inform evaluation.



NTCIR-17 Conference Tutorial

Date: DAY-1, Dec 12th (Tue), 2023 (Time: 10:45 - 11:30)

Title: A Practical Guide to Computing Evaluation Measures and Comparing Systems: Twelve Small Tips

Speaker: Tetsuya Sakai (Waseda University)



The NTCIRPOOL toolkit creates pool files. The NTCIREVAL toolkit computes evaluation measures based on relevance, diversity, group fairness, and so on. The Discpower toolkit computes p-values. While demonstrating how I actually use these toolkits, I will provide some small but practically important tips for IR experimenters. The recommendations are based on a few decades of my experience as an "NTCIR researcher." This tutorial complements my NTCIR-16 tutorial (see http://sakailab.com/publications/ for slides and videos).


Tetsuya Sakai is a professor at the Department of Computer Science and Engineering, Waseda University, Japan. He is also a General Research Advisor of Naver Corporation, Korea (2021-), and a visiting professor at the National Institute of Informatics, Japan (2015-). He spent 20 years in industry, including 1.5 years at the University of Cambridge as a visiting researcher from Toshiba, and 4.5 years at Microsoft Research Asia. He has since spent 10 years in academia. He is an ACM Distinguished Member and an IPSJ Fellow. In 2023, he was inducted into the SIGIR Academy.



NTCIR-17 Conference Panel

Date: DAY-2, Dec 13th (Wed), 2023 (Time: 16:00 - 16:45)

Panel: Responsible Information Access: Fairness, Harmlessness, Sustainability, and More

Panelists: Haruka Maeda (Kyoto University), Paul Thomas (Microsoft), and Mark Sanderson (RMIT University)

Moderator: Tetsuya Sakai (Waseda University)



In this panel, we will discuss topics related to responsible information access (IA), especially evaluating responsible IA. First, our panelists Haruka Maeda (Kyoto University), Paul Thomas (Microsoft), and Mark Sanderson (RMIT University) will each give a brief position talk. Next, the panelists and audience will discuss (a) evaluating LLM-based IA responses and (b) evaluating societal impacts in the LLM-based IA era. Some relevant keywords: handling hallucinations and harmful responses, fairness, inequity, worker exploitation, environmental impact, bad research practices.

A more detailed plan of the panel can be found here:

Please participate!


Haruka MAEDA is a program-specific researcher at the Graduate School of Law, Kyoto University, and is interested in discrimination by algorithms. Her master's thesis at the Graduate School of Interdisciplinary Information Studies, the University of Tokyo, focused on the wrongness of algorithmic discrimination from the viewpoint of the philosophical theory of discrimination and won the Encouragement Award of the "Telecommunications Advancement Foundation Award 2021".


EVIA 2023 Panel

Date: DAY-1, Dec 12th (Tue), 2023 (Time: 16:40 - 17:20)

Panel: Evaluation of Large Language Models

Panelists: Akiko Aizawa (NII), Inho Kang (Naver), Yiqun Liu (Tsinghua University), Paul Thomas (Microsoft)

Moderator: Doug Oard






Pre-NTCIR Seminar

Date: DAY-0, Dec 11th (Mon), 2023 (Time: 16:30 - 18:00)

Room: NII Room 1902+1903 (19th Floor)

Title: Three decades of web search, how goes the retrieval revolution?

Speaker: Mark Sanderson (RMIT University, Australia)



This year marks the 30th anniversary of web search engines and the 25th anniversary of Google being launched as a company. I will use this moment both to reflect on aspects of the information retrieval landscape and to look forward at ideas that I think our community has not examined in sufficient detail. Looking back, while it's clear that web search has precipitated a revolution in information access, it is worth asking: what did web search revolt against? I will reflect on the way information access was facilitated in the past. Next, I will examine what I think are some of the issues in information seeking going forward, with a particular focus on gaps in understanding. Here I will talk about some of the new research that we've been conducting at RMIT University, with a particular focus on the importance of conversational search and query variability.


Mark Sanderson is a Professor of Information Retrieval at RMIT University (RMIT), Dean of Research and Innovation in RMIT's STEM College, and head of the RMIT Information Retrieval (IR) group. Prof Mark Sanderson is also a Chief Investigator at the RMIT University node of the ARC Centre of Excellence for Automated Decision-Making & Society (ADM+S). He has raised over $50 million in grant income and published over 250 papers, and his work has received over 13,000 citations. His research is in the areas of search engines, recommender systems, and user, data, and text analytics.

Last modified: 2023-12-07