NTCIR-16 Conference


NTCIR-16 Conference Tutorial

Date: June 14th (Tue), 2022
(Time: 10:00 - 12:00 (JST), 1:00 - 3:00 (GMT), Jun 13, 21:00 - 23:00 (EDT))

Title: Evaluating Evaluation Measures, Evaluating Information Access Systems, Designing and Constructing Test Collections, and Evaluating Again

Speaker: Tetsuya Sakai (Waseda University)

Dr. Tetsuya Sakai


I plan to cover the following topics in this tutorial:
1. Why is (offline) evaluation important?
2. On a few evaluation measures used at NTCIR
3. How should we choose the evaluation measures?
4. How should we design and build a test collection?
5. How should we ensure the quality of the gold data?
6. How should we report the results?
7. Quantifying reproducibility and progress
8. Summary
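As a concrete taste of the kind of measure covered in topic 2, here is a minimal sketch of nDCG with the standard log2 rank discount. This is an illustrative example only; the tutorial may discuss other formulations and NTCIR-specific measures, and the function names and sample gains below are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each graded gain is discounted by
    log2(rank + 1), with ranks starting at 1."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(ranked_gains, all_gains):
    """nDCG: DCG of the system's ranking divided by the ideal DCG
    obtained by sorting all gains in decreasing order."""
    ideal = dcg(sorted(all_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance scores (0-3) for a 5-document ranking.
gains = [3, 2, 3, 0, 1]
print(ndcg(gains, gains))  # below 1.0: the grade-3 doc at rank 3 is misplaced
```

An ideal ordering of the same gains (e.g. [3, 3, 2, 1, 0]) scores exactly 1.0, which is what makes the measure comparable across topics with different numbers of relevant documents.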


Tetsuya Sakai is a professor at the Department of Computer Science and Engineering, Waseda University, Japan. He is also a General Research Advisor of Naver Corporation, Korea (2021-), and a visiting professor at the National Institute of Informatics, Japan (2015-). He joined Toshiba in 1993 and obtained a Ph.D. from Waseda University in 2000. From 2000 to 2001, he was a visiting researcher at the Computer Laboratory, University of Cambridge, supervised by the late Karen Spärck Jones. In 2007, he joined NewsWatch, Inc. as the director of the Natural Language Processing Lab. In 2009, he joined Microsoft Research Asia. He joined the Waseda faculty in 2013. He was Associate Dean (IT Strategies Division) from 2015 to 2017, and Department Head from 2017 to 2019. He is an ACM Distinguished Member and a Senior Associate Editor of ACM TOIS.



NTCIR-16 Conference Keynote 1

Date: June 15th (Wed), 2022
(Time: 10:00 - 11:00 (JST), 1:00 - 2:00 (GMT), Jun 14, 21:00 - 22:00 (EDT))

Title: Information Retrieval Evaluation as Search Simulation

Speaker: Chengxiang Zhai (University of Illinois at Urbana-Champaign, USA)

Prof. Chengxiang Zhai


Due to the empirical nature of the Information Retrieval (IR) task, experimental evaluation of IR methods and systems is essential. Historically, evaluation initiatives such as TREC, CLEF, and NTCIR have made significant impacts on IR research and have produced many test collections that researchers can reuse to study a wide range of IR tasks. However, despite its great success, the traditional Cranfield evaluation methodology using a test collection has significant limitations, especially for evaluating an interactive IR system, and how to evaluate interactive IR systems with reproducible experiments remains an open challenge. In this talk, I will discuss how we can address this challenge by framing the problem of IR evaluation more generally as search simulation, i.e., having an IR system interact with simulated users and measuring the performance of the system based on its interaction with the simulated users. I will first present a general formal framework for evaluating IR systems based on search session simulation, discussing how the framework can not only cover the traditional Cranfield evaluation method as a special case but also reveal potential limitations of the traditional IR evaluation measures. I will then review the recent research progress in developing formal models for user simulation and evaluating user simulators. Finally, I will discuss how we may leverage the current IR test collections to support simulation-based evaluation by developing and deploying user simulators based on those existing collections. I will conclude the talk with a brief discussion of important future research directions in simulation-based IR evaluation.
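The simulation-based evaluation idea above can be sketched as a toy loop in which a simulated user scans a ranked list top-down, collects relevant documents, and stops stochastically. This is a minimal illustration of the general idea only, not Prof. Zhai's actual framework; the function names, the stopping model, and the sample rankings are all hypothetical.

```python
import random

def simulate_session(ranking, relevant, stop_prob=0.3, seed=0):
    """Toy user simulator: scan the ranked list top-down, count relevant
    documents seen, and stop after each rank with probability stop_prob."""
    rng = random.Random(seed)
    found = 0
    for doc in ranking:
        if doc in relevant:
            found += 1
        if rng.random() < stop_prob:  # simple stochastic stopping model
            break
    return found

def evaluate(ranking, relevant, sessions=1000):
    """Score a system by its average gain over many simulated sessions."""
    total = sum(simulate_session(ranking, relevant, seed=s)
                for s in range(sessions))
    return total / sessions

# Hypothetical rankings from two systems for the same topic.
relevant = {"d1", "d4", "d7"}
sys_a = ["d1", "d4", "d2", "d7", "d9"]  # relevant docs ranked high
sys_b = ["d2", "d9", "d1", "d4", "d7"]  # relevant docs ranked low
print(evaluate(sys_a, relevant) > evaluate(sys_b, relevant))  # → True
```

With an impatient simulated user, the system that ranks relevant documents early wins, which mirrors how a rank-discounted test-collection measure would score the same pair; richer user models (query reformulation, clicks, dwell time) go beyond what a static test collection can express.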


ChengXiang Zhai is a Donald Biggar Willett Professor in Engineering of the Department of Computer Science at the University of Illinois at Urbana-Champaign, where he also holds a joint appointment at the Carl R. Woese Institute for Genomic Biology, Department of Statistics, and the School of Information Sciences. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests are in the general area of intelligent information systems, including specifically intelligent information retrieval, data mining, natural language processing, machine learning, and their applications in domains such as biomedical informatics and intelligent education systems. He has published over 300 papers in these areas and holds 6 patents. He offers two Massive Open Online Courses (MOOCs) on Coursera, covering Text Retrieval and Search Engines and Text Mining and Analytics, respectively, and was a key contributor to the Lemur text retrieval and mining toolkit. He has served as an Associate Editor for major journals in multiple areas, including information retrieval (ACM TOIS, IPM), data mining (ACM TKDD), intelligent systems (ACM TIST), and medical informatics (BMC MIDM), as a Program Co-Chair of NAACL HLT'07, SIGIR'09, and WWW'15, and as a Conference Co-Chair of CIKM'16, WSDM'18, and IEEE BigData'20. He is an ACM Fellow and a member of the ACM SIGIR Academy. He has received multiple awards, including the ACM SIGIR Gerard Salton Award, the ACM SIGIR Test of Time Paper Award (three times), the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, an IBM Faculty Award, an HP Innovation Research Award, a Microsoft Beyond Search Research Award, the UIUC Rose Award for Teaching Excellence, and the UIUC Campus Award for Excellence in Graduate Student Mentoring. He has graduated 38 PhD students and over 50 MS students.



NTCIR-16 Conference Keynote 2

Date: June 15th (Wed), 2022
(Time: 20:00 - 21:00 (JST), 11:00 - 12:00 (GMT), 7:00 - 8:00 (EDT))

Title: Cranfield is Dead; Long Live Cranfield

Speaker: Ellen Voorhees (NIST, USA)

Dr. Ellen Voorhees


Evaluating search system effectiveness is a foundational hallmark of information retrieval research. Doing so requires infrastructure appropriate for the task at hand, which has frequently entailed using the Cranfield paradigm: test collections and associated evaluation measures. Observers have declared Cranfield moribund multiple times in its 60-year history, though each time test collection construction techniques and evaluation measure definitions have evolved to restore Cranfield as a useful tool. Now Cranfield's effectiveness is once more in question: corpus sizes have grown to the point that finding a few relevant documents is easy enough to saturate high-precision measures, while deeper measures are unstable because too few of the relevant documents have been identified. In this talk I'll review how Cranfield evolved in the past and examine its prospects for the future.
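The saturation problem described above can be illustrated with a toy example: when the judged pool contains only a handful of relevant documents and both systems place them all in the top ranks, a high-precision measure stops discriminating between systems. This is a sketch with hypothetical document IDs and judgments, not drawn from any actual TREC collection.

```python
def precision_at_k(ranking, qrels, k=10):
    """Fraction of the top-k documents judged relevant (unjudged counts as
    non-relevant, as in standard test-collection scoring)."""
    return sum(1 for d in ranking[:k] if qrels.get(d) == 1) / k

# Hypothetical qrels: only 10 relevant documents were identified during
# pooling, though a huge corpus may contain many more.
qrels = {f"d{i}": 1 for i in range(10)}

# Two quite different systems that both surface the known relevant docs:
sys_a = [f"d{i}" for i in range(10)] + [f"x{i}" for i in range(990)]
sys_b = [f"d{i}" for i in reversed(range(10))] + [f"y{i}" for i in range(990)]

print(precision_at_k(sys_a, qrels))  # 1.0: the measure is saturated
print(precision_at_k(sys_b, qrels))  # 1.0: no discrimination between systems
```

Deeper, recall-dependent measures avoid the ceiling but inherit the opposite problem: with so few of the truly relevant documents judged, their values hinge on how the many unjudged documents are treated.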


Ellen Voorhees is a Fellow at the US National Institute of Standards and Technology (NIST). For most of her tenure at NIST she managed the Text REtrieval Conference (TREC) project, which develops the infrastructure required for large-scale evaluation of search engines and other information access technology. Currently she is examining how best to bring the benefits of large-scale community evaluations to bear on the problems of trustworthy AI. Voorhees' general research focuses on developing and validating appropriate evaluation schemes to measure system effectiveness for diverse user tasks. Voorhees is a Fellow of the ACM, a member of the ACM SIGIR Academy, and an elected Fellow of the Washington Academy of Sciences. She has published numerous articles on information retrieval techniques and evaluation methodologies and serves on the review boards of several journals and conferences.


Last modified: 2022-05-24