NTCIR-16 Conference


NTCIR-16 Conference Tutorial

Date: June 14th (Tue), 2022
(Time: 10:00 - 12:00 (JST), 1:00 - 3:00 (UTC), Jun 13, 21:00 - 23:00 (EDT))

Title: Evaluating Evaluation Measures, Evaluating Information Access Systems, Designing and Constructing Test Collections, and Evaluating Again

Speaker: Tetsuya Sakai (Waseda University)

Dr. Tetsuya Sakai


I plan to cover the following topics in this tutorial:
1. Why is (offline) evaluation important?
2. On a few evaluation measures used at NTCIR
3. How should we choose the evaluation measures?
4. How should we design and build a test collection?
5. How should we ensure the quality of the gold data?
6. How should we report the results?
7. Quantifying reproducibility and progress
8. Summary


Tetsuya Sakai is a professor at the Department of Computer Science and Engineering, Waseda University, Japan. He is also a General Research Advisor of Naver Corporation, Korea (2021-), and a visiting professor at the National Institute of Informatics, Japan (2015-). He joined Toshiba in 1993 and obtained a Ph.D. from Waseda in 2000. From 2000 to 2001, he was supervised by the late Karen Sparck Jones at the Computer Laboratory, University of Cambridge, as a visiting researcher. In 2007, he joined NewsWatch, Inc. as the director of the Natural Language Processing Lab. In 2009, he joined Microsoft Research Asia. He joined the Waseda faculty in 2013. He was Associate Dean (IT Strategies Division) from 2015 to 2017, and Department Head from 2017 to 2019. He is an ACM Distinguished Member and a senior associate editor of ACM TOIS.



Keynote 1

NTCIR-16 Conference Keynote 1

Date: June 15th (Wed), 2022
(Time: 10:00 - 11:00 (JST), 1:00 - 2:00 (UTC), Jun 14, 21:00 - 22:00 (EDT))

Title: Information Retrieval Evaluation as Search Simulation

Speaker: ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA)

Prof. ChengXiang Zhai


Due to the empirical nature of the Information Retrieval (IR) task, experimental evaluation of IR methods and systems is essential. Historically, evaluation initiatives such as TREC, CLEF, and NTCIR have made significant impacts on IR research and resulted in many test collections that can be reused by researchers to study a wide range of IR tasks in the future. However, despite its great success, the traditional Cranfield evaluation methodology using a test collection has significant limitations, especially for evaluating an interactive IR system, and it remains an open challenge how to evaluate interactive IR systems using reproducible experiments. In this talk, I will discuss how we can address this challenge by framing the problem of IR evaluation more generally as search simulation, i.e., having an IR system interact with simulated users and measuring the performance of the system based on its interaction with the simulated users. I will first present a general formal framework for evaluating IR systems based on search session simulation, discussing how the framework can not only cover the traditional Cranfield evaluation method as a special case but also reveal potential limitations of the traditional IR evaluation measures. I will then review the recent research progress in developing formal models for user simulation and evaluating user simulators. Finally, I will discuss how we may leverage the current IR test collections to support simulation-based evaluation by developing and deploying user simulators based on those existing collections. I will conclude the talk with a brief discussion of important future research directions in simulation-based IR evaluation.


ChengXiang Zhai is a Donald Biggar Willett Professor in Engineering of the Department of Computer Science at the University of Illinois at Urbana-Champaign, where he also holds joint appointments at the Carl R. Woese Institute for Genomic Biology, the Department of Statistics, and the School of Information Sciences. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests are in the general area of intelligent information systems, including specifically intelligent information retrieval, data mining, natural language processing, machine learning, and their applications in domains such as biomedical informatics and intelligent education systems. He has published over 300 papers in these areas and holds 6 patents. He offers two Massive Open Online Courses (MOOCs) on Coursera covering Text Retrieval and Search Engines and Text Mining and Analytics, respectively, and was a key contributor to the Lemur text retrieval and mining toolkit. He has served as an Associate Editor for major journals in multiple areas including information retrieval (ACM TOIS, IPM), data mining (ACM TKDD), intelligent systems (ACM TIST), and medical informatics (BMC MIDM), as Program Co-Chair of NAACL HLT'07, SIGIR'09, and WWW'15, and as Conference Co-Chair of CIKM'16, WSDM'18, and IEEE BigData'20. He is an ACM Fellow and a member of the ACM SIGIR Academy. He has received multiple awards, including the ACM SIGIR Gerard Salton Award, the ACM SIGIR Test of Time Paper Award (three times), the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, an IBM Faculty Award, an HP Innovation Research Award, a Microsoft Beyond Search Research Award, the UIUC Rose Award for Teaching Excellence, and the UIUC Campus Award for Excellence in Graduate Student Mentoring. He has graduated 38 PhD students and over 50 MS students.


Keynote 2

NTCIR-16 Conference Keynote 2

Date: June 15th (Wed), 2022
(Time: 20:00 - 21:00 (JST), 11:00 - 12:00 (UTC), 7:00 - 8:00 (EDT))

Title: Cranfield is Dead; Long Live Cranfield

Speaker: Ellen Voorhees (NIST, USA)

Prof. Ellen Voorhees


Evaluating search system effectiveness is a foundational hallmark of information retrieval research. Doing so requires infrastructure appropriate for the task at hand, which has frequently entailed using the Cranfield paradigm: test collections and associated evaluation measures. Observers have declared Cranfield moribund multiple times in its 60-year history, though each time test collection construction techniques and evaluation measure definitions have evolved to restore Cranfield as a useful tool. Now Cranfield's effectiveness is once more in question, since corpus sizes have grown to the point that finding a few relevant documents is easy enough to saturate high-precision measures, while deeper measures are unstable because too few of the relevant documents have been identified. In this talk I'll review how Cranfield evolved in the past and examine its prospects for the future.


Ellen Voorhees is a Fellow at the US National Institute of Standards and Technology (NIST). For most of her tenure at NIST she managed the Text REtrieval Conference (TREC) project, a project that develops the infrastructure required for large-scale evaluation of search engines and other information access technology. Currently she is examining how best to bring the benefits of large-scale community evaluations to bear on the problems of trustworthy AI. Voorhees' general research focuses on developing and validating appropriate evaluation schemes to measure system effectiveness for diverse user tasks. Voorhees is a fellow of the ACM, a member of the ACM SIGIR Academy, and has been elected as a fellow of the Washington Academy of Sciences. She has published numerous articles on information retrieval techniques and evaluation methodologies and serves on the review boards of several journals and conferences.


Keynote 3

NTCIR-16 Conference Keynote 3

Date: June 17th (Fri), 2022
(Time: 17:00 - 18:00 (JST), 8:00 - 9:00 (UTC), 4:00 - 5:00 (EDT))

Title: The Impact of Query Variability and Relevance Measurement Scales on Information Retrieval Evaluation

Speaker: Falk Scholer (RMIT University, Australia)

Dr. Falk Scholer


Information retrieval makes extensive use of test collections for the measurement of search system effectiveness. Broadly speaking, this evaluation framework includes three components: search queries; a collection of documents to search over; and relevance judgements. In this talk, we'll consider two aspects of this process: queries, and relevance scales. Test collections typically use a single query to represent a more complex search topic or information need. However, different people may generate a wide range of query variants when instantiating information needs. We'll consider the implications of this for the evaluation of search systems, and the potential benefits and costs of incorporating variant queries into a test collection framework. Relevance judgements are used to indicate whether the documents returned by a retrieval system are appropriate responses for the query. They can be made using a variety of different scales, including ordinal (binary or graded) and techniques such as magnitude estimation. We'll examine a number of different approaches, and explore their benefits and drawbacks for judging relevance for retrieval evaluation.


Falk Scholer is a Professor in the Data Science discipline of the School of Computing Technologies at RMIT University in Melbourne, Australia. His research is in the area of information access and retrieval, focusing on understanding how systems such as search engines can assist users to resolve their information needs, and how their effectiveness can be measured. He also works on issues of fairness, accountability, transparency and ethics of systems and algorithms as part of the ARC Centre of Excellence in Automated Decision Making and Society, and on misinformation, fake news and fact-checking with the RMIT FactLab research hub. Falk is the Deputy Director of the RMIT Centre for Information Discovery and Data Analytics (CIDDA), which brings together experts across different academic disciplines, schools and colleges including computing technologies, science, maths and statistics, engineering, and business. He also teaches a range of courses, including on web development and programming, data science, HCI, and databases, and is the Program Manager for the postgraduate Master of Data Science. Falk also has a keen interest in research ethics and integrity, and chairs the STEM College Human Ethics Advisory Network (CHEAN).


NTCIR-16 Invited Talks

NTCIR-16 Invited Talk 1

Date: June 17th (Fri), 2022
(Time: 18:05 - 18:15 (JST), 9:05 - 9:15 (UTC), 5:05 - 5:15 (EDT))

Title: What is happening in CLEF 2022

Speaker: Nicola Ferro (University of Padua)

Prof. Nicola Ferro




Nicola Ferro (http://www.dei.unipd.it/~ferro/) is a full professor of computer science at the University of Padua, Italy. His research interests include information retrieval, its experimental evaluation, multilingual information access, and digital libraries, and he has published more than 350 papers on these topics. He is co-organizer of the Covid-19 MLIA @ Eval initiative and the chair of the CLEF evaluation initiative, which involves more than 200 research groups world-wide in large-scale IR evaluation activities. He was the coordinator of the EU 7FP Network of Excellence PROMISE on information retrieval evaluation. He is an associate editor of ACM TOIS and was general chair of ECIR 2016 and short papers program co-chair of ECIR 2020.

NTCIR-16 Invited Talk 2

Date: June 17th (Fri), 2022
(Time: 18:15 - 18:25 (JST), 9:15 - 9:25 (UTC), 5:15 - 5:25 (EDT))

Title: TREC's Neural CLIR Track

Speaker: Douglas W. Oard (University of Maryland)

Prof. Douglas W. Oard


Test collections are a product of their time, with older collections containing relevance judgments only for the documents that could be found a decade or more ago when those test collections were first made. Neural Information Retrieval (IR) techniques have recently changed the playing field, ranking and re-ranking documents better than traditional IR techniques. Neural methods have also substantially improved the quality of translation technology, which is of particular importance for Cross-Language IR (CLIR). Accurately measuring the effect of these improvements might thus require a new generation of CLIR test collections. The goal of the Neural CLIR (NeuCLIR) track at TREC 2022 is to begin the process of creating such collections. In this brief talk, I’ll answer the question “What’s new in NeuCLIR?”


Douglas Oard is a Professor in the College of Information Studies and the Institute for Advanced Computer Studies (UMIACS) at the University of Maryland (USA) and a Visiting Professor at the National Institute of Informatics (Japan). He has rich experience with the design and evaluation of systems for CLIR, and is one of the track coordinators for TREC’s new Neural CLIR track.

Last modified: 2022-06-16