EVIA2011 Abstract

Preface
Kazuaki Kishida, Mark Sanderson, and William Webber
The Effectiveness of Cross-lingual Link Discovery
Ling-Xiang Tang, Kelly Y. Itakura, Shlomo Geva, Andrew Trotman and Yue Xu

This paper describes the evaluation methodology for benchmarking the effectiveness of cross-lingual link discovery (CLLD). Cross-lingual link discovery is a way of automatically finding prospective links between documents in different languages, which is particularly helpful for knowledge discovery across different language domains. A CLLD evaluation framework is proposed for benchmarking system performance. The framework includes standard document collections, evaluation metrics, and link assessment and evaluation tools. The evaluation methods described in this paper were used to quantify system performance in the NTCIR-9 Crosslink task. The results show that generating the gold standard through manual assessment can deliver more reliable evaluation results.
A Micro-analysis of Topic Variation for a Geotemporal Query
Fredric Gey, Ray R. Larson, Jorge Machado and Masaharu Yoshioka

Bias introduced by question wording is a well-known problem in political attitude survey polling. For example, the question "The President believes our military mission in Afghanistan is a vital national interest -- agree/disagree?" is quite different from the question "Do you believe that a military mission in Afghanistan is in the USA's vital national interest?" Response variation caused by different question wording has been studied by researchers in survey methodology. However, the influence of variations in topic wording on search results has not been examined for geotemporal information retrieval. For the GeoTime evaluation in NTCIR Workshop 9, the organizers decided to conduct an experiment in query variability in order to study the resulting variability in performance. We took a single information need and expressed it in three different ways: 1) as a single event question, 2) as a question that would yield an open-ended list (e.g. the classic "which countries did the Pope visit in the last three years"), or 3) as a reformulation of the single event question as a location (latitude/longitude) and time inquiry. This paper reports the results of this micro-analysis of variation effects on a single query expressed in different formats, as well as the degree to which we succeeded (or failed) in our explicit goal of distinguishing performance outcomes for the different formulations.
What Makes a Good Answer in Community Question Answering? An Analysis of Assessors' Criteria
Daisuke Ishikawa, Noriko Kando and Tetsuya Sakai

Community question answering (CQA) has recently become a popular means of satisfying personal information needs, and methods for effectively retrieving information from CQA archives are attracting research interest as the number of reusable questions and answers is rapidly increasing. Rather than having to post a question and wait for an answer, an individual can often obtain an immediate answer by searching the archives. However, as the quality of archived answers varies widely, a method is needed for effectively extracting high-quality answers. In this study, we manually sampled random questions and answers from the archives of Yahoo! Chiebukuro, a Japanese CQA site equivalent to Yahoo! Answers, had them evaluated by four assessors, and identified the criteria the assessors used in their evaluations. These criteria should be useful in constructing a model for identifying high-quality answers.
Evaluation of Interactive Information Access System using Concept Map
Masaharu Yoshioka, Noriko Kando and Yohei Seki

In this study, we introduce a method for analyzing the effectiveness of interactive information access systems based on changes in users' knowledge structures. We use concept maps to compare each user's knowledge structure before and after a search. Using the results of user experiments with an interactive news retrieval and analysis system called NSContrast, we discuss the differences between the proposed evaluation method and questionnaire-based evaluation.
Reports from Other Evaluation Campaigns and Panel Discussion: TREC is 20 Years Old, Where Now for Evaluation Campaigns?
Mark Sanderson, William Webber, Ian Soboroff, Gareth Jones, Andrew Trotman, Shlomo Geva, Nicola Ferro and Hideo Joho

Although shared test collections have been around since the 1960s, the arrival of TREC in 1992 was a major boost to information retrieval research. It spawned a wide range of evaluation campaigns around the world, including NTCIR, CLEF, FIRE, and INEX. However, in recent years, the number of papers at major conferences such as SIGIR and CIKM that report research using the collections produced by these campaigns appears to be declining, while more papers describe research using so-called private data sets. So, what will evaluation campaigns look like 20 years from now? Is the rise of private data a threat to the scientific integrity of IR research, or a valuable diversification of research focus?