EVIA 2011 Abstracts
-
Preface
Kazuaki Kishida, Mark Sanderson, and William Webber
-
The Effectiveness of Cross-lingual Link Discovery
Ling-Xiang Tang, Kelly Y. Itakura, Shlomo Geva, Andrew Trotman and Yue Xu
This paper describes an evaluation methodology for benchmarking the
effectiveness of cross-lingual link discovery (CLLD). Cross-lingual
link discovery is a way of automatically finding prospective links
between documents in different languages, which is particularly
helpful for knowledge discovery across different language domains. A
CLLD evaluation framework is proposed for benchmarking system
performance. The framework includes standard document collections,
evaluation metrics, and link assessment and evaluation tools. The
evaluation methods described in this paper were used to quantify
system performance at the NTCIR-9 Crosslink task. It is shown that
using manual assessment to generate the gold standard can deliver a
more reliable evaluation result.
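As a concrete (if simplified) illustration of comparing system output
against a manually assessed gold standard, the sketch below computes
link-level precision and recall over sets of proposed cross-lingual
links. The data model and metric here are assumptions for
illustration, not the official NTCIR-9 Crosslink measures.

```python
# Minimal sketch (assumed data model): evaluate proposed cross-lingual links
# against a manually assessed gold standard with set-based precision/recall.
# Illustrative only; not the official NTCIR-9 Crosslink evaluation metric.

def link_precision_recall(proposed, gold):
    """proposed, gold: iterables of (source_doc, anchor_text, target_doc) triples."""
    proposed, gold = set(proposed), set(gold)
    hits = proposed & gold
    precision = len(hits) / len(proposed) if proposed else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical example: links from an English article to Chinese articles.
gold = {("en_1", "Great Wall", "zh_42"), ("en_1", "Beijing", "zh_7")}
proposed = {("en_1", "Great Wall", "zh_42"), ("en_1", "Beijing", "zh_99")}
print(link_precision_recall(proposed, gold))  # (0.5, 0.5)
```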
-
A Micro-analysis of Topic Variation for a Geotemporal Query
Fredric Gey, Ray R. Larson, Jorge Machado and Masaharu Yoshioka
Bias introduced in question wording is a well-known problem in
political attitude survey polling.
For example, the question
"The President believes our military mission in Afghanistan
is a vital national interest -- agree/disagree?"
is quite different from the question:
"Do you believe that a military mission in Afghanistan
is in the USA's vital national interest?"
Response variation according to question wording has been studied by
researchers in survey methodology. However, the influence of
variations in topic wording on search results has not been examined
for geotemporal information retrieval. For the GeoTime evaluation in
NTCIR Workshop 9, the organizers conducted an experiment in query
variability in order to study its effect on retrieval performance. We
took a single information need and expressed it in three different
ways:
1) as a single-event question,
2) as a question that would yield an open-ended list
(e.g. the classic "which countries did the Pope
visit in the last three years"),
or 3) as a reformulation of the single-event question as a location
(latitude/longitude) and time inquiry. This paper reports the results
of this micro-analysis of variation effects upon a single query
expressed in different formats, as well as the degree to which we
achieved (or failed to achieve) our explicit goal of distinguishing
performance outcomes for the different formulations.
-
What Makes a Good Answer in Community Question Answering? An Analysis of Assessors' Criteria
Daisuke Ishikawa, Noriko Kando and Tetsuya Sakai
Community question answering (CQA) has recently become a popular means
of satisfying personal information needs, and methods for effectively
retrieving information from CQA archives are attracting research
interest as the number of reusable questions and answers is rapidly
increasing. Rather than
having to post a question and wait for an answer, an individual can
often obtain an immediate answer by searching the archives. However,
as the quality of archived answers varies widely, a method is needed
for effectively extracting high-quality answers. In this study, we
manually selected questions and answers at random from the archives
of Yahoo! Chiebukuro, a Japanese CQA site equivalent to Yahoo! Answers,
had them evaluated by four assessors, and identified the criteria used
by the assessors in their evaluations. These criteria should be useful
in constructing a model for identifying high-quality answers.
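To make the multi-assessor grading concrete, the sketch below
computes mean pairwise agreement among four assessors' quality grades
for a small set of answers. The grade scale and agreement measure are
assumptions for illustration; the paper's identification of assessor
criteria is a qualitative analysis.

```python
# Minimal sketch (assumed setup): four assessors grade each answer on a
# letter scale, and we compute mean pairwise agreement across assessors.
# Illustrative only; not the paper's actual analysis of assessor criteria.
from itertools import combinations

def pairwise_agreement(grades_by_assessor):
    """grades_by_assessor: list of equal-length grade lists, one per assessor."""
    pairs = list(combinations(grades_by_assessor, 2))
    n_items = len(grades_by_assessor[0])
    total = sum(
        sum(a[i] == b[i] for i in range(n_items)) / n_items for a, b in pairs
    )
    return total / len(pairs)

# Hypothetical grades for five answers from four assessors.
grades = [["A", "B", "B", "C", "A"],
          ["A", "B", "C", "C", "A"],
          ["B", "B", "B", "C", "A"],
          ["A", "A", "B", "C", "B"]]
print(round(pairwise_agreement(grades), 2))  # 0.6
```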
-
Evaluation of Interactive Information Access System using Concept Map
Masaharu Yoshioka, Noriko Kando and Yohei Seki
In this study, we introduce a method for analyzing the effectiveness
of an interactive information access system based on changes in
users' knowledge structures. We use concept maps to compare each
user's knowledge structure before and after a search. We discuss the
differences between the proposed evaluation method and questionnaires
using the results of user experiments with an interactive news
retrieval and analysis system called NSContrast.
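As a rough illustration of the before/after comparison described
above, the sketch below treats a concept map as a set of concept
nodes plus a set of labeled links and reports what a user added
during the search session. The representation and the change measure
are assumptions for illustration, not the paper's actual evaluation
method.

```python
# Minimal sketch (assumed representation): a concept map is a set of concepts
# and a set of (concept, relation, concept) links. Concepts and links added
# between the pre-search and post-search maps serve as a simple proxy for
# change in the user's knowledge structure. Illustrative only.

def map_change(before, after):
    return {
        "new_concepts": after["concepts"] - before["concepts"],
        "new_links": after["links"] - before["links"],
    }

# Hypothetical pre/post concept maps for one user.
pre = {"concepts": {"earthquake", "Japan"},
       "links": {("earthquake", "occurred_in", "Japan")}}
post = {"concepts": {"earthquake", "Japan", "tsunami"},
        "links": {("earthquake", "occurred_in", "Japan"),
                  ("earthquake", "caused", "tsunami")}}
print(map_change(pre, post))
```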
-
Reports from Other Evaluation Campaigns and Panel Discussion:
TREC is 20 Years Old, Where Now for Evaluation Campaigns?
Mark Sanderson, William Webber, Ian Soboroff, Gareth Jones,
Andrew Trotman, Shlomo Geva, Nicola Ferro and Hideo Joho
Although shared test collections have been around since the 1960s,
the arrival of TREC in 1992 was a major boost to information retrieval
research. It spawned a wide range of evaluation campaigns
around the world including NTCIR, CLEF, FIRE, INEX, etc. However, in
recent years, the number of papers at major conferences, such as SIGIR
or CIKM, that report research using the collections produced
by these campaigns appears to be declining. More papers describe
research using so-called private data sets. So, what will
evaluation campaigns look like 20 years from now? Is the rise of
private data a threat to the scientific integrity of IR research, or a
valuable diversification of research focus?