
[ntcir:300] Grid@CLEF 2009 - New CLEF 2009 Pilot Track







Grid@CLEF is an activity of the Cross-Language Evaluation Forum (CLEF),
which is launching a new pilot track in the CLEF 2009 campaign.
Information about the objectives, the task, the organization, and the
subscription procedure follows; for more information and updates, please
visit the Grid@CLEF Web site at:

http://ims.dei.unipd.it/gridclef/


*Objectives*

Multilingual information access (MLIA) is increasingly part of many
complex systems, such as digital libraries, intranet and enterprise
portals, and Web search engines.

The Cross-Language Evaluation Forum (CLEF) research community has been
outstanding and very active in designing, developing, and testing MLIA
methods and techniques, constantly improving the performance of such
components. But is this enough? Do we really know how MLIA components
(stop lists, stemmers, IR models, relevance feedback, translation
techniques, etc.) behave with respect to languages? Do we have a deep
comprehension of how these components interact when the
language changes?
Unfortunately, today's picture is quite fragmentary: researchers have
mainly focused on specific aspects of multilinguality, and a
comprehensive and unifying view is still missing. This situation
prevents an easy adoption of MLIA techniques and technology transfer by
relevant application and developer communities. Indeed, it is often
difficult for people outside the IR community to extract from the
specialised scientific literature indications about the most promising
approaches and solutions.

We are thus launching a cooperative effort where a series of large-scale
and systematic grid experiments will allow us to improve our
comprehension of MLIA systems and gain an exhaustive picture of their
behaviour with respect to languages. In this way, we can exploit the
valuable resources and experimental collections made available by CLEF
over the years in order to gain more insights about the effectiveness of
the various weighting schemes and retrieval techniques with respect to
the languages and to disseminate this knowledge to the relevant
application and developer communities.


*Task*

This first-year task focuses on *monolingual retrieval*, i.e. querying
topics against documents in the same language as the topics, *in five
European languages*:

   * Dutch;
   * English;
   * French;
   * German;
   * Italian.

The selected languages will allow participants to test both Romance and
Germanic languages, as well as languages with word compounding issues.
Moreover, these languages have been extensively studied in the MLIA
field and, therefore, it will be possible to compare and assess the
outcomes of the first-year experiments with respect to the existing
literature.

The reference scenario for Grid@CLEF 2009 concerns an IR system which
consists of:

   * a tokenizer component for processing the input document collection
     and producing a stream of tokens;
   * an optional stop list component for removing stop words from the
     stream of tokens;
   * an optional word decompounder component for splitting compound
     words in the stream of tokens;
   * an optional stemmer component for stemming words in the stream of
     tokens;
   * a weighting/scoring engine component for scoring documents against
     queries and producing an output ranked list.
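
The chain of components above can be sketched in a few lines of Python.
This is only an illustrative toy under stated assumptions, not the
Grid@CLEF reference implementation: the stop list, the compound lexicon,
the suffix-stripping stemmer, and the term-overlap scorer are all
hypothetical stand-ins chosen to keep the example short.

```python
import re
from collections import Counter

# Toy resources, hypothetical and far too small for real use.
STOP_WORDS = {"the", "a", "of", "and", "in"}
COMPOUND_PARTS = {"fuss", "ball", "platz"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def decompound(tokens):
    """Split a token only if it is fully covered by two or more lexicon parts."""
    parts_by_length = sorted(COMPOUND_PARTS, key=len, reverse=True)
    out = []
    for tok in tokens:
        parts, rest = [], tok
        while rest:
            match = next((p for p in parts_by_length if rest.startswith(p)), None)
            if match is None:
                break
            parts.append(match)
            rest = rest[len(match):]
        out.extend(parts if not rest and len(parts) > 1 else [tok])
    return out


def stem(tokens):
    """Naive suffix stripping that keeps at least four characters."""
    out = []
    for tok in tokens:
        for suffix in ("ing", "es", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 4:
                tok = tok[: -len(suffix)]
                break
        out.append(tok)
    return out


def process(text, use_stop_list=True, use_decompounder=True, use_stemmer=True):
    """Run the tokenizer, then each optional component that is switched on."""
    tokens = tokenize(text)
    if use_stop_list:
        tokens = remove_stop_words(tokens)
    if use_decompounder:
        tokens = decompound(tokens)
    if use_stemmer:
        tokens = stem(tokens)
    return tokens


def rank(query, docs, **options):
    """Score each document by query-term frequency and return a ranked list."""
    query_terms = process(query, **options)
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(process(text, **options))
        scores[doc_id] = sum(counts[t] for t in query_terms)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Switching an optional component off, e.g. rank(query, docs,
use_decompounder=False), gives a different configuration of the same
pipeline, which is the kind of variation the track aims to study.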


Instead of directly feeding the next component, as usually happens in a
monolithic IR system, the Grid@CLEF task requires each component to read
its input from, and write its output to, XML files in a well-defined
format. This choice
allows the exchange of these XML files among participants and the
creation of a whole experiment from the chaining of components that may
belong to different IR systems.
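
As a rough sketch of what such a file-based hand-off could look like,
the Python snippet below writes a token stream to an XML file and reads
it back. The element and attribute names (tokenStream, doc, token,
component, id) are assumptions made for this example only; the actual
format is the one defined by the Grid@CLEF XML messaging framework
specification.

```python
import xml.etree.ElementTree as ET


def write_token_stream(path, component, docs):
    """Write {doc_id: [tokens]} to an XML file (hypothetical layout)."""
    root = ET.Element("tokenStream", {"component": component})
    for doc_id, tokens in docs.items():
        doc_el = ET.SubElement(root, "doc", {"id": doc_id})
        for tok in tokens:
            ET.SubElement(doc_el, "token").text = tok
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def read_token_stream(path):
    """Read a file produced by write_token_stream back into Python objects."""
    root = ET.parse(path).getroot()
    docs = {
        doc.get("id"): [tok.text for tok in doc.findall("token")]
        for doc in root.findall("doc")
    }
    return root.get("component"), docs
```

Because each component only depends on the file format, a stop list
component from one system could read the file written by the tokenizer
of another system, process the tokens, and write a new file for the next
component in the chain.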

Therefore, the Grid@CLEF 2009 track has a twofold goal:

  1. to prepare participants' systems to work according to this new
     framework based on the exchange of well-defined XML messages;
  2. to conduct as many experiments as possible, i.e. to put as many
     dots as possible on the grid, according to this new framework.
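
One possible reading of "dots on the grid", offered here only as an
assumption, is that each dot is one language combined with one
configuration of the pipeline. The Python sketch below enumerates such
dots by switching the three optional components of the reference
scenario on and off; the component labels are hypothetical, and a real
grid would also vary the implementation of each component and the
weighting/scoring engine.

```python
from itertools import product

LANGUAGES = ["Dutch", "English", "French", "German", "Italian"]
# Hypothetical labels for the optional components of the reference scenario.
OPTIONAL_COMPONENTS = ["stop_list", "decompounder", "stemmer"]


def grid_points(languages=LANGUAGES, optional=OPTIONAL_COMPONENTS):
    """Return one (language, configuration) pair per on/off combination."""
    points = []
    for language in languages:
        for switches in product([False, True], repeat=len(optional)):
            points.append((language, dict(zip(optional, switches))))
    return points
```

With five languages and three on/off switches this yields 5 x 2^3 = 40
candidate experiments before any variation in the component
implementations themselves.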


To facilitate participation in this first-year task, participants are
required to participate in what we call the *island mode*, where all
the components which constitute the IR system of the reference scenario
are developed and run by the same participant. The participant is only
requested to implement the XML messaging format for each of their own
components and publish all the intermediate results of these components
on the online XML messaging exchange system.

*Participating in the Grid@CLEF 2009 pilot track is easy: you only need
to join the island mode and produce as many experiments as possible.*

*Schedule*

The tentative schedule for the Grid@CLEF 2009 track is as follows:

   * Topics and collections release: early March 2009;
   * XML messaging framework specification release: early April 2009;
   * XML messaging exchange online system release: early May 2009;
   * Experiment submission: mid June 2009;
   * Results computation: early July 2009;
   * Working note papers: mid August 2009;
   * CLEF 2009 Workshop: from 30 September to 2 October 2009 in Corfu,
     Greece.



*Track Coordinators*

   * Nicola Ferro, University of Padua, Italy - ferro@xxxxxxxxxxxx
   * Donna Harman, National Institute of Standards and Technology
     (NIST), USA - donna.harman@xxxxxxxx



*Advisory Committee*

   * Chris Buckley, Sabir Research, USA;
   * Fredric Gey, University of California at Berkeley, USA;
   * Kalervo Järvelin, University of Tampere, Finland;
   * Noriko Kando, National Institute of Informatics (NII), Japan;
   * Craig Macdonald, University of Glasgow, UK;
   * Prasenjit Majumder, Indian Statistical Institute, Kolkata, India;
   * Paul McNamee, Johns Hopkins University, USA;
   * Teruko Mitamura, Carnegie Mellon University, USA;
   * Mandar Mitra, Indian Statistical Institute, Kolkata, India;
   * Stephen Robertson, Microsoft Research Cambridge and City
     University London, UK;
   * Jacques Savoy, University of Neuchâtel, Switzerland.



*Subscriptions*

Registration for CLEF 2009 and subscription to the Grid@CLEF 2009 pilot
track open *4 February*. You can find more information on the main CLEF
Web site at:

http://www.clef-campaign.org/

under "CLEF 2009".