Submission Guideline for NTCIR-4 CLIR Task
how to submit search results
Oct 15, 2003
Oct 30, 2003: deadline was modified.
Oct 31, 2003: how to submit result files was added.
1. Files to Be Submitted
All participants have to submit (a) a document list file for each run and (b) a file of system descriptions. Please use XML-style tags to describe your system according to the instructions
in Section 5. Examples are as follows.
(a) example of document list (search results)
030 0 cts_cec_19991118596 1 4238 LIPS-C-CJE-T-01
030 0 cts_cec_19991118596 2 3211 LIPS-C-CJE-T-01
...........
030 0 cts_cec_19991118596 1000 1116 LIPS-C-CJE-T-01
(b) example of system description
<TECHDESC>
<RUN>
<ID>LIPS-C-CJE-T-01</ID>
<INDEXUNIT>word</INDEXUNIT>
<INDEXTECH>morphology</INDEXTECH>
<INDEXSTRUC>inverted file</INDEXSTRUC>
<QUERYUNIT>word</QUERYUNIT>
<MODEL>vector space</MODEL>
<RANK>tf-idf</RANK>
<TRANS>dictionary-based query translation</TRANS>
<QEXP>pre- and post-translation expansion by Rocchio</QEXP>
<CORPUS>using NTCIR-1 Japanese document collections for expansion</CORPUS>
<PIVOT>none</PIVOT>
<COMMENT>none</COMMENT>
</RUN>
<RUN>
<ID>LIPS-C-CJE-D-02</ID>
......
</RUN>
</TECHDESC>
2. Type of Runs
Mandatory Runs: T-run and D-run
Each participant must submit two types of run, a T-run and a D-run, for each combination of topic
language and document language(s).
The purpose of asking participants to submit these mandatory runs is to make research findings clear by comparing systems or methods under a unified condition.
Recommended Runs: DN-run
The task organizers also strongly recommend submitting a DN-run, which
is a run using the <DESC> and <NARR> fields.
Optional Runs
Any other combinations of fields may be submitted as optional runs
according to each participant's research interests, e.g., TDN-run, DC-run,
TDNC-run, and so on.
3. Number of Runs
Each participant can submit up to 5 runs in total for each language pair
regardless of the type of run. Of these 5 runs, at most two may be T-runs
and at most two may be D-runs. A language pair means the combination of
topic language and document language(s).
For example,
Language combination -> Topic: C and Docs: CJE (C->CJE)
Submission -> two T-runs, a D-run, a DN-run and a TDNC run (5 runs in
total).
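The limits above can be checked mechanically before submission. The following is a minimal sketch in Python, not an official tool: the function name `check_run_limits` and the representation of a submission as a list of run-type strings are assumptions made for illustration. It also checks for the mandatory T-run and D-run described in Section 2.

```python
from collections import Counter

MAX_RUNS = 5                      # total runs allowed per language pair
MAX_PER_TYPE = {"T": 2, "D": 2}   # per-type caps stated in this guideline
MANDATORY = ("T", "D")            # mandatory run types from Section 2


def check_run_limits(run_types):
    """Return a list of problems for the run types of ONE language pair.

    `run_types` is a list such as ["T", "T", "D", "DN", "TDNC"]
    (a hypothetical representation chosen for this sketch).
    """
    problems = []
    if len(run_types) > MAX_RUNS:
        problems.append(
            f"{len(run_types)} runs submitted; at most {MAX_RUNS} allowed"
        )
    counts = Counter(run_types)
    for run_type, cap in MAX_PER_TYPE.items():
        if counts[run_type] > cap:
            problems.append(
                f"{counts[run_type]} {run_type}-runs submitted; at most {cap} allowed"
            )
    for run_type in MANDATORY:
        if counts[run_type] == 0:
            problems.append(f"no {run_type}-run submitted; it is mandatory")
    return problems


if __name__ == "__main__":
    # The C->CJE example above: two T-runs, a D-run, a DN-run and a TDNC-run.
    print(check_run_limits(["T", "T", "D", "DN", "TDNC"]))
```

An empty list means the combination satisfies the limits as stated here; the check has to be repeated for each language pair.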
4. Identification and Priority of Runs
Each run has to be associated with a RunID, which identifies the run. The format of a RunID is as follows.
GroupID-TopicLanguage-DocumentLanguage(s)-RunType-pp
The 'pp' is two digits used to represent the priority of the run. It will be used as a parameter for pooling. The participants have to decide the priority of each submitted run on the basis of each language pair. "01" means the highest priority.
For example, a participating group, LIPS, submits 3 runs for C-->CJE. The first is a T-run, the second is a D-run and the third is a DN-run. Therefore, the RunIDs are LIPS-C-CJE-T-01, LIPS-C-CJE-D-02, and LIPS-C-CJE-DN-03, respectively. Or, if the group uses two different ranking techniques in T-runs for C-->CJE and also submits a D-run, the RunIDs have to be LIPS-C-CJE-T-01, LIPS-C-CJE-T-02, and LIPS-C-CJE-D-03.
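As an illustration of the RunID convention, the sketch below builds and parses identifiers of the shape seen in the examples (e.g., LIPS-C-CJE-T-01). The regular expression is inferred from those examples and is an assumption, not an official grammar; in particular it assumes that group IDs are alphanumeric and contain no hyphen. The function names are hypothetical.

```python
import re

# Pattern inferred from examples such as LIPS-C-CJE-T-01 (an assumption).
RUN_ID_PATTERN = re.compile(
    r"^(?P<group>[A-Za-z0-9]+)"
    r"-(?P<topic>[A-Z]+)"
    r"-(?P<docs>[A-Z]+)"
    r"-(?P<run_type>[A-Z]+)"
    r"-(?P<priority>\d{2})$"
)


def build_run_id(group, topic_lang, doc_langs, run_type, priority):
    """Compose a RunID; the priority is zero-padded to two digits."""
    return f"{group}-{topic_lang}-{doc_langs}-{run_type}-{priority:02d}"


def parse_run_id(run_id):
    """Split a RunID into its parts, raising ValueError if it is malformed."""
    match = RUN_ID_PATTERN.match(run_id)
    if match is None:
        raise ValueError(f"not a well-formed RunID: {run_id!r}")
    fields = match.groupdict()
    fields["priority"] = int(fields["priority"])
    return fields


if __name__ == "__main__":
    print(build_run_id("LIPS", "C", "CJE", "T", 1))
    print(parse_run_id("LIPS-C-CJE-DN-03"))
```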
5. System Description
5.1 Descriptive Information
In addition to search results, every participating group has to give us a concise description of each run. This description should contain the following information.
<INDEXUNIT>: unit of indexing, e.g., character, bi-character, bi-word, phrase, etc.
<INDEXTECH>: techniques for indexing, e.g., morphology, stemming, POS, etc.
<INDEXSTRUC>: index structure, e.g., inverted file, signature file, PAT, etc.
<QUERYUNIT>: unit of query terms, e.g., character, word, phrase, etc.
<MODEL>: retrieval model, e.g., vector space model, probabilistic model (Okapi, INQUERY, logistic regression),
etc.
<RANK>: ranking factors for weighting each term, e.g., tf, tf/idf, mutual information,
word association, document length, etc.
<TRANS>: translation technique used for cross-lingual information retrieval,
e.g., dictionary-based, corpus-based, MT, etc. Detailed information
is welcome, e.g., select-all, select-top-N, translation disambiguation,
etc.
<QEXP>: techniques used to expand the query, or "no query expansion".
<CORPUS>: information about any special corpus used for translation, expansion, etc.
<PIVOT>: language used for the pivot approach, e.g., English.
<COMMENT>: any other comments.
5.2 Root tags
Please pack the system descriptions for all runs into a single file using two
root tags, <TECHDESC> and <RUN>, as follows:
<TECHDESC>
<RUN>
...description of the run1...
</RUN>
<RUN>
...description of the run2...
</RUN>
...
</TECHDESC>
5.3 Template
Please copy and use a template for writing your description.
5.4 File name and format
Please store the system descriptions in a single plain-text file (.txt)
with your group name as its file name, e.g., LIPS.txt.
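A description file of the shape described in Section 5 can be generated from a list of per-run dictionaries. This is a minimal sketch: the function name `render_techdesc` is hypothetical, filling missing fields with "none" merely mirrors the example in Section 1, and escaping of special characters is an assumption that the guideline does not address.

```python
from xml.sax.saxutils import escape

# Tag order follows the example in Section 1 and the list in Section 5.1.
FIELDS = [
    "ID", "INDEXUNIT", "INDEXTECH", "INDEXSTRUC", "QUERYUNIT", "MODEL",
    "RANK", "TRANS", "QEXP", "CORPUS", "PIVOT", "COMMENT",
]


def render_techdesc(runs):
    """Render run descriptions inside <TECHDESC> and <RUN> tags.

    `runs` is a list of dicts keyed by tag name. Missing fields are
    written as "none" (an assumption based on the example above).
    """
    lines = ["<TECHDESC>"]
    for run in runs:
        lines.append("<RUN>")
        for tag in FIELDS:
            value = run.get(tag, "none")
            # Escape &, < and > so the tags stay well-formed (assumption).
            lines.append(f"<{tag}>{escape(value)}</{tag}>")
        lines.append("</RUN>")
    lines.append("</TECHDESC>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_techdesc([
        {"ID": "LIPS-C-CJE-T-01", "INDEXUNIT": "word", "MODEL": "vector space"},
    ])
    print(text)
```

The returned string could then be saved under the group name, e.g., LIPS.txt, as Section 5.4 requires.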
6. Document List
6.1 Format
Since TREC's evaluation program is used to carry out the relevance
assessment, each participating group has to submit its retrieval results
in the designated format. The result file is a list of tuples in the following
form:
001 0 cts_cec_19991118596 1 9999 LIPS-C-CJE-T-01
001 0 cts_cec_19991120000 2 9998 LIPS-C-CJE-T-01
001 0 cts_cec_19980982596 3 9978 LIPS-C-CJE-T-01
001 0 cts_cec_19990118116 4 9970 LIPS-C-CJE-T-01
001 0 cts_cec_19990618596 5 9812 LIPS-C-CJE-T-01
002 0 cts_cec_19980812123 1 9999 LIPS-C-CJE-T-01
002 0 cts_cec_19990918596 2 9910 LIPS-C-CJE-T-01
The search result file to be submitted should follow the format below:
Topic-ID Dummy-field Document-ID Rank Similarity-value Run-ID
Note: A format checker for the document list is available at the CLIR download site.
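For a quick local check before using the organizers' format checker, the six-field layout can be validated with a few lines of Python. This sketch is not the official checker. Its rules (exactly six whitespace-separated fields, ranks that start at 1 and increase by 1 within each topic, and a single RunID per file) are assumptions inferred from the examples above, and the function names are hypothetical.

```python
def parse_result_line(line):
    """Split one result line into its six fields.

    Raises ValueError if the field count is wrong or if the rank or
    similarity value is not numeric.
    """
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    topic_id, dummy, doc_id, rank, score, run_id = parts
    return {
        "topic_id": topic_id,
        "dummy": dummy,
        "doc_id": doc_id,
        "rank": int(rank),
        "score": float(score),
        "run_id": run_id,
    }


def check_result_lines(lines):
    """Return a list of problems found in the lines of one result file."""
    problems = []
    next_rank = {}   # topic ID -> rank expected on its next line
    run_ids = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = parse_result_line(line)
        except ValueError as error:
            problems.append(f"line {number}: {error}")
            continue
        run_ids.add(row["run_id"])
        expected = next_rank.get(row["topic_id"], 1)
        if row["rank"] != expected:
            problems.append(
                f"line {number}: rank {row['rank']} for topic "
                f"{row['topic_id']}, expected {expected}"
            )
        next_rank[row["topic_id"]] = row["rank"] + 1
    if len(run_ids) > 1:
        problems.append(f"more than one RunID in file: {sorted(run_ids)}")
    return problems


if __name__ == "__main__":
    sample = [
        "001 0 cts_cec_19991118596 1 9999 LIPS-C-CJE-T-01",
        "001 0 cts_cec_19991120000 2 9998 LIPS-C-CJE-T-01",
        "002 0 cts_cec_19980812123 1 9999 LIPS-C-CJE-T-01",
    ]
    print(check_result_lines(sample))
```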
6.2 File name and format
Please store the document list for each run in a single plain-text file
with the RunID as its file name, e.g., LIPS-C-CJE-T-01 (with no file extension).
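Producing such a file from retrieval scores might look like the following sketch. The input structure (a dict mapping topic ID to a dict of document ID and score), the function names, the dummy field value 0 taken from the examples, and the UTF-8 encoding are all assumptions made for illustration.

```python
from pathlib import Path


def format_run(run_id, scores_by_topic):
    """Build result lines, ranking documents by descending score per topic."""
    lines = []
    for topic_id in sorted(scores_by_topic):
        ranked = sorted(
            scores_by_topic[topic_id].items(),
            key=lambda item: item[1],
            reverse=True,
        )
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            # Topic-ID Dummy-field Document-ID Rank Similarity-value Run-ID
            lines.append(f"{topic_id} 0 {doc_id} {rank} {score} {run_id}")
    return lines


def write_run_file(directory, run_id, lines):
    """Write the lines to a file named exactly after the RunID (no extension)."""
    path = Path(directory) / run_id
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    scores = {"001": {"cts_cec_19991118596": 9999, "cts_cec_19991120000": 9998}}
    for line in format_run("LIPS-C-CJE-T-01", scores):
        print(line)
```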
7. How to Submit Files
Please send your search results to us according to the following
procedure by the deadline.
Deadline
Nov. 08, 2003 23:59 Japanese Time (modified on 2003-10-30; the original deadline was Nov. 01, 2003 23:59 Japanese Time)
Please confirm the group ID that you
specified in your application form for the NTCIR-4 CLIR task.
Please attach your group ID to the head of each file name (e.g., NII.list.txt).
NII-J-C-T-01
NII-J-C-D-02
....
NII-C-J-DN-03
NII.txt
http://rcir.nii.ac.jp/ntcir/access/***/+++/ ntcir4/clir/index.html
(*** and +++ differ for each group)
http://rcir.nii.ac.jp/ntcir/access/***/+++/ ntcir4/clir/fup.html
(the user ID and password are the same as for index.html).
8. Others
We would like to remind you that you must return the document data if you
do NOT submit any results.
9. Contact Information
If you have any questions, please contact the task organizers (ntcadm-clir@nii.ac.jp).