Policy on the Treatment of Abnormal Document Records

[NTCIR-4 CLIR CFP]
  1. What is an abnormal record?
    An abnormal record is a record that has <HEADLINE> data but no content in <TEXT>.
    For example:

    --- sample ---
    <HEADLINE>MEXT announced ... (illustration.) </HEADLINE>
    <TEXT></TEXT>
    ---
    Please note that document records that have <TEXT> but no <HEADLINE> are treated as "ordinary" records and will not be discarded.
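
    The definition above can be sketched as a small check. This is only an illustration, not the organizers' detection procedure: the <DOC> wrapper and the <DOCNO> field are assumptions about the collection format (only <HEADLINE> and <TEXT> appear in this policy), and treating a whitespace-only or missing <TEXT> as "no content" is also an assumption.

```python
import re

# Assumption: each record is wrapped in <DOC>...</DOC> and carries its ID in
# <DOCNO>. Only <HEADLINE> and <TEXT> are named in the policy itself.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)


def field(record, tag):
    """Return the stripped content of <tag>...</tag>, or '' if absent."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", record, re.DOTALL)
    return m.group(1).strip() if m else ""


def is_abnormal(record):
    """Abnormal: non-empty <HEADLINE> but no content in <TEXT>."""
    return bool(field(record, "HEADLINE")) and not field(record, "TEXT")


def abnormal_ids(collection):
    """Collect the <DOCNO> of every abnormal record in a collection string."""
    return {
        field(doc, "DOCNO")
        for doc in DOC_RE.findall(collection)
        if is_abnormal(doc)
    }
```

    A record with <TEXT> but no <HEADLINE> is not flagged by this check, which matches the note above about "ordinary" records.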

  2. Which document sets include the abnormal records?
    We detected the abnormal records in the following datasets.

    (1) Yomiuri (Japanese): 2422 records
    (2) CIRB011 (Chinese): 1 record
    (3) CIRB020 (Chinese): 305 records
    (4) Xinhua (English): 1 record
    * A zipped file for the above 4 is available.

  3. The abnormal records will not be judged by human assessors.
    Because the abnormal records have <HEADLINE> data, they may be retrieved at high ranks even though they have no <TEXT> data. However, we cannot determine their degree of relevance when the <TEXT> data are missing from the document sets for some reason (illustration, picture, table, and so on).
    Therefore, these records will be excluded from the relevance judgments by human assessors.

  4. How should these records be treated?
    You can choose any one of the methods (1) to (4).
    (1) DOC LIST
    One way to remove these records is simply to delete their document IDs from the document list in your search output.
    (2) INDEXING
    Another way is to remove these records from the document sets and run the indexing process again.
    (3) OTHERS
    You can use any other method to remove the abnormal records from your search results.
    (4) NONE
    You can do nothing.
    * Please put the top 1000 documents in each of your search result files, regardless of whether you remove the abnormal records or not.
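
    As an illustration of method (1) DOC LIST, the following is a minimal sketch that assumes the ranked output for one topic is held as a list of document IDs and that the IDs of the abnormal records are already known as a set. The function name and the data layout are hypothetical and are not part of the official result file format.

```python
def filter_ranked_list(ranked_ids, abnormal, limit=1000):
    """Drop abnormal document IDs, preserve rank order, keep at most `limit`."""
    kept = [doc_id for doc_id in ranked_ids if doc_id not in abnormal]
    return kept[:limit]


# Hypothetical ranked output for one topic; "d1" stands in for an abnormal record.
ranked = ["d3", "d1", "d7", "d2"]
print(filter_ranked_list(ranked, {"d1"}, limit=3))  # ['d3', 'd7', 'd2']
```

    Because removal shortens the list, a system that filters in this way would probably need to retrieve somewhat more than 1000 candidates per topic so that 1000 documents remain after filtering.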

  5. If the abnormal records are included in your search result file, they will not be judged by human assessors and will automatically be treated as "irrelevant."
    We will accept your search result file even if it includes abnormal records. However, as mentioned above, these records will not be judged by human assessors and will automatically be treated as irrelevant.

  6. Please describe your method for removing the abnormal records in your system description.
    A <COMMENT> field is provided in the system description template. Please state your method for removing the abnormal records in the <COMMENT> field. For example:
    (a) "Removing abnormal records: DOC LIST"
    (b) "Removing abnormal records: INDEXING"
    (c) "Removing abnormal records: OTHERS" (please also describe the removal method)
    (d) "Removing abnormal records: NONE"

  7. Postponement of submission deadline
    The submission deadline is extended to 2003-11-08 (Sat.). That is, an extra week is provided for removing the abnormal records.


* This policy is based on the email with the subject "[ntcadm-clir : 1138] Abnormal Records : a detailed report".

Thanks to Nakagawa-san and Sakai-san for their cooperation and useful suggestions on this matter!


Last modified : 2003-10-30
ntcadm-clir@nii.ac.jp