Policy on the Treatment of the Abnormal Document Records
[NTCIR-4 CLIR CFP]
- What is an abnormal record?
An abnormal record is a record that has <HEADLINE> data but no
content in <TEXT>.
For example:
--- sample ---
<HEADLINE>MEXT announced ... (illustration.) </HEADLINE>
<TEXT></TEXT>
---
Please note that document records with <TEXT> data but no <HEADLINE> data
are treated as "ordinary" records and will not be discarded.
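The definition above can be checked mechanically. The following is a minimal sketch in Python, assuming each record is available as a string containing literal <HEADLINE> and <TEXT> tags. The helper names (field_content, is_abnormal) are hypothetical, and treating a whitespace-only or missing <TEXT> field as "no content" is an assumption of this sketch, not part of the policy.

```python
import re


def field_content(record: str, tag: str) -> str:
    """Return the stripped content of <tag>...</tag>, or '' if the tag is absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", record, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


def is_abnormal(record: str) -> bool:
    """True when the record has <HEADLINE> data but no content in <TEXT>.

    Assumption: whitespace-only fields count as empty.
    """
    has_headline = bool(field_content(record, "HEADLINE"))
    has_text = bool(field_content(record, "TEXT"))
    return has_headline and not has_text


# The sample record from this notice.
abnormal_sample = "<HEADLINE>MEXT announced ... (illustration.) </HEADLINE>\n<TEXT></TEXT>"
# An "ordinary" record: <TEXT> data but no <HEADLINE> data (illustrative content).
ordinary_sample = "<HEADLINE></HEADLINE>\n<TEXT>Body text.</TEXT>"

print(is_abnormal(abnormal_sample))
print(is_abnormal(ordinary_sample))
```

A record with <TEXT> data and an empty <HEADLINE> is reported as not abnormal, which matches the note that such records are treated as "ordinary."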
- Which document sets include the abnormal records?
We detected abnormal records in the following datasets.
(1) Yomiuri (Japanese): 2422 records
(2) CIRB011 (Chinese): 1 record
- [ID list] (its <DOCNO> is "ctc_sto_19980524_0010")
(3) CIRB020 (Chinese): 305 records
(4) Xinhua (English): 1 record
- [ID list] (its <DOCNO> is "XIE19991222.0049")
* A zipped file covering the four datasets above is available.
- The abnormal records will not be judged by human assessors.
Abnormal records with no <TEXT> data may be retrieved at high ranks
because they have <HEADLINE> data.
However, we cannot determine their degree of relevance when the <TEXT>
data are missing from the document sets for some reason (illustration,
picture, table, and so on).
Therefore, these records will be excluded from the relevance judgments
by human assessors.
- How to treat these records?
You can choose whichever method you prefer from (1) to (4).
(1) DOC LIST
One way to remove these records is simply to delete their document IDs
from the document list in your search output.
(2) INDEXING
Another way is to remove these records from the document sets and run
the indexing process again.
(3) OTHERS
You can use any other method to remove the abnormal records from
your search results.
(4) NONE
You can do nothing.
* Please include the top 1000 documents in each of your search result files,
regardless of whether you remove the abnormal records.
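As an illustration of method (1) DOC LIST, the following is a minimal sketch in Python. It assumes your system can produce a ranked list of document IDs longer than 1000, so that 1000 documents remain after filtering. The function name filter_ranked_list and the placeholder IDs "DOC-A", "DOC-B", and "DOC-C" are hypothetical; the two abnormal IDs are the <DOCNO> values quoted in this notice.

```python
from typing import Iterable, List, Set


def filter_ranked_list(
    ranked_ids: Iterable[str],
    abnormal_ids: Set[str],
    limit: int = 1000,
) -> List[str]:
    """Drop abnormal document IDs from a ranked list, keeping at most `limit` IDs.

    The original ranking order of the remaining documents is preserved.
    """
    kept: List[str] = []
    for doc_id in ranked_ids:
        if doc_id in abnormal_ids:
            continue  # skip records that will not be judged
        kept.append(doc_id)
        if len(kept) == limit:
            break
    return kept


# Abnormal <DOCNO> values quoted in this notice.
abnormal = {"ctc_sto_19980524_0010", "XIE19991222.0049"}
# Hypothetical ranked output for one topic, best match first.
ranked = ["DOC-A", "ctc_sto_19980524_0010", "DOC-B", "XIE19991222.0049", "DOC-C"]

print(filter_ranked_list(ranked, abnormal))
```

In practice the set of abnormal IDs would be loaded from the ID lists for each dataset, and the ranked list would be retrieved with some headroom above 1000 so that the filtered file still contains the top 1000 documents.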
- If abnormal records are included in your search result file, they will not
be judged by human assessors and will automatically be treated as "irrelevant."
We will accept your search result file even if it contains abnormal
records. However, as mentioned above, these records will not be judged by
human assessors and will automatically be treated as irrelevant.
- Please describe your method for removing the abnormal records in your system
description.
The <COMMENT> field is provided in the system description template.
Please write your method for removing the abnormal records in
the <COMMENT> field. For example,
(a) "Removing abnormal records: DOC LIST"
(b) "Removing abnormal records: INDEXING"
(c) "Removing abnormal records: OTHERS" (please also describe
the method for removing)
(d) "Removing abnormal records: NONE"
- Postponement of submission deadline
The submission deadline is extended to 2003-11-08 (Sat.). That is, an extra
week is provided for removing the abnormal records.
* This is based on the email with the subject "[ntcadm-clir : 1138]
Abnormal Records : a detailed report".
Thanks to Nakagawa-san and Sakai-san for their cooperation and useful suggestions
on this matter!