TITLE=README-e.txt DATE=2001-03-01

NII-NACSIS Test Collection for Information Retrieval Systems 2 (NTCIR-2)
README

1 File Description

This CD-ROM contains the following files:

  readme-j.pdf - Japanese version of this file (PDF, Font Embedded)
  readme-j.txt - Japanese version of this file (EUC)
  readme-e.txt - This file
  agreem-j.pdf - Memorandum on the Permission to Use Test Collection 2 (Japanese)
  agreem-e.pdf - Memorandum on the Permission to Use Test Collection 2 (English)
  manual-j.pdf - User Manual (Japanese)
  manual-e.pdf - User Manual (English)
  j-docs.tgz   - Japanese document set (J Collection)
  e-docs.tgz   - English document set (E Collection)
  topics.tgz   - Topics (Japanese, English)
  rels.tgz     - Relevance judgments
  scripts.tgz  - Scripts for transformation of ACCNs of the documents

- Adobe Acrobat Reader is needed to read *.pdf files.
- *.tgz files are tar archives compressed with gzip. To extract the
  original data on a UNIX system, pass the archive name to gzip, for
  example "gzip -dc j-docs.tgz | tar xvf -". Japanese data is in EUC code.

The files included in the *.tgz files, and their original file sizes, are
given below. File sizes are stated with 1 MB = 1024 * 1024 bytes.

(1) j-docs.tgz
    Japanese document set (J Collection)

    The following files are included under the directory "j-docs/":

    ntc2-j1g (118.2MB)
      - Document set (J Collection)
        Japanese documents, which were extracted from
        "NACSIS Academic Conference Paper Database".
    ntc2-j1k (481.3MB)
      - Document set (J Collection)
        Japanese documents, which were extracted from
        "NACSIS Grant-in-Aid Scientific Research Database".

(2) e-docs.tgz
    English document set (E Collection)

    The following files are included under the directory "e-docs/":

    ntc2-e1g (89.7MB)
      - Document set (E Collection)
        English documents, which were extracted from
        "NACSIS Academic Conference Paper Database".
    ntc2-e1k (110.7MB)
      - Document set (E Collection)
        English documents, which were extracted from
        "NACSIS Grant-in-Aid Scientific Research Database".

(3) topics.tgz
    Search topics (Japanese and English)

    The following files are included under the directory "topics/":

    topic-j0101-0149
      - Japanese topics used as the test topics at the 2nd NTCIR Workshop
    topic-e0101-0149
      - English topics used as the test topics at the 2nd NTCIR Workshop

(4) rels.tgz
    Relevance judgments

    The following files are included under the directory "rels/":

    - Relevance judgments for J Collection

      rel1_ntc2-j2_0101-0149
        - Relevance judgments of topics 0101-0149 against ntc1-j1.mod,
          ntc2-j1g and ntc2-j1k
          (Relevant File (Level 1); S- and A-judgments are treated as
          "relevant")
      rel2_ntc2-j2_0101-0149
        - Relevance judgments of topics 0101-0149 against ntc1-j1.mod,
          ntc2-j1g and ntc2-j1k
          (Partially Relevant File (Level 2); S-, A- and B-judgments are
          treated as "relevant")
      rel*_ntc2-j2_0101-0149
        - rel*_ntc2-j2_0101-0149 without comments

    - Relevance judgments for E Collection

      rel1_ntc2-e2_0101-0149
        - Relevance judgments of topics 0101-0149 against ntc1-e1.mod,
          ntc2-e1g and ntc2-e1k
          (Relevant File (Level 1); S- and A-judgments are treated as
          "relevant")
      rel2_ntc2-e2_0101-0149
        - Relevance judgments of topics 0101-0149 against ntc1-e1.mod,
          ntc2-e1g and ntc2-e1k
          (Partially Relevant File (Level 2); S-, A- and B-judgments are
          treated as "relevant")
      rel*_ntc2-e2_0101-0149
        - rel*_ntc2-e2_0101-0149 without comments

    - Relevance judgments for J and E Collections

      rel1_ntc2-je2_0101-0149
        - Relevance judgments of topics 0101-0149 against ntc1-j1.mod,
          ntc1-e1.mod, ntc2-j1g, ntc2-e1g, ntc2-j1k and ntc2-e1k
          (Relevant File (Level 1); S- and A-judgments are treated as
          "relevant")
      rel2_ntc2-je2_0101-0149
        - Relevance judgments of topics 0101-0149 against ntc1-j1.mod,
          ntc1-e1.mod, ntc2-j1g, ntc2-e1g, ntc2-j1k and ntc2-e1k
          (Partially Relevant File (Level 2); S-, A- and B-judgments are
          treated as "relevant")
      rel*_ntc2-je2_0101-0149
        - rel*_ntc2-je2_0101-0149 without comments

(5) scripts.tgz
    Scripts for transformation of ACCNs of the English documents

    The following files are included under the directory "scripts/":

    readme-script-j.txt - README for scripts (Japanese version, EUC)
    readme-script-e.txt - README for scripts (English)
    accn-tr.tar         - TAR file for scripts
    ntc1accn            - Directory, which includes ACCN conversion scripts
                          for the document sets of NTCIR-1 and README for
                          the scripts

2 Format of the Data and Usage

- Plain text files use EUC code.
- For the format of each file and its usage, please consult the NTCIR-2
  manual (manual-e.pdf or manual-j.pdf).
- Relevance judgment files are specified by the combination of retrieval
  task, the document set used, and topic number. Please use them in the
  correct combination. For detailed information, please consult Fig.1
  below, and Section 5.2 and Fig. 5-2 in the Manual.

============================================================================================
TASKS        DOCUMENTS[1][2]   TOPICS (how many)             RELEVANCE JUDGMENTS[3]
============================================================================================
Monolingual Tasks
--------------------------------------------------------------------------------------------
J-J Task     j-docs/ntc2-j1*   topics/topic-j0101-0149 (49)  rels/rel*_ntc2-j2_0101-0149
             ntc1-j1.mod
--------------------------------------------------------------------------------------------
E-E Task     e-docs/ntc2-e1*   topics/topic-e0101-0149 (49)  rels/rel*_ntc2-e2_0101-0149
             ntc1-e1.mod
============================================================================================
Cross-Lingual Tasks
--------------------------------------------------------------------------------------------
J-E Task     e-docs/ntc2-e1*   topics/topic-j0101-0149 (49)  rels/rel*_ntc2-e2_0101-0149
             ntc1-e1.mod
--------------------------------------------------------------------------------------------
E-J Task     j-docs/ntc2-j1*   topics/topic-e0101-0149 (49)  rels/rel*_ntc2-j2_0101-0149
             ntc1-j1.mod
--------------------------------------------------------------------------------------------
J-J,E Task   j-docs/ntc2-j1*   topics/topic-j0101-0149 (49)  rels/rel*_ntc2-je2_0101-0149
             e-docs/ntc2-e1*
             ntc1-j1.mod
             ntc1-e1.mod
--------------------------------------------------------------------------------------------
E-J,E Task   j-docs/ntc2-j1*   topics/topic-e0101-0149 (49)  rels/rel*_ntc2-je2_0101-0149
             e-docs/ntc2-e1*
             ntc1-j1.mod
             ntc1-e1.mod
============================================================================================
Fig.1 Combination of the documents, topics and relevance judgments

[1] Document files with names ending in 'g' were extracted from the
    "NACSIS Academic Conference Paper Database", and files with names
    ending in 'k' were extracted from the "NACSIS Grant-in-Aid Scientific
    Research Database".
[2] ntc1-j1.mod and ntc1-e1.mod are the files converted from ntc-j1 and
    ntc-e1 by using the conversion scripts ACCN-j.pl and ACCN-e.pl
    respectively.
[3] Relevance judgment files with names beginning with 'rel1_' are the
    "relevant" files, in which S- and A-judgments are treated as
    "Relevant (1)". Files with names beginning with 'rel2_' are the
    "partially relevant" files, in which S-, A- and B-judgments are
    treated as "Relevant (1)".

- The use of Test Collection 2 (NTCIR-2) is permitted under "the
  Memorandum on the Permission to Use Test Collection 2".

3 Notice about Documents

The documents are included as they appear in the original databases,
without any revision or modification by professional abstractors or
editors. The documents are author abstracts, and the discourse-level
structures of the texts may differ from those found in abstracts written
by professional abstractors.

As part of the philosophy of leaving the data as close to the original as
possible, and because it is impossible to check all the data manually,
there are many "errors" in the data. These range from errors in the
original data and other typographical errors to errors in the reformatting
done at NACSIS and by the Test Collection Project Group.
The error checking has concentrated on keeping the data readable rather
than on correcting content. This means that there have been automated
checks for control characters, for correct matching of the beginning and
end tags, and for complete ACCN (accession number) fields.

The documents in the NTCIR-2 are extracted from the "NACSIS Academic
Conference Paper Database" and "NACSIS Grant-in-Aid Scientific Research
Database" to be used for the purpose of research on information retrieval
and related areas. Therefore, please note that the documents are only a
part of the original databases and the coverage is incomplete. As a
result, the documents in the NTCIR-2 cannot be used for information
purposes.

Please understand that neither the organizer of the NTCIR nor NII is
responsible for any problems or damage caused by the use of NTCIR-2.

4 Inquiries

(1) General inquiries (how to obtain the CD-ROM, bibliographic information
    related to NTCIR data, etc.) should be directed to the NTCIR
    secretariat.

    NTCIR Project Office, National Institute of Informatics
    Email: ntc-secretariat@nii.ac.jp
    Postal address: 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, JAPAN
    Phone: +81-3-4212-2750
    Fax: +81-3-3556-1916

(2) Technical inquiries related to the CD-ROM (data format, how to use
    NTCIR, etc.) should be directed to the NTCIR admin.

    NTCIR Project Office, National Institute of Informatics
    ATTN: Noriko Kando
    Email: ntcadm@nii.ac.jp
    Postal address: 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, JAPAN
    Phone: +81-3-4212-2529
    Fax: +81-3-3556-1916
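
Appendix (illustrative only): Because Section 2 asks users to pair each
task with the correct documents, topics and relevance judgment file, the
combinations of Fig.1 can be written down as a small lookup table. The
sketch below is not part of the NTCIR-2 distribution. The names FIG1 and
rel_file are hypothetical, and the directory prefixes assume that the
archives were extracted as described in Section 1 (ntc2-e1* under
"e-docs/", ntc2-j1* under "j-docs/"). The location of the converted
NTCIR-1 files (ntc1-j1.mod, ntc1-e1.mod) is not stated in this README, so
they are listed without a directory.

```python
# Fig.1 as a lookup table. Keys are task names; "rels" is the collection
# suffix used in the relevance judgment file names (j2, e2 or je2).
# This structure is an assumption made for illustration only.
FIG1 = {
    "J-J": {"topics": "topics/topic-j0101-0149", "rels": "j2",
            "docs": ["j-docs/ntc2-j1*", "ntc1-j1.mod"]},
    "E-E": {"topics": "topics/topic-e0101-0149", "rels": "e2",
            "docs": ["e-docs/ntc2-e1*", "ntc1-e1.mod"]},
    "J-E": {"topics": "topics/topic-j0101-0149", "rels": "e2",
            "docs": ["e-docs/ntc2-e1*", "ntc1-e1.mod"]},
    "E-J": {"topics": "topics/topic-e0101-0149", "rels": "j2",
            "docs": ["j-docs/ntc2-j1*", "ntc1-j1.mod"]},
    "J-J,E": {"topics": "topics/topic-j0101-0149", "rels": "je2",
              "docs": ["j-docs/ntc2-j1*", "e-docs/ntc2-e1*",
                       "ntc1-j1.mod", "ntc1-e1.mod"]},
    "E-J,E": {"topics": "topics/topic-e0101-0149", "rels": "je2",
              "docs": ["j-docs/ntc2-j1*", "e-docs/ntc2-e1*",
                       "ntc1-j1.mod", "ntc1-e1.mod"]},
}


def rel_file(task: str, level: int) -> str:
    """Return the relevance judgment file name for a task.

    level 1: S- and A-judgments count as relevant (rel1_ files)
    level 2: S-, A- and B-judgments count as relevant (rel2_ files)
    """
    if level not in (1, 2):
        raise ValueError("level must be 1 or 2")
    if task not in FIG1:
        raise KeyError(f"unknown task: {task}")
    return f"rels/rel{level}_ntc2-{FIG1[task]['rels']}_0101-0149"


if __name__ == "__main__":
    # Cross-lingual J-E task, scored with partially relevant judgments.
    print(rel_file("J-E", 2))
```

For example, rel_file("J-E", 2) selects the Level 2 judgments for the E
Collection, since the J-E task searches English documents with Japanese
topics.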