NACSIS Grant-in-Aid Scientific Research Database (NTCIR-2)

Last modified: Aug. 14, 2000

Document records are extracted from the "NACSIS Grant-in-Aid Scientific Research Database".

The Ministry of Education, Science, Sports and Culture promotes scientific research in Japan through subsidies to researchers. Each researcher must report the results of subsidized research to the Ministry. These extended summaries are recorded in this database.
As stated in the "Error Notice", the data contain errors, such as typographical errors. These range from errors already present in the original data to errors introduced during the reformatting done at NII or in the Test Collection Projects.
Because our philosophy is to leave the data as close to the original as possible, and because it is impossible to check all the data manually, our error checking has concentrated on keeping the data readable rather than on correcting content.

Records are SGML-tagged plain text. EUC is used as the character encoding for Japanese text.
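
A minimal sketch of decoding the raw bytes of a collection file before any further processing. It assumes that "EUC" here means EUC-JP (Python codec name "euc_jp"); the function name decode_euc is hypothetical and is not part of the collection.

```python
def decode_euc(raw: bytes) -> str:
    """Decode bytes from a collection file into a Python string.

    Assumption: the files use EUC-JP. errors="replace" substitutes
    U+FFFD for malformed byte sequences instead of raising, which is
    convenient because the data are known to contain some errors.
    """
    return raw.decode("euc_jp", errors="replace")


# Round trip on an in-memory example (no real collection file needed).
example_bytes = "科学研究 Scientific Research".encode("euc_jp")
print(decode_euc(example_bytes))
```

For a file on disk, the bytes could be obtained with open(path, "rb").read() and passed to the same function.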

A record may contain an ACCN, a year (the fiscal year in which the summary was reported), a subject code, a title (project name), an extended summary, caption(s), and keyword(s). The ACCN and the title are mandatory.
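
The record structure described above could be modelled as follows. This is only a sketch: the class name GrantRecord and all attribute names are hypothetical, chosen for illustration. The only rule taken from this document is that the ACCN and the title are mandatory while every other field may be absent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GrantRecord:
    # Mandatory fields.
    accn: str
    title: str
    # Optional fields; a record may omit any of these.
    year: Optional[str] = None
    subject_code: Optional[str] = None
    summary: Optional[str] = None
    captions: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject records whose mandatory fields are missing or empty.
        if not self.accn or not self.title:
            raise ValueError("ACCN and title are mandatory")


record = GrantRecord(accn="kaken-e-0123456789", title="An illustrative title")
print(record.year, record.keywords)
```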

A sample Japanese document in the J Collection [html] [plain text]
A sample English document in the E Collection [html] [plain text]
A sample Japanese document with segmentation [html] [plain text]
List of tags in a Japanese document record

<REC> beginning tag of a document record
<ACCN> document ID (Accession Number); it consists of 'kaken-e-' followed by 10 digits. </ACCN>
<YEAR> fiscal year in which the document was reported; it is a 4-digit number. </YEAR>
<SBE1> subject code; codes from 1985 to 1992 are enclosed by '<SBE1>' and '</SBE1>', and codes from 1993 to 1999 are enclosed by '<SBE2>' and '</SBE2>'; it is a combination of 3 digits and a subject name. </SBE1>
<PJNE TYPE="alpha"> title </PJNE>
<ABSE TYPE="alpha"> extended summary; each paragraph in it is enclosed by <ABSE.P> and </ABSE.P>. </ABSE>
<CAPE TYPE="alpha"> caption; if there is more than one caption, each of them is enclosed by its own pair of tags. </CAPE>
<KYWE TYPE="alpha"> keyword(s); keywords are separated by ' / ' (single-byte space + single-byte slash + single-byte space). </KYWE>
</REC> end tag of a document record
'<PJNM TYPE="kanji">' and '<PJNE TYPE="alpha">' are equivalent to '<TITL TYPE="kanji">' and '<TITE TYPE="alpha">' in the NACSIS Academic Conference Paper Database, respectively.
'TYPE="kanji"' and 'TYPE="alpha"' indicate the type of characters used in the field. "kanji" indicates that 2-byte EUC codes are used in the field. "alpha" indicates that the field contains only single-byte ASCII codes.
The keywords are assigned by the author. There is no specific controlled vocabulary for them.
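
The tag list above can be turned into a small parser. The following is a sketch under stated assumptions, not the collection's official tooling: it uses regular expressions rather than a full SGML parser, the function names are hypothetical, and the SAMPLE record is invented for illustration (its ID, subject code, and text are not taken from the data). It accepts either SBE1 or SBE2 for the subject code and splits keywords on ' / '.

```python
import re

# Invented record that follows the tag list; not real collection data.
SAMPLE = """<REC>
<ACCN>kaken-e-0123456789</ACCN>
<YEAR>1995</YEAR>
<SBE2>123 Informatics</SBE2>
<PJNE TYPE="alpha">A Study of Text Retrieval</PJNE>
<ABSE TYPE="alpha"><ABSE.P>First paragraph.</ABSE.P>
<ABSE.P>Second paragraph.</ABSE.P></ABSE>
<CAPE TYPE="alpha">First caption</CAPE>
<CAPE TYPE="alpha">Second caption</CAPE>
<KYWE TYPE="alpha">information retrieval / test collection</KYWE>
</REC>"""

REC_RE = re.compile(r"<REC>(.*?)</REC>", re.DOTALL)


def all_fields(body, tag):
    """Return the stripped contents of every <tag ...>...</tag> in body."""
    pattern = rf"<{tag}(?:\s[^>]*)?>(.*?)</{tag}>"
    return [m.strip() for m in re.findall(pattern, body, re.DOTALL)]


def first_field(body, tag):
    """Return the first occurrence of a field, or None if it is absent."""
    found = all_fields(body, tag)
    return found[0] if found else None


def parse_record(body):
    summary = first_field(body, "ABSE") or ""
    keywords = first_field(body, "KYWE")
    return {
        "accn": first_field(body, "ACCN"),
        "year": first_field(body, "YEAR"),
        # 1985-1992 records use SBE1, 1993-1999 records use SBE2.
        "subject": first_field(body, "SBE1") or first_field(body, "SBE2"),
        "title": first_field(body, "PJNE"),
        "paragraphs": all_fields(summary, r"ABSE\.P"),
        "captions": all_fields(body, "CAPE"),
        "keywords": [k.strip() for k in keywords.split(" / ")] if keywords else [],
    }


def parse_records(text):
    return [parse_record(m.group(1)) for m in REC_RE.finditer(text)]


for rec in parse_records(SAMPLE):
    print(rec["accn"], rec["title"], rec["keywords"])
```

Regular expressions are used here only because the record layout is flat; real files may contain irregularities that this sketch does not handle.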
Special Characters & Strings [1]: Entity: Strings that start with '&' and end with ';' (for example, &gt;, &lt;, &deg;, &Sigma;, and so on) often appear in the document data of NTCIR-1 and 2. They are meaningful strings and not errors. They are called 'entities' and are used to represent special characters or symbols, such as mathematical symbols, symbols for units, Greek characters, Cyrillic characters, etc., in the SGML/XML environment.
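
One possible way to turn such entities into the characters they represent is sketched below. It relies on Python's standard html.entities.name2codepoint table, which covers the HTML 4 entity names; whether that table covers every entity name used in the collection is an assumption, so unknown names are deliberately left unchanged. The function name resolve_entities is hypothetical.

```python
import re
from html.entities import name2codepoint

# A named entity: '&', a name made of letters and digits, then ';'.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_entities(text: str) -> str:
    """Replace known named entities with their characters.

    Entities whose names are not in name2codepoint are kept as they are,
    so no information is lost for names this table does not know.
    """
    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        return chr(codepoint) if codepoint is not None else match.group(0)

    return ENTITY_RE.sub(replace, text)


print(resolve_entities("heated to 30&deg;C, total &Sigma;x, ratio &gt; 1"))
```

Because resolving '&lt;' and '&gt;' produces literal angle brackets, it is safer to apply this to field contents after the record tags have been parsed, not to the raw record text.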
Special Characters & Strings [2]: Characters other than standard character code sets: Special characters that were created by the authors of the documents and included in the original document records have been replaced by the string '*?' in the document data of NTCIR-1 and 2, since such characters are not included in the standard character code set and cannot be used on systems other than the original authors'.
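
When processing the data, it may be useful to know where these replacements occur, for example to exclude them from indexing. The sketch below simply locates every '*?' in a string; the function name is hypothetical. Note that it cannot distinguish a replacement marker from a '*?' that an author happened to type.

```python
PLACEHOLDER = "*?"


def placeholder_positions(text: str) -> list:
    """Return the start offset of every '*?' marker in text."""
    positions = []
    start = text.find(PLACEHOLDER)
    while start != -1:
        positions.append(start)
        # Continue searching after the marker just found.
        start = text.find(PLACEHOLDER, start + len(PLACEHOLDER))
    return positions


sample = "a *? b *?"
print(len(placeholder_positions(sample)), placeholder_positions(sample))
```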