NACSIS Grant-in-Aid Scientific Research Database
(NTCIR-2)
Last modified: Aug. 14, 2000
Document records are extracted from the "NACSIS Grant-in-Aid Scientific Research Database".
-
The Ministry of Education, Science, Sports and Culture promotes
scientific research in Japan through subsidies to researchers. Each researcher
must report the results of subsidized research to the Ministry.
These extended summaries are recorded in this database.
-
As stated in the "Error Notice", the data contain errors, such as
typographical errors. These range from errors in the original data and
other typographical errors to errors introduced by the reformatting
done at NII or at the Test Collection Projects.
-
Both in keeping with the philosophy of leaving the data as close to the
original as possible, and because it is impossible to check all the
data manually, our error checking has concentrated on keeping the data
readable rather than on correcting content.
Records are SGML-tagged plain text. EUC is used as the character encoding
for Japanese text.
A record may contain an ACCN, the year in which the summary was reported,
a subject code, a title (project name), an extended summary,
caption(s), and keyword(s).
ACCN and title are mandatory.
-
A Sample of Japanese document in J Collection
[html]
[plain text]
-
A Sample of English document in E Collection
[html]
[plain text]
-
A Sample of Japanese document with segmentation
[html]
[plain text]
-
List of tags in a Japanese document record
<REC> beginning tag of a document record
<ACCN> document ID (Accession Number); it is a combination of 'kaken-e-' and 10 digits. </ACCN>
<YEAR> fiscal year in which the document was reported; it is a 4-digit number. </YEAR>
<SBE1> subject code; codes from 1985 to 1992 are enclosed by '<SBE1>'
and '</SBE1>', and codes from 1993 to 1999 are enclosed by '<SBE2>'
and '</SBE2>'; it is a combination of 3 digits and the subject name. </SBE1>
<PJNE TYPE="alpha"> title </PJNE>
<ABSE TYPE="alpha">
extended summary; each paragraph in it is enclosed by <ABSE.P> and </ABSE.P>.
</ABSE>
<CAPE TYPE="alpha"> caption; if there is more than one caption,
each of them is enclosed by its own pair of tags. </CAPE>
<KYWE TYPE="alpha">
keyword(s); keywords are separated by ' / '
(single-byte space + single-byte slash + single-byte space).
</KYWE>
</REC> end tag of a document record
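
As a rough illustration of the tag layout listed above, the following Python sketch extracts the fields of one record with regular expressions. The sample record, all of its field values, and the helper names `field_values` and `parse_record` are invented for illustration and are not taken from the collection. Real files would first have to be decoded from EUC; Python's `euc_jp` codec is assumed here to be the appropriate one.

```python
import re

# Hypothetical record, made up for illustration; not real collection data.
SAMPLE = """<REC>
<ACCN>kaken-e-0123456789</ACCN>
<YEAR>1995</YEAR>
<SBE2>123 Example Subject</SBE2>
<PJNE TYPE="alpha">An Example Project Title</PJNE>
<ABSE TYPE="alpha"><ABSE.P>First paragraph.</ABSE.P><ABSE.P>Second paragraph.</ABSE.P></ABSE>
<CAPE TYPE="alpha">First caption</CAPE>
<CAPE TYPE="alpha">Second caption</CAPE>
<KYWE TYPE="alpha">information retrieval / test collection</KYWE>
</REC>"""


def field_values(text, tag):
    """Return the stripped contents of every <tag ...>...</tag> pair in text."""
    pattern = r"<{0}(?:\s[^>]*)?>(.*?)</{0}>".format(re.escape(tag))
    return [value.strip() for value in re.findall(pattern, text, flags=re.DOTALL)]


def parse_record(record):
    """Collect the fields of one record into a dictionary."""
    def first(tag):
        values = field_values(record, tag)
        return values[0] if values else None

    abstract = first("ABSE") or ""
    keywords = first("KYWE")
    return {
        "accn": first("ACCN"),
        "year": first("YEAR"),
        # The subject code is tagged SBE1 or SBE2 depending on the year.
        "subject": first("SBE1") or first("SBE2"),
        "title": first("PJNE"),
        "paragraphs": field_values(abstract, "ABSE.P"),
        "captions": field_values(record, "CAPE"),
        # Keywords are separated by space + slash + space.
        "keywords": keywords.split(" / ") if keywords else [],
    }


record = parse_record(SAMPLE)
print(record["accn"], record["keywords"])
```

Regular expressions are used instead of an XML parser because the records are SGML fragments and may contain entities that an XML parser would reject. Optional fields simply come back as `None` or an empty list.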
-
'<PJNM TYPE="kanji">' and '<PJNE TYPE="alpha">' are equivalent to
'<TITL TYPE="kanji">' and '<TITE TYPE="alpha">' in the NACSIS Academic Conference
Paper Database, respectively.
-
'TYPE="kanji"' and 'TYPE="alpha"' indicate the type of characters used
in the field. "kanji" indicates that 2-byte EUC codes are used in the field.
"alpha" indicates that the field contains only single-byte ASCII codes.
-
The keywords are assigned by the author. There is no
specific controlled vocabulary for them.
-
Special Characters & Strings [1]: Entity:
Strings that start with '&' and end with ';'
(for example, &gt;, &lt;, &deg;, &Sigma;, and so on)
appear often in the document data of NTCIR-1 and 2. They are meaningful
strings, not errors. They are called 'entities' and are used to
represent special characters or symbols, such as mathematical
symbols, symbols for units, Greek characters, Cyrillic characters,
etc., in the SGML/XML environment.
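
For readers who want to turn such entities into the characters they stand for, here is a minimal Python sketch. It assumes that the entity names of interest are ones known to Python's standard `html.unescape`, which covers the HTML5 named entities. The collection may also use SGML entity names outside that set, so the sketch reports any entity it could not resolve and leaves it unchanged. The sample string and the entity `&xyzzy;` are made up for illustration.

```python
import html
import re

# Matches strings that start with '&' and end with ';'.
ENTITY = re.compile(r"&[A-Za-z][A-Za-z0-9.]*;")


def decode_entities(text):
    """Decode known entities; return the decoded text and the unresolved ones."""
    unresolved = sorted(
        {entity for entity in ENTITY.findall(text) if html.unescape(entity) == entity}
    )
    return html.unescape(text), unresolved


# Hypothetical sample text; '&xyzzy;' is a deliberately unknown entity.
sample = "T &gt; 300 &deg;C, &Sigma; x &lt; 1, &xyzzy;"
decoded, unresolved = decode_entities(sample)
print(decoded)
print(unresolved)
```

Unresolved names would need a separate mapping table, built from whichever SGML entity sets the data actually uses.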
-
Special Characters & Strings [2]: Characters other than
standard character code sets:
Special characters created by the authors of the documents
and included in the original document records were replaced by
the string '*?' in the document data of NTCIR-1 and 2, since such
special characters are not included in the standard character
code set and cannot be used on systems other than the original
authors'.
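
A short Python sketch of how one might locate these placeholders in a field, for example to flag records whose text lost a character. The function name and the sample sentence are hypothetical. A literal '*?' typed by an author would be indistinguishable from a placeholder, so the result is only an indication.

```python
import re

PLACEHOLDER = "*?"


def placeholder_positions(text):
    """Return the character offsets of every '*?' placeholder in text."""
    return [match.start() for match in re.finditer(re.escape(PLACEHOLDER), text)]


# Hypothetical sample text, not taken from the collection.
sample = "The *? phase appears below *? K."
positions = placeholder_positions(sample)
print(positions)
```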