NACSIS Academic Conference Paper Database
(NTCIR-1 & 2)
Document records are extracted from the "NACSIS Academic Conference Paper Database".
-
The documents in the Collection are author abstracts of the papers
presented at academic meetings hosted by 65 Japanese academic
conferences.
-
Since one of the purposes of the original database is
to provide alert information about papers presented at Japanese
academic conferences, documents are put in the database without any
revision or modification by professional abstractors or editors.
Some of them are refereed, and others are pre- or non-refereed.
-
As stated in the "Error Notice", the data contain errors, such as
typographical errors. These range from errors in the original data
and other typographical errors to errors introduced by the reformatting
done at NII or at the Test Collection Projects.
-
Both as part of the philosophy of leaving the data as close to the
original as possible, and because it is impossible to check all the
data manually, our error-checking has concentrated on keeping the
data readable rather than on correcting content.
Records are SGML-tagged plain text. EUC is used as the character code set
for Japanese text.
A record may contain an ACCN, a title, a list of author(s), the name of the conference,
the date of the conference, an author abstract, keyword(s), and the name of the host society.
ACCN and title are mandatory.
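
As a rough illustration of the format described above, the following Python sketch decodes EUC-encoded bytes and splits them into individual records. The function name `split_records` and the sample bytes are invented for this example and are not taken from the collection. Python's `euc_jp` codec is assumed to match the EUC encoding used in the files.

```python
import re

# A record is everything between <REC> and </REC>, possibly spanning lines.
REC_RE = re.compile(r"<REC>.*?</REC>", re.DOTALL)


def split_records(raw: bytes) -> list[str]:
    """Decode EUC-JP bytes and return each <REC>...</REC> block as a string."""
    # errors="replace" keeps going if a byte sequence is not valid EUC-JP,
    # which seems prudent given that the data are known to contain errors.
    text = raw.decode("euc_jp", errors="replace")
    return REC_RE.findall(text)


# Invented sample data, encoded the way a collection file is assumed to be.
sample_bytes = (
    "<REC>\n<ACCN>gakkai-e-0000000001</ACCN>\n情報検索\n</REC>\n"
    "<REC>\n<ACCN>gakkai-e-0000000002</ACCN>\n</REC>\n"
).encode("euc_jp")

records = split_records(sample_bytes)
print(len(records))
```

In practice the bytes would come from a file opened in binary mode; reading the whole file into memory is assumed to be acceptable here.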
-
A sample Japanese document in the J Collection
[html]
[plain text]
-
A sample English document in the E Collection
[html]
[plain text]
-
A sample Japanese document with segmentation
[html]
[plain text]
-
List of tags in a Japanese document record
<REC> beginning tag of a document record
<ACCN> document ID (Accession Number); it is a combination of 'gakkai-e-' and 10 digits. </ACCN>
<TITE TYPE="alpha"> title </TITE>
<AUPE TYPE="alpha"> list of author(s); when a document contains more than one author, each author is divided by ' / '
(single-byte space + single-byte slash + single-byte space).
</AUPE>
<CNFE TYPE="alpha"> name of the conference at which the paper was presented </CNFE>
<CNFD> conference date; the format is 'yyyy.mm.dd' or 'yyyy.mm.dd - yyyy.mm.dd', where 'yyyy' represents
the year, 'mm' represents the month, and 'dd' the day.
</CNFD>
<ABSE TYPE="alpha">
author abstract; <ABSE.P> and </ABSE.P> enclose each paragraph in it.
</ABSE>
<KYWE TYPE="alpha">
keyword(s); each keyword is divided by ' // '
(single-byte space + 2 single-byte slashes + single-byte space).
</KYWE>
<SOCE TYPE="alpha">
name of the host society;
the list of academic societies that had provided the data
is available
here.
</SOCE>
</REC> end tag of a document record
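
The tag list above can be turned into a small parser. The sketch below is only one possible reading of the format: the sample record, its field values, and the names `parse_record` and `parse_dates` are invented for illustration. It relies on the separators described above (' / ' between authors, ' // ' between keywords, ' - ' between the two dates of a date range).

```python
import re
from datetime import datetime

# Invented sample record; every field value here is made up.
SAMPLE = """<REC>
<ACCN>gakkai-e-0000000001</ACCN>
<TITE TYPE="alpha">An Example Title</TITE>
<AUPE TYPE="alpha">Yamada, Taro / Suzuki, Hanako</AUPE>
<CNFE TYPE="alpha">An Example Conference</CNFE>
<CNFD>1997.10.01 - 1997.10.03</CNFD>
<ABSE TYPE="alpha"><ABSE.P>First paragraph.</ABSE.P><ABSE.P>Second paragraph.</ABSE.P></ABSE>
<KYWE TYPE="alpha">information retrieval // test collection</KYWE>
<SOCE TYPE="alpha">An Example Society</SOCE>
</REC>"""

# Top-level fields; the optional group skips attributes such as TYPE="alpha".
FIELD_RE = re.compile(
    r"<(ACCN|TITE|AUPE|CNFE|CNFD|ABSE|KYWE|SOCE)(?:\s[^>]*)?>(.*?)</\1>",
    re.DOTALL,
)
# Paragraphs inside the abstract.
PARA_RE = re.compile(r"<ABSE\.P>(.*?)</ABSE\.P>", re.DOTALL)


def parse_dates(cnfd):
    """Parse 'yyyy.mm.dd' or 'yyyy.mm.dd - yyyy.mm.dd' into a list of dates."""
    return [
        datetime.strptime(part.strip(), "%Y.%m.%d").date()
        for part in cnfd.split(" - ")
    ]


def parse_record(text):
    """Return a dict of the fields found in one <REC>...</REC> block."""
    fields = {tag: body.strip() for tag, body in FIELD_RE.findall(text)}
    if "ACCN" not in fields or "TITE" not in fields:
        raise ValueError("ACCN and title are mandatory")
    return {
        "accn": fields["ACCN"],
        "title": fields["TITE"],
        "authors": fields["AUPE"].split(" / ") if "AUPE" in fields else [],
        "conference": fields.get("CNFE"),
        "dates": parse_dates(fields["CNFD"]) if "CNFD" in fields else [],
        "paragraphs": [p.strip() for p in PARA_RE.findall(fields.get("ABSE", ""))],
        "keywords": fields["KYWE"].split(" // ") if "KYWE" in fields else [],
        "society": fields.get("SOCE"),
    }


record = parse_record(SAMPLE)
print(record["authors"], record["keywords"])
```

Because the data are left close to the original and may contain errors, a real loader would probably want to catch the `ValueError` that `strptime` raises on a malformed date rather than let it stop the run.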
-
'TYPE="kanji"' and 'TYPE="alpha"' indicate the type of characters used
in the field. "kanji" indicates that the 2-byte EUC code is used in the field.
"alpha" indicates that the field contains only single-byte ASCII codes.
-
The keywords are assigned by the author. There is no
specific controlled vocabulary for them.
-
Special Characters & Strings [1]: Entity:
Strings that start with '&' and end with ';'
(for example, &gt;, &lt;, &deg;, &Sigma;, and so on)
often appear in the document data of NTCIR-1 and 2. They are meaningful
strings, not errors. They are called 'entities' and are used to
represent special characters or symbols, such as mathematical
symbols, symbols for units, Greek characters, Cyrillic characters,
etc., in the SGML/XML environment.
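
A minimal sketch of how such entities might be handled when processing the text is shown below. The mapping table is an assumption: it covers only a few common entity names (`gt`, `lt`, `deg`, `Sigma`, `amp`) and is not the collection's actual entity set, which would have to be taken from the SGML declarations in use. Unknown entities are left untouched so that no information is lost.

```python
import re

# An entity is '&', a name, then ';'.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Assumed, deliberately incomplete mapping used only for illustration.
ENTITY_MAP = {
    "gt": ">",
    "lt": "<",
    "amp": "&",
    "deg": "\u00b0",    # degree sign
    "Sigma": "\u03a3",  # Greek capital sigma
}


def find_entities(text):
    """Return the sorted set of entity names that occur in the text."""
    return sorted(set(ENTITY_RE.findall(text)))


def replace_entities(text, mapping=ENTITY_MAP):
    """Replace known entities with characters; keep unknown ones as-is."""
    return ENTITY_RE.sub(lambda m: mapping.get(m.group(1), m.group(0)), text)


sample = "T &gt; 30&deg;C, &Sigma;x, &unknown;"
print(find_entities(sample))
print(replace_entities(sample))
```

Listing the entity names first with `find_entities` is a cheap way to see which ones a given subset of the data actually uses before deciding how to map them.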
-
Special Characters & Strings [2]: Characters other than
standard character code sets:
Special characters that were created by the authors of the documents
and included in the original document records have been replaced by
the string '*?' in the document data of NTCIR-1 and 2, since such
special characters are not included in the standard character
code sets and cannot be used on systems other than the original
authors'.
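
When indexing or tokenizing the data, it can help to know how often the '*?' placeholder occurs. The sketch below counts it and optionally swaps it for another marker. The function names and the choice of U+FFFD as a substitute are assumptions made for this example, not something the collection prescribes.

```python
PLACEHOLDER = "*?"


def count_placeholders(text):
    """Count occurrences of the '*?' placeholder string."""
    return text.count(PLACEHOLDER)


def mask_placeholders(text, replacement="\ufffd"):
    """Replace each '*?' with a single marker character (U+FFFD by default)."""
    return text.replace(PLACEHOLDER, replacement)


sample = "The symbol *? denotes the operator *? defined above."
print(count_placeholders(sample))
```

Note that a literal '*?' typed by an author would be indistinguishable from the placeholder, so the count is an upper bound on replaced characters.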