NACSIS Academic Conference Paper Database (NTCIR-1 & 2)

Document records are extracted from the "NACSIS Academic Conference Paper Database".

The documents in the Collection are author abstracts of the papers presented at academic meetings hosted by 65 Japanese academic conferences.
Since one of the purposes of the original database is to provide alerting information about papers presented at Japanese academic conferences, documents are put in the database without any revision or modification by professional abstractors or editors. Some of them are refereed, and others are pre-refereed or non-refereed.
As stated in the "Error Notice", the data contain errors, such as typographical errors. These range from errors in the original data, including typographical errors, to errors introduced by the reformatting done at NII or at the Test Collection Projects.
Both as part of the philosophy of leaving the data as close to the original as possible, and because it is impossible to check all the data manually, our error-checking has concentrated on keeping the data readable rather than on correcting content.

Records are SGML-tagged plain text. EUC is used as the character code set for Japanese text.
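
As a minimal sketch of reading such data, the snippet below decodes raw bytes into text. It assumes that "EUC" here means EUC-JP, which Python exposes as the "euc_jp" codec; the function name decode_record_bytes is hypothetical and not part of the collection's tooling.

```python
def decode_record_bytes(raw: bytes) -> str:
    """Decode raw collection bytes, assuming the encoding is EUC-JP."""
    # errors="replace" substitutes U+FFFD for malformed byte sequences
    # instead of raising, since the data are known to contain errors.
    return raw.decode("euc_jp", errors="replace")


# Round-trip a short Japanese string to illustrate the decoding step.
encoded = "情報検索".encode("euc_jp")
print(decode_record_bytes(encoded))
```

Replacing undecodable bytes rather than failing is a design choice that suits bulk processing; use errors="strict" instead to locate damaged records.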

A record may contain an accession number (ACCN), a title, a list of authors, the name of the conference, the date of the conference, the author abstract, keywords, and the name of the host society. ACCN and title are mandatory.

A sample Japanese document in the J Collection [html] [plain text]
A sample English document in the E Collection [html] [plain text]
A sample Japanese document with segmentation [html] [plain text]
List of tags in a Japanese document record

<REC> beginning tag of a document record
<ACCN> document ID (Accession Number); it is a combination of 'gakkai-e-' and 10 digits. </ACCN>
<TITE TYPE="alpha"> title </TITE>
<AUPE TYPE="alpha"> list of author(s); when a document has more than one author, the authors are separated by ' / ' (single-byte space + single-byte slash + single-byte space). </AUPE>
<CNFE TYPE="alpha"> name of the conference at which the paper was presented </CNFE>
<CNFD> conference date; the format is 'yyyy.mm.dd' or 'yyyy.mm.dd - yyyy.mm.dd', where 'yyyy' represents the year, 'mm' the month, and 'dd' the day. </CNFD>
<ABSE TYPE="alpha"> author abstract; each paragraph in it is enclosed in <ABSE.P> and </ABSE.P>. </ABSE>
<KYWE TYPE="alpha"> keyword(s); keywords are separated by ' // ' (single-byte space + 2 single-byte slashes + single-byte space). </KYWE>
<SOCE TYPE="alpha"> name of the host society; the list of academic societies that provided the data is available here. </SOCE>
</REC> end tag of a document record
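
The tag list above can be turned into a small parser. The following is a rough sketch using regular expressions, not an official tool: the function name parse_record and the sample record (ID, title, authors, keywords) are invented for illustration, and real records may contain irregularities that this pattern does not handle.

```python
import re

# Matches <TAG ...>body</TAG> for the field tags listed above.
FIELD_RE = re.compile(
    r"<(ACCN|TITE|AUPE|CNFE|CNFD|ABSE|KYWE|SOCE)(?:\s[^>]*)?>(.*?)</\1>",
    re.DOTALL,
)


def parse_record(rec: str) -> dict:
    """Extract fields from one <REC>...</REC> block into a dict."""
    fields = {tag: body.strip() for tag, body in FIELD_RE.findall(rec)}
    if "AUPE" in fields:  # authors are separated by ' / '
        fields["AUPE"] = fields["AUPE"].split(" / ")
    if "KYWE" in fields:  # keywords are separated by ' // '
        fields["KYWE"] = fields["KYWE"].split(" // ")
    if "ABSE" in fields:  # one list entry per <ABSE.P> paragraph
        fields["ABSE"] = re.findall(
            r"<ABSE\.P>(.*?)</ABSE\.P>", fields["ABSE"], re.DOTALL
        )
    if "CNFD" in fields:  # single date, or start and end date
        fields["CNFD"] = [d.strip() for d in fields["CNFD"].split(" - ")]
    return fields


# Invented sample record, for illustration only.
sample = """<REC>
<ACCN>gakkai-e-0000000001</ACCN>
<TITE TYPE="alpha">An Example Title</TITE>
<AUPE TYPE="alpha">Yamada, Taro / Suzuki, Hanako</AUPE>
<CNFD>1997.10.01 - 1997.10.03</CNFD>
<ABSE TYPE="alpha"><ABSE.P>First paragraph.</ABSE.P><ABSE.P>Second paragraph.</ABSE.P></ABSE>
<KYWE TYPE="alpha">information retrieval // test collection</KYWE>
</REC>"""

record = parse_record(sample)
print(record["AUPE"], record["KYWE"], record["CNFD"])
```

Optional fields that are absent from a record are simply missing from the returned dict, which matches the note that only ACCN and title are mandatory.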
'TYPE="kanji"' and 'TYPE="alpha"' indicate the type of characters used in the field. "kanji" indicates that 2-byte EUC codes are used in the field. "alpha" indicates that the field contains only single-byte ASCII codes.
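
One way to sanity-check the TYPE attribute is sketched below. The helper name is_alpha_field is hypothetical, and the check assumes the field text has already been decoded to a Python string.

```python
def is_alpha_field(value: str) -> bool:
    """Return True if the decoded field holds only single-byte ASCII."""
    # str.isascii() is available in Python 3.7 and later.
    return value.isascii()


print(is_alpha_field("An Example Title"))  # ASCII-only text
print(is_alpha_field("情報検索"))            # contains 2-byte EUC characters
```

Given the error notice for this collection, a field marked "alpha" that fails this check is more likely a data error worth logging than a reason to reject the record.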
The keywords are assigned by the author. There is no specific controlled vocabulary for them.
Special Characters & Strings [1]: Entity: Strings that start with '&' and end with ';' (for example, &gt;, &lt;, &deg;, &Sigma;, and so on) often appear in the document data of NTCIR-1 and 2. They are meaningful strings, not errors. They are called 'entities' and are used to represent special characters or symbols, such as mathematical symbols, symbols for units, Greek characters, Cyrillic characters, etc., in the SGML/XML environment.
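
The entity convention described above can be handled as in the following sketch. It assumes that the entity names in the data overlap with the HTML5 named character references that Python's html.unescape understands; names from SGML entity sets that are not in that table may be left unchanged or only partly converted, so a custom mapping may still be needed. The helper names find_entities and decode_entities are hypothetical.

```python
import html
import re

# An entity starts with '&', ends with ';', and has a name in between.
ENTITY_RE = re.compile(r"&[A-Za-z][A-Za-z0-9.]*;")


def find_entities(text: str) -> list:
    """List the entity strings that occur in the text, in order."""
    return ENTITY_RE.findall(text)


def decode_entities(text: str) -> str:
    """Replace entities known to html.unescape with their characters."""
    return html.unescape(text)


line = "30&deg; and &Sigma;x &lt; 1"
print(find_entities(line))
print(decode_entities(line))
```

Listing the entities first makes it easy to see which names occur in the data before deciding whether a custom mapping is required.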
Special Characters & Strings [2]: Characters outside the standard character code sets: Special characters that were created by the authors of the documents and included in the original document records have been replaced by the string '*?' in the document data of NTCIR-1 and 2, because such characters are not included in the standard character code set and cannot be used on systems other than the original authors'.
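
A quick way to see how often the '*?' placeholder occurs in a piece of text is sketched below. The function name count_placeholders is hypothetical, and the sketch cannot distinguish a placeholder from a literal '*?' that an author happened to write.

```python
PLACEHOLDER = "*?"  # replacement string described above


def count_placeholders(text: str) -> int:
    """Count non-overlapping occurrences of the '*?' placeholder."""
    return text.count(PLACEHOLDER)


print(count_placeholders("The *? operator and the *? symbol"))
```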