NTCIR Project
NTCIR-12 MedNLPDoc
Research Purpose Use of Test Collection

NTCIR-12 MedNLP (Medical Natural Language Processing for Clinical Document)


This corpus consists of electronic health records provided through the medical shared task “MedNLPDoc”. It is suitable for building a pilot version of your medical applications or systems.

Corpus
Collection	Genre	File Name	Language	# of Documents	Volume
MedNLPDoc	Electronic health records	MedNLPDoc_TRAIN_v5.xml	Japanese	200	169 KB
MedNLPDoc	Electronic health records	MedNLPDoc_TEST_v5.xml	Japanese	78	135 KB

MedNLPDoc_TRAIN_v5.xml

This corpus is a collection of electronic health records (EHRs) provided by “MedNLPDoc,” the phenotyping task. It is suitable for building a pilot version of your own medical applications and/or systems.

As more medical records are written in electronic form rather than on paper, information processing techniques are becoming increasingly important in medicine. This EHR corpus is annotated with disease names, date and time expressions, and factuality, which makes it possible to evaluate fundamental techniques in the medical field, such as information extraction.

Sample

<data id="68" sex="m" age="49">
<text>
２００４年１２月２～１６日，前回入院．
今回２回目の入院．
前回他院にてアメーバ肝膿瘍の手術予定であったが，術前の検査でＨＩＶ陽性であったため当院入院．
ＳＴＳ陽性．
今回上記の疾患について外来フォロー中であったが吐血で入院．
食道潰瘍が判明．
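</text>

English translation of the sample text:
Previous admission: December 2 to 16, 2004.
This is the patient's second admission.
The patient had previously been scheduled for surgery for an amoebic liver abscess at another hospital, but was admitted to this hospital after testing HIV-positive in preoperative screening.
STS (serologic test for syphilis) positive.
The patient was being followed as an outpatient for the above conditions, but was admitted this time for hematemesis.
An esophageal ulcer was found.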
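The sample shows the record layout: a `<data>` element carrying `id`, `sex`, and `age` attributes, with the clinical narrative in a `<text>` child. The following Python sketch reads records of this shape with the standard-library `xml.etree.ElementTree`. It is a minimal sketch under stated assumptions: the name of the root element that encloses the `<data>` records is not given on this page, so the code searches for `<data>` at any depth; any other child elements or inline annotation tags are not interpreted; and `Record` and `records_from_root` are hypothetical helper names, not part of the corpus distribution.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical container for one <data> record (not part of the corpus spec)."""
    record_id: str
    sex: str
    age: str  # kept as a string; no value format beyond the sample is assumed
    text: str


def records_from_root(root: ET.Element) -> list[Record]:
    """Collect every <data> element at or below `root`.

    Assumption: each record looks like the sample, i.e. <data id=... sex=...
    age=...> with a <text> child. The enclosing root element is unknown, so
    Element.iter() is used, which also matches `root` itself.
    """
    records = []
    for data in root.iter("data"):
        text_el = data.find("text")
        # itertext() joins text nested inside any inline tags within <text>;
        # the tags themselves (e.g. annotations) are discarded.
        body = "".join(text_el.itertext()).strip() if text_el is not None else ""
        records.append(
            Record(
                record_id=data.get("id", ""),
                sex=data.get("sex", ""),
                age=data.get("age", ""),
                text=body,
            )
        )
    return records


if __name__ == "__main__":
    # A trimmed copy of the sample record; a closing </data> is added here
    # only so that this snippet is well-formed XML on its own.
    sample = '<data id="68" sex="m" age="49"><text>\n食道潰瘍が判明．\n</text></data>'
    for rec in records_from_root(ET.fromstring(sample)):
        print(rec.record_id, rec.sex, rec.age, rec.text)
```

To read a distributed file, one would pass `ET.parse("MedNLPDoc_TRAIN_v5.xml").getroot()` instead; if the file holds one `<data>` element per document, the number of returned records should match the document count listed in the table above.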

The test collection and data are available from NII free of charge. To obtain them, follow the procedure below.

Fill out the application form for the test collection and send it by e-mail to ntc-secretariat.
A user agreement (the memorandum “Permission to Use Test Collection”) is also required.

Fill out the user agreement form and send it by postal mail or courier to the address below.
Download the form and print two copies, double-sided.
Both copies must be signed.
After NII countersigns them, one copy will be returned to you and the other will be kept by NII.

Documents to submit

Application Form
Memorandum Permission to Use NTCIR-12 MedNLP Test Collection (sent by email)

Address

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Notice

The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project, either free of charge or for a fee. The providers of the document data understand the importance of such test collections for research on information access technologies and have kindly permitted the use of the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value. To maintain a good relationship with the data producers and providers, we researchers must be reliable partners: we must use the data only for research purposes under the user agreement, and handle it carefully so as not to violate copyright.

Updated on: 2016-09-20
ntc-admin
