NTCIR Project
Research Purpose Use of Test Collection


NTCIR-11 MedNLP (Medical Natural Language Processing "MedNLP-2")


This corpus consists of electronic health records provided by the medical shared task “MedNLP-2”. It is suitable for building a pilot version of your medical applications or systems.

Collection

Corpus                      File Name        Language   # of documents   Volume
electronic health records   mednlp2sub.xml              82 documents     172KB


This corpus is a collection of electronic health records (EHR) provided by the medical shared task “MedNLP-2.” It is suitable for building a pilot version of your own medical applications and/or systems.

As medical records are increasingly written in electronic form rather than on paper, information processing techniques are becoming more important in medicine. This EHR corpus is annotated with disease names, date and time expressions, and factuality. It makes it possible to evaluate fundamental techniques in the medical field, such as information extraction.


<t>2025年8月2日(来院5日前)頃</t>から <c icd="R104">腹痛</c>が生じるとともに,
<c icd="R630">食欲不振</c>, <c icd="R11_">嘔気</c> ・ <c icd="R11_">嘔吐</c>出現した。
体幹は温かく、<c icd="R579">ショック状態</c>。
明らかな <c icd="G839" modality="negation">運動麻痺</c> はみられず。
翌日, <c icd="R402">意識障害</c>出現し, <c icd="N289">腎機能障害</c>の増悪を認め,
<t>8月9日18時10分</t>に <c icd="I469">心肺停止</c>。
<t>8月9日21時44分</t> <c icd="R99_">死亡確認</c>。
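
The annotations in the sample above can be read with a standard XML parser. The following is a minimal sketch in Python, not an official tool for this corpus. It assumes that `<c>` marks a disease name with an `icd` code, that the optional `modality` attribute carries the factuality annotation, and that `<t>` marks a date or time expression. The `<doc>` wrapper element and the function name `extract_annotations` are hypothetical; the actual layout of mednlp2sub.xml may differ.

```python
import xml.etree.ElementTree as ET

# Two sentences in the style of the sample, wrapped in a hypothetical
# <doc> root element so that the fragment is well-formed XML.
SAMPLE = (
    '<doc>'
    '明らかな <c icd="G839" modality="negation">運動麻痺</c> はみられず。'
    '<t>8月9日18時10分</t>に <c icd="I469">心肺停止</c>。'
    '</doc>'
)


def extract_annotations(xml_text):
    """Return (diseases, times) found in an annotated XML fragment.

    diseases: list of dicts with the surface text, the ICD code, and the
              modality attribute (None when the attribute is absent).
    times:    list of surface strings tagged as date/time expressions.
    """
    root = ET.fromstring(xml_text)
    diseases = []
    times = []
    # iter() walks all elements in document order, including the root.
    for elem in root.iter():
        if elem.tag == "c":
            diseases.append(
                {
                    "text": elem.text,
                    "icd": elem.get("icd"),
                    "modality": elem.get("modality"),
                }
            )
        elif elem.tag == "t":
            times.append(elem.text)
    return diseases, times


if __name__ == "__main__":
    found_diseases, found_times = extract_annotations(SAMPLE)
    for disease in found_diseases:
        print(disease["icd"], disease["text"], disease["modality"])
    for time_expr in found_times:
        print(time_expr)
```

Keeping `modality` as `None` when the attribute is missing lets downstream code separate explicitly negated mentions from mentions that carry no modality attribute.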


The procedure for obtaining the test collection is as follows. The test collection and data are available from NII free of charge.
Documents to submit Reference


NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat


The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project, either free of charge or for a fee. The providers of the document data understand the importance of such test collections in research on information access technologies and have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good relationship with the data producers and providers, we researchers must be reliable partners: we must use the data only for research purposes under the user agreement, and we must handle it carefully so as not to violate copyright.