"NTCIR Workshop Evaluation Agreement": http://www.ldc.upenn.edu/Membership/Agreements/eval/NTCIR.html
2. HOW TO OBTAIN THE DATA
- (1) Register to participate in the CLIR task at NTCIR-5
- The LDC will grant the license to the registered participants.
- (2) Download the LDC's "NTCIR Workshop Evaluation Agreement" from the above URL.
- (3) Complete and sign the agreement.
- (4) Fax the signed agreement to the Linguistic Data Consortium (LDC).
- A single signed form sent by fax is sufficient for the LDC.
- (5) The document data will be provided to you by the LDC via their intranet server for download.
- Linguistic Data Consortium
- 3600 Market Street
- Suite 810
- Philadelphia, PA, 19104-2653, USA
- Phone: +1 (215) 898-0464
- Fax: +1 (215) 573-2175
- Email: ldc@ldc.upenn.edu
- ATTN: Ms Llya Ahtaridis, Membership Coordinator
3. SCOPE OF THE LICENSE
The license for evaluation use by NTCIR Workshop participants will
be valid until September 30, 2006, at which time the user agrees to delete
the corpora from any computer or media onto which they have been copied.
If you would like to keep the data after the license expires, you may do so
by agreeing to pay the LDC the non-member fee and signing
the generic LDC non-member user agreement.
For the detailed conditions, please consult the agreement at the above
URL.
4. IF YOU ALREADY HAVE THE "LDC 2003T05 English Gigawords collection" ...
If you already have the "LDC 2003T05 English Gigawords collection",
please use the documents in that corpus.
The files named "xieYYYYMM.gz" under the "xie" directory are
used for the CLIR task at the 5th NTCIR Workshop.
YYYY is one of 1998, 1999, 2000 and 2001. MM is the two-digit month of that year.
For example, one such file is "xie199801.gz".
In total, the 47 files from xie199801.gz to xie200111.gz will be used for
NTCIR-5. (There is no xie200112.gz on the DVD.)
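To check that a local copy is complete, the expected file names can be generated from the naming rule above. The following is only a minimal sketch: the function name xie_file_names is hypothetical and not part of any NTCIR or LDC tool.

```python
def xie_file_names():
    """Return the expected Xinhua file names, xie199801.gz through xie200111.gz."""
    names = []
    for year in range(1998, 2002):          # 1998, 1999, 2000, 2001
        for month in range(1, 13):          # 01 .. 12
            if (year, month) == (2001, 12):
                continue                    # xie200112.gz is not on the DVD
            names.append(f"xie{year}{month:02d}.gz")
    return names


if __name__ == "__main__":
    files = xie_file_names()
    print(len(files), files[0], files[-1])
```

Comparing this list against the contents of the "xie" directory shows whether any file is missing.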
5. CONVERSION OF LDC DOCUMENT DATA INTO NTCIR FORMAT
Whether you use the document data from the LDC 2003T05 DVD or the document data
provided under the NTCIR evaluation license, please gunzip the document files and
then apply the script xie2ntc2.pl_txt
[readme] to convert the record format into the NTCIR standard format.
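If the gunzip command is not available, the decompression step can be sketched in Python as below. This covers only the gunzip step; the conversion itself is still done by xie2ntc2.pl_txt as described in its readme. The function name gunzip_all and the directory arguments are assumptions made for illustration.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(src_dir, dst_dir):
    """Decompress every xie*.gz in src_dir into dst_dir; return the output paths."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    outputs = []
    for gz_path in sorted(src.glob("xie*.gz")):
        out_path = dst / gz_path.stem       # "xie199801.gz" -> "xie199801"
        with gzip.open(gz_path, "rb") as fin, open(out_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)   # stream, so large files are not loaded whole
        outputs.append(out_path)
    return outputs
```

For example, gunzip_all("xie", "xie_unzipped") would leave the original .gz files in place and write the decompressed files to a hypothetical "xie_unzipped" directory.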
For the document files used for training and test, please consult
URL: http://homepage3.nifty.com/kz_401/Xinhua_Info.htm