----------------------------------------------------------------
Evaluation Results of the Formal Run Translation Subtask
(Automatic Intrinsic Evaluation)
NTCIR-8 Patent Translation Task
2010.1.25
----------------------------------------------------------------

A. Intrinsic automatic evaluation

We measured BLEU scores of the submitted files for the intrinsic
automatic evaluation, using a single reference per sentence.

A-1. Evaluation procedures

The BLEU values were computed by the following procedures. The
procedures for Japanese-to-English and English-to-Japanese
translation differ slightly in their tokenization steps.

[Evaluation procedure for JE (Japanese-to-English) translation]

(1) Tokenizing all sentences in the submitted and reference files
    with the tokenizer used at the ACL 2007 2nd Workshop on SMT,
    available at:
      http://www.statmt.org/wmt07/baseline.html
    We did not lowercase any file, so the BLEU computation is
    case-sensitive.

(2) Computing BLEU values for the formal-run test set (1,251
    sentences) with a single reference, together with 95%
    confidence intervals, for each submitted file. We used 'Bleu
    Kit' version 1.0 (written by Mr. Norimatsu) to compute the
    values (an illustrative sketch of this computation appears in
    the appendix at the end of this file):
      http://www.nlp.mibel.cs.tsukuba.ac.jp/bleu_kit/

[Evaluation procedure for EJ (English-to-Japanese) translation]

(1) Removing all single-byte white spaces in the submitted and
    reference files.

(2) Converting single-byte (half-width) alphabetic characters,
    numbers, and special symbols into their multibyte (full-width)
    equivalents for normalization (see the sketch in the appendix
    at the end of this file).

(3) Tokenizing all Japanese sentences with ChaSen 2.4.2 and the
    ipadic 2.7.0 dictionary in UTF-8. We configured .chasenrc to
    concatenate sequences of numbers or alphabetic characters into
    single words.

(4) Computing BLEU values with a single reference (1,119 sentences)
    and 95% confidence intervals for each submitted file, using the
    same tool as in the JE evaluation above.

A-2. Results

All information provided in the submitted files, together with the
results, is compiled in the attached Excel file. In the Excel file,
the columns labeled 'BLEU-*', 'BLEU-*-LOW', and 'BLEU-*-HIGH' show
the BLEU values and the lower and upper bounds of the 95% confidence
intervals for the test-set sentences, respectively.

The line labeled 'Moses' shows the result of the Moses SMT system.
The configuration of Moses (2008-02-20 version) we used is as
follows.

* Training data: PSD-1 from NTCIR-7 plus the additional data from
  NTCIR-8. Sentences were preprocessed with simple normalization.
* Training scripts and programs with nearly default options.
* Models
  - Phrase table: about 127M phrase pairs.
  - Language model: 5-gram with interpolated modified Kneser-Ney
    smoothing (SRILM).
  - Reordering model: msd-bidirectional-fe
* MERT using the development data (pat-dev-2006-2007.txt) in the
  NTCIR-8 data (2,000 sentences).
* Decoding with a beam width of 200 and no distortion limit.

B. Files

ntc8patmt-MT-fmlrun-intrinsic-result.xls
  The evaluation results for the intrinsic evaluation. This file
  contains group IDs, system descriptions, training and decoding
  times, and evaluation results (BLEU).

ntc8patmt-MT-fmlrun-intrinsic-readme.txt
  This file.

----------------------------------------------------------------
NTCIR-8 Patent Translation Task Organizers
ntcadm-patmt@cl.cs.titech.ac.jp
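
----------------------------------------------------------------
C. Appendix: illustrative code sketches

C-1. BLEU with a 95% confidence interval

The following Python sketch illustrates the kind of computation
described in A-1: case-sensitive corpus BLEU over pre-tokenized
files with a single reference, plus a bootstrap 95% confidence
interval. NLTK's corpus_bleu stands in for Bleu Kit here; the file
names, the bootstrap procedure, and the resampling count are
assumptions for illustration, not a description of Bleu Kit's
internals.

  # Sketch only: NLTK stands in for Bleu Kit; bootstrap resampling
  # is one common way to obtain a 95% confidence interval.
  import random
  from nltk.translate.bleu_score import corpus_bleu

  def bleu_with_ci(hyps, refs, n_resamples=1000, seed=0):
      # hyps, refs: parallel lists of token lists (single reference).
      point = corpus_bleu([[r] for r in refs], hyps)
      rng = random.Random(seed)
      scores = []
      for _ in range(n_resamples):
          # Resample sentence indices with replacement.
          idx = [rng.randrange(len(hyps)) for _ in hyps]
          scores.append(corpus_bleu([[refs[i]] for i in idx],
                                    [hyps[i] for i in idx]))
      scores.sort()
      return (point,
              scores[int(0.025 * n_resamples)],   # lower bound
              scores[int(0.975 * n_resamples)])   # upper bound

  # Tokenized, case-sensitive input (no lowercasing), one sentence
  # per line; 'submission.tok' and 'reference.tok' are placeholder
  # file names.
  hyps = [l.split() for l in open("submission.tok", encoding="utf-8")]
  refs = [l.split() for l in open("reference.tok", encoding="utf-8")]
  print("BLEU %.4f, 95%% CI [%.4f, %.4f]" % bleu_with_ci(hyps, refs))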
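
C-2. EJ character normalization

The following Python sketch illustrates steps (1) and (2) of the EJ
procedure: removing single-byte white spaces and mapping single-byte
(half-width) ASCII characters to their multibyte (full-width)
equivalents. It is a minimal sketch of the normalization idea, not
the preprocessing script actually used for the evaluation.

  # Sketch only: half-width ASCII (U+0021..U+007E) maps onto
  # full-width forms (U+FF01..U+FF5E) at a fixed offset of 0xFEE0.
  HALF_TO_FULL = {c: c + 0xFEE0 for c in range(0x21, 0x7F)}

  def normalize_ej(line: str) -> str:
      # (1) Remove all single-byte white spaces.
      line = line.replace(" ", "").replace("\t", "")
      # (2) Convert single-byte alphabets, numbers, and symbols
      #     into multibyte (full-width) characters.
      return line.translate(HALF_TO_FULL)

  print(normalize_ej("特許 第123456号 (JP)"))
  # -> 特許第１２３４５６号（ＪＰ）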