NTCIR (NII Test Collection for IR Systems) Project




NTCIR-8 Meeting
Session Notes

DAY-4 June 18 Friday: PATMT and MOAT

|Session 6: PATMT|Session 7: MOAT|

[Session Notes] Session 6: NTCIR-8 PATENT Translation (PATMT)

Date: June 18, 2010
Time: 9:30 - 11:30

1.  Overview of the Patent Translation Task at the NTCIR-8 Workshop
      Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, Takehito Utsuro, Terumasa Ehara, Hiroshi Echizen-ya and Sayori Shimohata

   1) Atsushi Fujii presented an overview of the patent translation task at NTCIR-8, describing the test collection, the methods for evaluating machine translation, and the evaluation results of the research groups that participated in the task.

  • To obtain a parallel corpus, they extracted patent documents for the same or related inventions published in Japan and the United States. The test collection includes approximately 3,200,000 Japanese-English sentence pairs, which were extracted automatically from their parallel corpus.
  • The test collection also includes search topics for cross-lingual patent retrieval, which can be used to evaluate the contribution of machine translation to retrieving patent documents across languages.
  • In addition, the test collection includes machine translation results and their evaluation scores determined by human experts, which can be used to propose automatic evaluation methods for machine translation. 

  2) Terumasa Ehara introduced the evaluation subtask, which aims to study automatic evaluation of MT systems.

  Question 1: Compare different IR systems on extrinsic evaluation

  Question 2: Asked for more details about the automatic evaluation subtask.

2. MaTrEx: the DCU MT System for NTCIR-8

      Tsuyoshi Okita, Jie Jiang, Rejwanul Haque, Hala Al-Maghout, Jinhua Du, Sudip Naskar and Andy Way (Dublin City University, Ireland)

Tsuyoshi Okita presented the DCU system submitted to the translation subtask of the NTCIR-8 Patent Translation Task. Four techniques were deployed in the system: supertagged PB-SMT, context-informed PB-SMT, noise reduction, and system combination. For EN-JP, the system ranked second in BLEU among six participants.

3. Machine Translation for Patent Documents Combining Rule-based Translation and Statistical Post-editing

      Terumasa Ehara (Yamanashi Eiwa College)

Terumasa Ehara described the system architecture, training data preparation, and experimental results of the EIWA group in the NTCIR-8 Patent Translation Task. The system combines rule-based machine translation (RBMT) with statistical post-editing (SPE). It achieved a BLEU score of 0.344 in the Japanese-to-English intrinsic evaluation, ranking first among all submissions.
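
As a reference point for the BLEU figures quoted in these notes, here is a minimal sketch of corpus-level BLEU: clipped n-gram precisions up to 4-grams, combined by a geometric mean and multiplied by a brevity penalty. It assumes pre-tokenized sentences and a single reference per sentence, and it is not the official NTCIR-8 scoring tool, whose tokenization and options may differ. The function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence (0-1 scale)."""
    clipped = [0] * max_n  # matched n-grams, clipped by reference counts
    totals = [0] * max_n   # n-grams proposed by the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            clipped[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # some n-gram order has no match, so the geometric mean is 0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

Scores from this sketch are on a 0-1 scale, like the 0.344 figure above; some reports multiply by 100.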

Question 1: Are there any good or bad examples of statistical post-editing?

A: No.

Question 2: What is the influence of the RBMT system used?

A: They compared the Toshiba MT system with another rule-based system; the Toshiba system performed better.

4. System Description of NiCT SMT for NTCIR-8

      Keiji Yasuda, Taro Watanabe, Masao Utiyama and Eiichiro Sumita (NICT, Japan)

Keiji Yasuda described the patent translation system they submitted to the NTCIR-8 patent translation task. Their phrase-based Statistical Machine Translation (SMT) system was trained on a bilingual corpus (3 million sentence pairs) and large monolingual corpora (460 million sentences for Japanese and 350 million sentences for English). In addition to the standard SMT pipeline, they used an SVM-based reranker. In the experiments, the baseline system achieved a high BLEU score, but the reranker degraded performance.

by Bin Lu

Session 6: PATMT

Title: Overview of the Patent Translation Task
Presenter: A. Fujii and T. Ehara

In NTCIR-8, the PATMT task includes a new subtask as well as the previous ones. Many teams participated in the patent translation subtask. However, no team participated in the cross-language IR task, and only one team participated in the automatic evaluation subtask. In NTCIR-8, the corpus reaches 3 million sentence pairs. Parallel sentences are extracted from patent documents from 1993 to 2005. In both the intrinsic and extrinsic evaluations, the correlation between system rankings based on automatic evaluation and those based on human evaluation is quite high. The automatic evaluation subtask is newly introduced in NTCIR-8. It uses the NTCIR-7 evaluation results to investigate the correlation between automatic scores of system output and human evaluation. The result shows that the correlation is about 40%.
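
The overview reports how well automatic and human evaluations agree on system rankings, but the notes do not say which coefficient was used. As an illustration only, the sketch below computes Spearman's rank correlation between per-system metric scores and human scores, assuming there are no tied scores; the function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (highest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(metric_scores, human_scores):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(metric_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equally long score lists with at least 2 systems")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(metric_scores), ranks(human_scores)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A value of 1 means the metric ranks the systems exactly as the human judges do; -1 means the ranking is fully reversed.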

Title: MaTrEx: the DCU MT System for NTCIR-8
Presenter: T. Okita

They analyze the characteristics of the corpus. In their supertagging method, they separate lexical category assignment from the combinatorial process and incorporate the target side for better local reordering. Because long-distance dependencies are not captured in this manner, they also consider parsing errors arising from the characteristics of HPSG in capturing source-side ambiguity. They also perform supertagging on the source side (E-J) with context-informed features within a window. To reduce translation noise, they remove mismatched phrases, because direct computation of phrase translation probabilities is problematic. They combine the outputs of their systems with an MBR decoder in an MBR-CN (confusion network) framework.
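
MBR selection, which in confusion-network combination is typically used to choose the backbone hypothesis, can be sketched as picking the candidate with the highest expected gain against all candidates. This is a simplified illustration under our own assumptions: each candidate carries a normalized weight (posterior), and a unigram F1 stands in for the sentence-level BLEU gain a real system would use. The confusion-network alignment and decoding steps are not shown.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Simple similarity used here as the MBR gain (a stand-in for sentence BLEU)."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, weights):
    """Return the candidate with the highest expected gain against all candidates.

    candidates: list of token lists (e.g. outputs of different MT systems)
    weights: posterior-like weights, one per candidate, assumed to sum to 1
    """
    def expected_gain(hyp):
        return sum(w * unigram_f1(hyp, other) for other, w in zip(candidates, weights))

    return max(candidates, key=expected_gain)
```

The selected candidate is the one that agrees most, on average, with the other systems' outputs, which is why MBR is a natural choice of backbone.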

Q: How is the supertagger incorporated into the SMT decoder?
A: Using a factored model.

Q: What n-gram order do you use, and why? Does it cause data sparseness?

Title: Machine translation for patent documents combining Rule-based Translation and Statistical Post-editing
Presenter: T. Ehara

RBMT can help improve the accuracy of SMT. The author runs RBMT first and then applies statistical post-editing (SPE) with an SMT system. To train the translation model of the SMT system, the training corpus is filtered using the test corpus. Experiments show that the proposed method outperforms RBMT alone; in effect, SPE paraphrases the output of the RBMT system.
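
The notes do not say how the training corpus was filtered by the test corpus. One plausible scheme, shown purely as an assumption, keeps only training pairs whose source side shares n-grams with the test set; `n` and `min_shared` are hypothetical parameters, not values from the paper.

```python
def ngram_set(tokens, n):
    """Set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def filter_by_test(train_pairs, test_sentences, n=3, min_shared=1):
    """Keep (source, target) pairs whose source shares >= min_shared n-grams with the test set."""
    test_ngrams = set()
    for sentence in test_sentences:
        test_ngrams |= ngram_set(sentence, n)
    return [
        (src, tgt)
        for src, tgt in train_pairs
        if len(ngram_set(src, n) & test_ngrams) >= min_shared
    ]
```

Filtering this way trades coverage for relevance: the SPE model sees only data close to the test domain, which is one possible reason not to use the whole corpus.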

Q: Why don't you use the whole corpus for SPE?

Q: Did you find any interesting patterns where SPE recovers RBMT errors, or vice versa?

Q: There has been similar work on other language pairs. What is the novelty?

Q: How long does the whole process take?
A: One week for decoding.

Title: System description of NiCT SMT for NTCIR-8
Presenter: K. Yasuda

The authors used larger monolingual data for LM training in addition to the official data. They also divided the development corpus into two parts, one for minimum error rate training and one for the reranking module, to avoid overtuning the parameters of both modules. The reranking module used an SVM trained with MIRA on several features from the SMT system, because sentence-level BLEU is not reliable. However, experiments show that the proposed method is worse than the baseline with the monolingual corpus in both directions in the intrinsic evaluation. They also conducted post-evaluation experiments: fixing the baseline, varying the size of the n-best list, and training the reranker on the test corpus as an oracle. The oracle shows much higher performance (about 45 BLEU), which is very promising.
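
The notes describe the reranker only as an SVM trained with MIRA. The sketch below illustrates the general idea with a generic 1-best MIRA (passive-aggressive) update for a linear reranker over n-best lists; it is not NICT's implementation. It assumes each hypothesis carries a feature vector from the SMT system and some per-hypothesis quality score; `C` and `epochs` are illustrative values.

```python
def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def rerank(w, nbest):
    """Index of the hypothesis with the highest linear score w . f."""
    return max(range(len(nbest)), key=lambda i: dot(w, nbest[i][0]))


def train_mira(dev_nbest_lists, num_features, epochs=10, C=0.01):
    """1-best MIRA training of a linear reranker.

    dev_nbest_lists: list of n-best lists; each hypothesis is (features, quality).
    """
    w = [0.0] * num_features
    for _ in range(epochs):
        for nbest in dev_nbest_lists:
            pred = rerank(w, nbest)
            oracle = max(range(len(nbest)), key=lambda i: nbest[i][1])
            loss = nbest[oracle][1] - nbest[pred][1]
            diff = [a - b for a, b in zip(nbest[oracle][0], nbest[pred][0])]
            norm2 = dot(diff, diff)
            margin = dot(w, diff)
            # Update only when the oracle does not beat the prediction by a margin >= loss.
            if loss > 0 and norm2 > 0 and margin < loss:
                tau = min(C, (loss - margin) / norm2)  # clipped step size
                w = [wi + tau * di for wi, di in zip(w, diff)]
    return w
```

Training this on one half of the development data while MERT uses the other half mirrors the split described above.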

Q: What do you expect to gain by dividing the development corpus instead of using all of it?
A: A speed-up, especially at the tuning stage.

Q: Did you use a topic-dependent model as you did at NTCIR-7?
A: No.

Q: Why is reranking not effective in the patent domain?
A: Because of the domain's different characteristics.

Q: How long does it take?
A: Three days to decode the whole test corpus.

Overall questions and comments:
Q: How about trying a pseudo gold standard to predict the quality of translation systems?
A: It is worth trying; when there are many SMT systems, SMT can benefit from it.

Q: How about dividing the task into constrained (i.e., no external resources) and unconstrained tracks?
A: It is worth considering.

by Hwidong Na

[Session Notes] Session 7: NTCIR-8 Multilingual Opinion Analysis (MOAT)

Date: June 18, 2010

Time: 13:00 - 15:00

Title: Overview of MOAT Task
Presenter: Yohei Seki

This year MOAT introduces a new subtask, "cross-lingual opinion Q&A". The corpus includes Japanese, Chinese (traditional and simplified), and English. Characteristics differ from language to language; e.g., the Japanese corpus has more opinionated expressions, 60% of which are negative. Annotation agreement is high (kappa over 0.7) in English, Japanese and simplified Chinese. The organizers provide online tools for opinion annotation. Automatic evaluation based on annotator agreement is used to judge participants' performance. The results show that under higher agreement, differences in rank also imply significant differences, and that topics with more opinions in the source are not always easier. For future work, several challenges remain, such as obtaining high-quality annotation at low cost.
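
The agreement figure above is a kappa value. The notes do not say which variant was computed (for example, Cohen's kappa for two annotators or Fleiss' kappa for more); the sketch assumes Cohen's kappa for two annotators labelling the same sentences, e.g. as opinionated (Y) or not (N).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for chance, which matters when one label (such as "not opinionated") dominates the data.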

Title: Opinion detection by combining machine learning & linguistic tools
Presenter: J. Savoy

They built a statistical model for polarity classification with features such as adjectives, verbs, and nouns. They also proposed a scoring method (z-score), computed over the vocabulary of opinionated sentences, that measures the degree of opinionatedness. Based on the observation that a large positive z-score indicates that a term is overused in a given category relative to the rest of the corpus, they fed these scores into a logistic regression classifier. Failure analysis indicates that for negative opinionated sentences, the z-score often fails to distinguish whether a sentence is opinionated or not.
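
A minimal sketch of the z-score idea, assuming the standard binomial formulation used with Muller's method: for a term occurring a times in a category of n' tokens, with probability p in the whole corpus, Z = (a - n'p) / sqrt(n'p(1 - p)). Tokenization and any frequency thresholds the authors applied are not shown, and the function name is illustrative.

```python
import math
from collections import Counter


def z_scores(category_tokens, corpus_tokens):
    """Z-score of each term in a category (subset) against the whole corpus.

    Under the null hypothesis a term occurs in the category with the same
    probability p it has in the whole corpus (binomial model).
    category_tokens must be a sub-multiset of corpus_tokens.
    """
    cat_counts = Counter(category_tokens)
    corpus_counts = Counter(corpus_tokens)
    n_cat = len(category_tokens)
    n = len(corpus_tokens)
    scores = {}
    for term, a in cat_counts.items():
        p = corpus_counts[term] / n
        expected = n_cat * p
        variance = n_cat * p * (1 - p)
        # A term that makes up the whole corpus has zero variance; skip it.
        if variance > 0:
            scores[term] = (a - expected) / math.sqrt(variance)
    return scores
```

Terms with large positive scores are overused in the category (e.g. opinionated sentences) and are the ones that carry signal for classification.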

Q: How can the z-score be combined with a sentiment resource such as SentiWordNet?
A: Through logistic regression with an additional variable; a linear combination is also possible.

Title: PKUTM Experiments in NTCIR-8 MOAT Task
Presenter: T. Ma

For the opinionated sentence task, their process consists of three parts. Preprocessing converts traditional Chinese to simplified Chinese. Feature selection extracts various kinds of information, and classification with an SVM gives the best result for this task, which they take as evidence that the task is largely linearly separable. For the holder/target task, chunking and heuristic rules are used because they are more accurate and easier to control than (shallow) parsing.

Q: How do you convert Traditional Chinese to Simplified Chinese?
A: By one-to-one character conversion.
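
One-to-one character conversion amounts to a lookup table applied character by character. The sketch below uses a tiny, hypothetical excerpt of such a table; a real system would load a complete mapping. Traditional-to-simplified is the easier direction: the reverse is ambiguous because several traditional characters can share one simplified form.

```python
# Tiny, hypothetical excerpt of a traditional-to-simplified table.
TRAD_TO_SIMP = {
    "語": "语", "國": "国", "學": "学", "這": "这",
    "個": "个", "電": "电", "腦": "脑", "說": "说",
}
_TABLE = str.maketrans(TRAD_TO_SIMP)


def to_simplified(text):
    """Convert character by character; characters not in the table are kept as-is."""
    return text.translate(_TABLE)
```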

Q: Can your heuristics also work for Traditional Chinese? Did you try?
A: No.

Title: Supervised approaches and dependency parsing for Chinese opinion analysis
Presenter: B. Lu

For subjectivity and polarity classification, they use reporting verbs, sentiment-bearing words, some adverbs, etc., combining sentiment lexicon resources with MaxEnt and SVM. As a combination method, majority voting selects the result from the three outputs of the supervised lexicon-based system, SVM, and MaxEnt. The results show that the combination outperforms each standalone system. For holder/target identification, they define heuristics based on dependency parsing. They achieve high accuracy in holder identification, but find only about 50% of the correct targets.
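
Majority voting over the three systems can be sketched as follows. The tie-breaking policy (falling back to one designated system when no label has a strict majority, which can happen with three-way polarity labels) is a hypothetical choice; the notes do not specify one.

```python
from collections import Counter


def majority_vote(predictions, tie_breaker=0):
    """Combine labels from several classifiers by majority vote.

    predictions: one label per system for the same sentence,
                 e.g. [lexicon_label, svm_label, maxent_label].
    tie_breaker: index of the system whose label wins when no label has a
                 strict majority (hypothetical policy).
    """
    label, count = Counter(predictions).most_common(1)[0]
    if count > len(predictions) // 2:
        return label
    return predictions[tie_breaker]
```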

Q: Which module contributes the most to the combination?
A: In isolated experiments, MaxEnt underperforms SVM.

Title: The OpAL system at NTCIR-8 MOAT
Presenter: E. Boldrini

They designed the system to support temporal expressions and anaphora resolution in this task. In the English monolingual task, they define rules to classify polarity and train an SVM on NTCIR-8, MPQA, and EmotiBlog data with POS information. In the English-Chinese cross-lingual QA task, they used web translation to obtain a Chinese sentiment lexicon.

Q: How are temporal expressions used?

by Hwidong Na

Session 7: MOAT

Date: June 18, 2010

Time: 13:00-13:40
Speaker: Yohei Seki

Title: Overview of the Multilingual Opinion Analysis Task at the NTCIR-8 Workshop

The speaker discusses the goal, task description, evaluation results, and participant approaches of the third Multilingual Opinion Analysis Task (MOAT) at the NTCIR-8 workshop. Building on past experience, they extend the task toward cross-lingual opinion analysis applications. To address this challenging problem, they believe two capabilities are required: (1) language-transfer approaches with semi-supervised techniques and (2) cross-lingual opinion question answering. To move closer to this goal, they created opinion annotation corpora based on opinion Q&A, in a common format across languages. Many teams participated in subtasks for more than two languages, and some teams also participated in the cross-lingual subtask. In total, 16 participants submitted 56 runs, and half of the participants submitted results for tasks in more than two languages. They hope that NTCIR-8 MOAT will be a milestone for cross-lingual opinion analysis research. In this task, many participants proposed and tested new opinion extraction techniques. Smart feature filtering and machine learning with rich lexicons appear to be keys to successful opinion analysis.

Time: 13:40-14:00
Speaker: Olena Zubaryeva
Title: Opinion Detection by Combining Machine Learning & Linguistic Tools

The speaker presents their work in the Multilingual Opinion Analysis Task (MOAT) of the NTCIR-8 evaluation campaign. They propose a probabilistic model derived from Muller's method that identifies and weights terms (isolated words, word bigrams, noun phrases, etc.) characteristic of a given category (or subset of the corpus) compared with the rest of the corpus. Based on these terms and their weights, they use logistic regression to determine the most probable category for each input sentence. Their participation is strongly motivated by the goal of addressing the MOAT polarity subtask with a minimal linguistic component whose performance could be improved by language-specific tools. Thus, for English, they combine a machine learning approach (z-score and logistic regression) with a polarity dictionary (the linguistic component). For traditional Chinese and Japanese, however, their current system is limited to the machine learning scheme.
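
How z-score features could feed a logistic regression classifier is sketched below. It assumes each sentence is represented by hypothetical aggregate features (for instance, the summed z-scores of its terms for the positive and for the negative category) and fits the model with plain batch gradient ascent. The feature design, learning rate, and epoch count are illustrative, not the authors' settings.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_proba(w, x):
    """P(category = 1 | x); w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)  # gradient of log-likelihood is err * x
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w
```

For polarity, a sentence would be labelled positive when `predict_proba` exceeds 0.5; a dictionary-based score could enter as one more feature column, as suggested in the Q&A above.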

Time: 14:00-14:20
Speaker: Tengfei Ma
Title: PKUTM Experiments in NTCIR-8 MOAT Task

The speaker describes their work on the Simplified Chinese opinion analysis tasks in NTCIR-8. To detect opinionated sentences, they use various sentiment lexicons, including opinion indicators, opinion operators, degree adverbs and opinion words. A linear SVM is the main classifier, and four groups of features are extracted from punctuation, words and sentiment lexicons. They also try a two-step classification to improve the SVM result. To extract opinion holders and targets, they combine CRF with heuristic rules. The evaluation results on the Simplified Chinese side of NTCIR-8 MOAT show that their system achieves the best F-measure in two tasks, which suggests that the proposed framework is promising.

Time: 14:20-14:40
Speaker: Bin Lu
Title: Supervised Approaches and Dependency Parsing for Chinese Opinion Analysis at NTCIR-8

The speaker describes their participating system, which is based on supervised approaches and dependency parsing, for opinion analysis of traditional Chinese texts at NTCIR-8. For opinionated sentence recognition, the supervised lexicon-based approach, SVM and Maximum Entropy are combined. For polarity classification, they use only the supervised lexicon-based approach. For opinion holder and target identification, they use dependency parsing to identify opinion holders by means of reporting verbs, and identify opinion targets by considering both opinion holders and opinion-bearing words. Among all teams participating in the traditional Chinese task, their system achieves: 1) the highest F-measure on opinionated sentence recognition, 2) the second highest F-measure on the identification of both opinion holders and targets, and 3) a middle ranking for opinion polarity classification.
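
Holder identification through reporting verbs can be sketched as a search over dependency arcs for the subject of a reporting verb. Both the reporting-verb list and the arc format (head index, relation, dependent index, with Stanford-style relation names such as "nsubj") are assumptions for illustration; the authors' lexicon and parser output are not given in the notes.

```python
# Hypothetical reporting-verb list (traditional Chinese).
REPORTING_VERBS = {"表示", "認為", "指出", "說"}


def find_holders(tokens, arcs, reporting_verbs=REPORTING_VERBS):
    """Return subjects of reporting verbs as candidate opinion holders.

    tokens: list of words in the sentence.
    arcs: dependency arcs as (head_index, relation, dependent_index) triples.
    """
    holders = []
    for head, rel, dep in arcs:
        if rel == "nsubj" and tokens[head] in reporting_verbs:
            holders.append(tokens[dep])
    return holders


# Usage: "部長 表示 政策 有效" ("the minister says the policy is effective").
tokens = ["部長", "表示", "政策", "有效"]
arcs = [(1, "nsubj", 0), (3, "nsubj", 2), (1, "ccomp", 3)]
holders = find_holders(tokens, arcs)
```

Only the subject of the reporting verb is returned; the subject of the embedded clause (the policy) is a target candidate rather than a holder.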

Time: 14:40-15:00
Speaker: Alexandra Balahur
Title: The OpAL System at NTCIR 8 MOAT

The speaker first introduces the background of the research.
The present is marked by the availability of large volumes of heterogeneous data, whose management is extremely complex. While the treatment of factual data has been widely studied, the processing of subjective information still poses important challenges, especially in tasks that combine opinion analysis with other challenges, such as those related to question answering. The speaker then describes the approaches they employed in the NTCIR-8 MOAT monolingual English (opinionatedness, relevance, answerness and polarity) and cross-lingual English-Chinese tasks, implemented in their OpAL system. The results obtained with different system settings, together with the error analysis performed after the competition, offered clear insights into the combination of techniques that best balances precision and recall. Contrary to their initial intuitions, they also found that including specialized natural language processing tools for temporality or anaphora resolution lowers system performance, while topic detection using faceted search with Wikipedia and Latent Semantic Analysis leads to satisfactory performance in both the monolingual and the multilingual settings.

by Jian Zhang

Last updated: July 09, 2010