[Session Notes] Session 6: NTCIR-8 Patent Translation (PATMT)
Date: June 18, 2010
Time: 9:30 - 11:30
1. Overview of the Patent Translation Task at the NTCIR-8 Workshop
Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, Takehito Utsuro, Terumasa Ehara, Hiroshi Echizen-ya and Sayori Shimohata
1) Atsushi Fujii presented an overview of the patent translation task at NTCIR-8, describing the test collection, the methods for evaluating machine translation, and the evaluation results for the research groups that participated in the task.
- To obtain a parallel corpus, they extracted patent documents for the same or related inventions published in Japan and the United States. The test collection includes approximately 3,200,000 Japanese-English sentence pairs, which were extracted automatically from their parallel corpus.
- The test collection also includes search topics for cross-lingual patent retrieval, which can be used to evaluate the contribution of machine translation to retrieving patent documents across languages.
- In addition, the test collection includes machine translation results and their evaluation scores determined by human experts, which can be used to develop automatic evaluation methods for machine translation.
2) Terumasa Ehara introduced the evaluation subtask, which aims to study the automatic evaluation of MT systems.
Question 1: Comparison of different IR systems in the extrinsic evaluation.
Question 2: A request for more details on the automatic evaluation subtask.
2. MaTrEx: the DCU MT System for NTCIR-8
Tsuyoshi Okita, Jie Jiang, Rejwanul Haque, Hala Al-Maghout, Jinhua Du, Sudip Naskar and Andy Way
(Dublin City University, Ireland)
Tsuyoshi Okita presented the system DCU used in the translation subtask of the NTCIR-8 Patent Translation Task. Four techniques were deployed in the system: supertagged PB-SMT, context-informed PB-SMT, noise reduction, and system combination. For EN-JP, the system ranked second among six participants in terms of the BLEU reference score.
3. Machine Translation for Patent Documents Combining Rule-based Translation and Statistical Post-editing
Terumasa Ehara (Yamanashi Eiwa College)
Terumasa Ehara described the system architecture, training data preparation, and experimental results of the EIWA group in the NTCIR-8 Patent Translation Task. The system combines a rule-based machine translation technique with a statistical post-editing technique. Experimental results show a BLEU score of 0.344 for the Japanese-to-English intrinsic evaluation in the Patent Translation Task, which ranked first among all submissions.
Question 1: Are there any good or bad examples of statistical post-editing?
A: No.
Question 2: What is the influence of the RBMT system?
A: They compared the Toshiba MT system with another rule-based system; the former is better.
4. System Description of NiCT SMT for NTCIR-8
Keiji Yasuda, Taro Watanabe, Masao Utiyama and Eiichiro Sumita (NICT, Japan)
Keiji Yasuda described the patent translation system they submitted to the NTCIR-8 Patent Translation Task. Their phrase-based Statistical Machine Translation (SMT) system was trained on a bilingual corpus (3 million sentence pairs) and large monolingual corpora (460 million sentences for Japanese and 350 million sentences for English). In addition to standard SMT decoding, they used an SVM-based reranker. According to the experimental results, the baseline system achieves a high BLEU score; however, the reranker degrades performance.
by Bin Lu
Session 6: PATMT
Title: Overview of the Patent Translation Task
Presenter: A. Fujii and T. Ehara
Summary:
At NTCIR-8, the PATMT task includes a new subtask as well as the previous ones. Many teams participated in the patent translation subtask. However, no team participated in the cross-language IR subtask, and only one team participated in the automatic evaluation subtask. At NTCIR-8, the corpus reaches 3 million sentence pairs. The parallel sentences are extracted from patent documents published from 1993 to 2005. In both the intrinsic and extrinsic evaluations, the correlation between the rankings based on automatic evaluation and on human evaluation is quite high. The automatic evaluation subtask was newly introduced at NTCIR-8. It used the NTCIR-7 evaluation results to investigate the correlation between system scores and human evaluation. The result shows that the correlation is about 40%.
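The summary does not say which correlation coefficient was used. As a minimal sketch, assuming Spearman's rank correlation over per-system scores (the scores below are hypothetical, not NTCIR results), the agreement between an automatic ranking and a human ranking could be computed like this:

```python
from math import sqrt

def average_ranks(values):
    """Rank values so the highest gets rank 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(metric_scores, human_scores):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))

# Hypothetical per-system scores, for illustration only.
bleu = [0.34, 0.31, 0.29, 0.25]
human = [3.9, 3.5, 3.6, 3.0]
print(spearman(bleu, human))  # approximately 0.8
```

Spearman's rho compares only the orderings of the systems, which matches the notes' focus on rankings rather than raw score values.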
Title: MaTrEx: the DCU MT System for NTCIR-8
Presenter: T. Okita
Summary:
They analyze the characteristics of the corpus. In their supertagging method, they separate lexical category assignment from the combinatory process and incorporate the target side for better local reordering. Because long-distance dependencies are not captured in this manner, ... parsing errors due to the characteristic (HPSG) to capture source-side ambiguity. They also applied supertagging on the source side (E-J) with context information within a fixed window size. To reduce translation noise, they removed mismatched phrases, because direct computation of phrase translation probabilities is problematic. They combined their outputs from the MBR decoder using the MBR-CN framework.
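The notes only name the MBR component. A minimal sketch of minimum Bayes risk (MBR) selection over candidate translations, assuming a toy unigram-F1 gain in place of BLEU and hypothetical posteriors, and omitting the confusion-network step that MBR-CN adds:

```python
from collections import Counter

def unigram_f1(hyp, ref):
    """Stand-in gain function: F1 of clipped unigram matches (real systems use BLEU)."""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def mbr_select(hypotheses, posteriors, gain=unigram_f1):
    """Return the hypothesis with maximum expected gain (minimum Bayes risk),
    treating every candidate as a pseudo-reference weighted by its posterior."""
    best, best_score = None, float("-inf")
    for h in hypotheses:
        expected = sum(p * gain(h, e) for e, p in zip(hypotheses, posteriors))
        if expected > best_score:
            best, best_score = h, expected
    return best

# Hypothetical candidates and posteriors.
candidates = ["the valve is closed", "the valve is shut", "valve closed"]
print(mbr_select(candidates, [0.3, 0.4, 0.3]))  # the valve is closed
```

The consensus candidate wins here even though another hypothesis has the highest posterior, which is the behavior MBR is designed for.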
Q: How is the supertagger incorporated into the SMT decoder?
A: Using a factored model.
Q: What n-gram order do you use, and why? Does it cause data sparseness?
A:
Title: Machine translation for patent documents combining Rule-based Translation and Statistical Post-editing
Presenter: T. Ehara
Summary:
RBMT can help improve the accuracy of SMT. The author runs RBMT first and then applies statistical post-editing (SPE) with an SMT system. To train the translation model of the SMT system, the training corpus is filtered using the test corpus. Experiments show that the proposed method outperforms RBMT alone. In effect, SPE paraphrases the output of the RBMT system.
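The notes do not say how the training corpus was filtered against the test corpus. One plausible criterion, shown purely as a hypothetical sketch, keeps a training pair only if its source side shares at least one word n-gram with the test set; `filter_by_test`, `n`, and `min_shared` are illustrative names and defaults, not the EIWA system's settings.

```python
def ngrams(tokens, n):
    """Set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def filter_by_test(train_pairs, test_sentences, n=3, min_shared=1):
    """Keep (source, target) pairs whose source shares at least `min_shared`
    word n-grams with the test set (hypothetical filtering criterion)."""
    test_grams = set()
    for sentence in test_sentences:
        test_grams |= ngrams(sentence.split(), n)
    return [
        (src, tgt)
        for src, tgt in train_pairs
        if len(ngrams(src.split(), n) & test_grams) >= min_shared
    ]

# Hypothetical data: only the first pair shares the trigram "the valve is".
train = [
    ("the valve is closed by a spring", "<target 1>"),
    ("a method for printing labels", "<target 2>"),
]
test = ["the valve is closed when the pressure drops"]
print(filter_by_test(train, test))
```

A smaller training set focused on the test vocabulary speeds up training, at the cost of discarding data that might still help.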
Q: Why don't you use the whole corpus for SPE?
A:
Q: Did you find any interesting patterns where SPE recovers RBMT's errors, or vice versa?
A:
Q: There has been similar work on other language pairs. What is the novelty?
A:
Q: How long does the whole process take?
A: One week for decoding.
Title: System Description of NiCT SMT for NTCIR-8
Presenter: K. Yasuda
Summary:
The authors used larger monolingual data for LM training in addition to the official data. They also divided the development corpus into two parts, one for minimum error rate training and one for the reranking module, to avoid over-tuned parameters in both modules. The reranking module used an SVM trained with MIRA on several features from the SMT system, because sentence-level BLEU is not reliable. However, experiments show that the proposed method performs worse than the baseline with the monolingual corpus in both directions in the intrinsic evaluation. They also conducted post-evaluation experiments: fixing the baseline, varying the size of the n-best list, and training the reranker on the test corpus as an oracle. The oracle shows much higher performance (about 45 BLEU), which is very promising.
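The point that sentence-level BLEU is unreliable can be illustrated with a minimal, unsmoothed BLEU sketch (uniform weights up to 4-grams, standard brevity penalty); the sentences are hypothetical. A single word substitution in a seven-word sentence removes every 4-gram match, so the score collapses to zero even though the translation is nearly correct.

```python
from collections import Counter
from math import exp, log

def ngram_counts(tokens, n):
    """Multiset of word n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Unsmoothed BLEU for one sentence pair with uniform n-gram weights."""
    h, r = hyp.split(), ref.split()
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        matches = sum((ngram_counts(h, n) & ngram_counts(r, n)).values())  # clipped
        total = max(len(h) - n + 1, 0)
        if matches == 0 or total == 0:
            return 0.0  # one zero precision zeroes the geometric mean
        log_precision_sum += log(matches / total)
    brevity_penalty = 1.0 if len(h) > len(r) else exp(1 - len(r) / len(h))
    return brevity_penalty * exp(log_precision_sum / max_n)

ref = "the cover is fixed to the base"
print(sentence_bleu("the cover is attached to the base", ref))  # 0.0
print(sentence_bleu(ref, ref))                                   # 1.0
```

Corpus-level BLEU avoids this by pooling n-gram counts over all sentences before taking the geometric mean.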
Q: What do you expect to gain from dividing the development corpus instead of using it whole?
A: A speed-up, especially at the tuning stage.
Q: Did you use a topic-dependent model as you had done at NTCIR-7?
A: No.
Q: Why is reranking not effective in the patent domain?
A: Due to the different characteristics of the domain.
Q: How long does it take?
A: 3 days for decoding the whole test corpus.
Overall questions and comments:
Q: How about using a pseudo gold standard to predict the quality of translation systems?
A: It is a worthwhile approach; if many SMT systems exist, SMT could benefit from it.
Q: How about dividing the task into constrained (i.e., no external resources) and unconstrained tracks?
A: Worth considering.
by Hwidong Na
[Session Notes] Session 7: NTCIR-8 Multilingual Opinion Analysis (MOAT)
Date: June 18, 2010
Time: 13:00 - 15:00
Title: Overview of MOAT Task
Presenter: Yohei Seki
Summary:
This year, MOAT introduces a new subtask, "cross-lingual opinion Q&A". The corpus includes Japanese, Chinese (traditional and simplified), and English. Characteristics differ from language to language; e.g., the Japanese corpus has more opinionated expressions, 60% of which are negative. Annotation agreement is high (kappa over 0.7) for English, Japanese, and simplified Chinese. The organizers provide online tools for opinion annotation. Automatic evaluation based on annotator agreement is used to judge participants' performance. The results show that with higher agreement, differences in rank also imply significant differences, and that "the more opinions in the source, the easier the topic" does not always hold. Several challenges remain for future work, such as obtaining high-quality annotation at low cost.
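The overview reports agreement as a kappa value without naming the variant. A minimal sketch, assuming Cohen's kappa for two annotators and hypothetical opinionated (Y) / not opinionated (N) labels:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label distribution.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels for ten sentences.
annotator_1 = list("YYYNNYNYYN")
annotator_2 = list("YYNNNYNYYY")
print(cohen_kappa(annotator_1, annotator_2))  # about 0.583
```

With these labels the annotators agree on 8 of 10 sentences, yet kappa is only about 0.58 because chance agreement (0.52) is subtracted; a kappa above 0.7 therefore indicates agreement well above chance.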
Title: Opinion detection by combining machine learning & linguistic tools
Presenter: J. Savoy
Summary:
They built a statistical model for polarity classification with features such as adjectives, verbs, and nouns. They also proposed a scoring method (z-score), computed from the vocabulary of opinionated sentences, that measures the degree of opinionatedness. Based on the observation that a large positive z-score indicates that a term is overused, they adopted logistic regression on top of these scores. Failure analysis indicates that for negative opinionated sentences, the z-score tends to fail to distinguish whether a sentence is opinionated or not.
Q: How would you combine the z-score with a sentiment resource such as SentiWordNet?
A: Through logistic regression with an additional variable; a linear combination is possible.
Title: PKUTM Experiments in NTCIR-8 MOAT Task
Presenter: T. Ma
Summary:
For the opinionated sentence task, their process consists of three parts. Preprocessing converts Traditional Chinese to Simplified Chinese. Feature selection extracts various kinds of information, and classification with a linear SVM gives the best result for this task, which suggests that the task is linearly separable. For the holder/target task, chunking and heuristic rules are used because they are more accurate and make the information easier to control than (shallow) parsing.
Q: How do you convert Traditional Chinese to Simplified Chinese?
A: By one-to-one character conversion.
Q: Would your heuristics work for Traditional Chinese as well? Did you try?
A: No.
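A one-to-one character conversion can be sketched with Python's `str.translate`. The six-entry table below is a hypothetical excerpt, not the PKUTM resource; a real converter would load a complete mapping.

```python
# Hypothetical excerpt of a Traditional-to-Simplified character table.
T2S = {"漢": "汉", "語": "语", "學": "学", "書": "书", "電": "电", "腦": "脑"}
TABLE = str.maketrans(T2S)

def to_simplified(text):
    """Convert character by character; characters not in the table pass through."""
    return text.translate(TABLE)

print(to_simplified("學漢語"))  # 学汉语
```

Character-level mapping leaves regional vocabulary differences (whole words that differ, not just their glyphs) untouched, which is one limitation of the one-to-one approach.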
Title: Supervised approaches and dependency parsing for Chinese opinion analysis
Presenter: B. Lu
Summary:
For subjectivity and polarity classification, they specify reporting verbs, sentiment-bearing words, some adverbs, etc., to combine sentiment lexicon resources with MaxEnt and SVM. As the combination method, majority voting selects the final result from the three outputs of the supervised lexicon-based system, SVM, and MaxEnt. The results show that the combination outperforms each standalone system. For holder/target identification, they define heuristic processing based on dependency parsing. They achieve high accuracy in holder identification, but find only about 50% of the correct targets.
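Majority voting over the three systems' labels for a sentence can be sketched as follows. The tie-breaking policy (fall back to one designated system) is an assumption; the notes do not say how three-way disagreements, which are possible with three polarity classes, were resolved.

```python
from collections import Counter

def majority_vote(predictions, tie_breaker=0):
    """Combine one label per system. On a tie for the top count, fall back to
    the system at index `tie_breaker` (hypothetical policy)."""
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return predictions[tie_breaker]
    return counts[0][0]

# Hypothetical outputs of the lexicon-based system, SVM, and MaxEnt.
print(majority_vote(["opinion", "opinion", "non-opinion"]))  # opinion
print(majority_vote(["positive", "negative", "neutral"], tie_breaker=1))  # negative
```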
Q: Which module contributes the most to the combination?
A: In isolated experiments, MaxEnt underperforms SVM.
Title: The OpAL system at NTCIR-8 MOAT
Presenter: E. Boldrini
Summary:
They designed their system to support temporal expression processing and anaphora resolution in this task. In the English monolingual task, they define rules to classify polarity and train an SVM on NTCIR-8, MPQA, and EmotiBlog data with POS information. In the English-Chinese cross-lingual QA task, they used web translation to obtain a Chinese sentiment lexicon.
Q: How do you use temporal expressions?
A:
by Hwidong Na
Session 7: MOAT
Date: June 18, 2010
Time: 13:00-13:40
Speaker: Yohei Seki
Title:
Overview of the Multilingual Opinion Analysis Task (MOAT) at the NTCIR-8 Workshop
Summary:
The speaker discusses the goal, task description, evaluation results, and participants' approaches for the third Multilingual Opinion Analysis Task (MOAT) at the NTCIR-8 workshop. Building on past experience, they extend the task toward cross-lingual opinion analysis applications. To solve this challenging problem, they believe two things are required: (1) language-transfer approaches with semi-supervised techniques and (2) cross-lingual opinion question answering capability. To get closer to this goal, they created opinion annotation corpora based on opinion Q&A in a common format across languages. Many teams participated in the subtasks for more than two language sides, and some teams also participated in the cross-lingual subtask. There were 56 run submissions from 16 participants, and half of the participants submitted results for tasks in more than two languages. They hope that MOAT at NTCIR-8 will be a milestone for cross-lingual opinion analysis research. In this task, many participants proposed and tested new opinion extraction technologies. Smart feature filtering and machine learning with rich lexicons appear to be the keys to successful opinion analysis.
Time: 13:40-14:00
Speaker: Olena Zubaryeva
Title:
Opinion Detection by Combining Machine Learning & Linguistic Tools
Summary:
The speaker presents their work in the Multilingual Opinion Analysis Task (MOAT) during the NTCIR-8 evaluation campaign. They propose a probabilistic model derived from Muller's method that allows them to determine and weight terms (isolated words, word bigrams, noun phrases, etc.) belonging to a given category (or subset of the corpus) compared to the rest of the corpus. Based on these terms and their weights, they use logistic regression to determine the most probable category for each input sentence. Their participation is strongly motivated by the objective of proposing an approach to the MOAT polarity subtask with a minimal linguistic component, whose performance can then be improved by language-specific tools. Thus, for English, they combine a machine learning approach (z-score and logistic regression) with a polarity dictionary (the linguistic component). For Traditional Chinese and Japanese, however, their current system is limited to the machine learning scheme.
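The notes do not give the formula. A minimal sketch, assuming the usual binomial-approximation form of a Muller-style z-score: for a term occurring a times in a subset of n' tokens, with overall corpus probability p, Z = (a - n'·p) / sqrt(n'·p·(1 - p)). The toy token lists are hypothetical.

```python
from collections import Counter
from math import sqrt

def z_scores(subset_tokens, rest_tokens):
    """Z-score for each term in the subset, comparing its subset frequency
    with what its overall corpus probability predicts (binomial approximation)."""
    sub, rest = Counter(subset_tokens), Counter(rest_tokens)
    n_sub = len(subset_tokens)            # n': subset size in tokens
    n_all = n_sub + len(rest_tokens)      # n: whole-corpus size in tokens
    scores = {}
    for term, a in sub.items():
        p = (a + rest[term]) / n_all      # overall probability of the term
        sd = sqrt(n_sub * p * (1 - p))
        if sd == 0:
            continue                      # p == 1: the score is undefined
        scores[term] = (a - n_sub * p) / sd
    return scores

# Hypothetical subset (e.g. positive sentences) versus the rest of the corpus.
positive = ["good", "great", "good", "movie"]
others = ["movie", "plot", "movie", "actor", "the", "the"]
print(z_scores(positive, others))  # "good" scores about 1.5, "movie" is negative
```

A large positive score marks a term that is overused in the subset relative to the rest of the corpus; such scores can then serve as features for the logistic regression step.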
Time: 14:00-14:20
Speaker: Tengfei Ma
Title:
PKUTM Experiments in NTCIR-8 MOAT Task
Summary:
The speaker describes their work on the Simplified Chinese opinion analysis tasks at NTCIR-8. For detecting opinionated sentences, various sentiment lexicons are used, including opinion indicators, opinion operators, degree adverbs, and opinion words. A linear SVM is selected as the main classifier, and four groups of features are extracted from punctuation, words, and sentiment lexicons. They also try a two-step classification to improve the SVM result. For extracting the opinion holder and target, they use a combination of CRF and heuristic rules. The evaluation results on the NTCIR-8 MOAT Simplified Chinese side show that their system achieves the best F-measure in two tasks, suggesting that the proposed framework is promising.
Time: 14:20-14:40
Speaker: Bin Lu
Title:
Supervised Approaches
and Dependency Parsing for Chinese Opinion Analysis at NTCIR-8
Summary:
The speaker describes their participating system, based on supervised approaches and dependency parsing, for opinion analysis of Traditional Chinese texts at NTCIR-8. For opinionated sentence recognition, a supervised lexicon-based approach, SVM, and Maximum Entropy are combined. For polarity classification, they use only the supervised lexicon-based approach. For opinion holder and target identification, they rely on dependency parsing: they identify opinion holders by means of reporting verbs, and opinion targets by considering both opinion holders and opinion-bearing words. Among all the teams participating in the Traditional Chinese task, their system achieves: 1) the highest F-measure on opinionated sentence recognition, 2) the second-highest F-measure on the identification of both opinion holders and targets, and 3) a middle ranking for opinion polarity classification.
Time: 14:40-15:00
Speaker: Alexandra Balahur
Title:
The OpAL System at NTCIR 8 MOAT
Summary:
The speaker first introduces the background of the research. The present is marked by the availability of large volumes of heterogeneous data, whose management is extremely complex. While the treatment of factual data has been widely studied, the processing of subjective information still poses important challenges, especially in tasks that combine opinion analysis with other challenges, such as those related to question answering. The speaker then describes the different approaches they employed in the NTCIR-8 MOAT monolingual English (opinionatedness, relevance, answerness, and polarity) and cross-lingual English-Chinese tasks, implemented in their OpAL system. The results obtained with different system settings, as well as the error analysis performed after the competition, offered clear insights into the best combination of techniques for balancing precision and recall. Contrary to their initial intuitions, they also found that including specialized Natural Language Processing tools for temporality or anaphora resolution lowers system performance, while topic detection using faceted search with Wikipedia and Latent Semantic Analysis leads to satisfactory performance in both the monolingual and the multilingual settings.
by Jian Zhang
Last updated: July 09, 2010