... context in-
formation at the sentence level, we adopt the
topical context information in our method for
the following reasons: (1) the topic informa-
tion captures the context information beyond
the ... adaptation for statistical machine
translation. Machine Translation, pages 187-207.
Nicola Ueffing, Gholamreza Haffari and Anoop Sarkar.
2008. Semi-supervised Model Adaptation for Statisti-
cal Machine ... source
toolkit for statistical machine translation. In Proc. of
ACL 2007, Demonstration Session, pages 177-180.
Yang Liu, Qun Liu and Shouxun Lin. 2006. Tree-
to-String Alignment Template for Statistical Machine
Translation....
... Phrase-Based
Model for Statistical Machine Translation. In Proc.
ACL, pages 263-270.
Michael Collins, Philipp Koehn and Ivona Kucerova.
2005. Clause restructuring for statistical machine
translation. ... Derived Rules for Improved Statistical Machine
Translation. In Proc. Coling, pages 1119-1127.
Chao Wang, Michael Collins, Philipp Koehn. 2007. Chi-
nese syntactic reordering for statistical machine ... Models for Statistical Machine Transla-
tion. In Proc. Workshop on Statistical Machine Trans-
lation, HLT-NAACL, pages 127-133.
Proceedings of the 50th Annual Meeting of the Association for...
... Discriminative
Phrase Selection for SMT. In: Goutte et al (ed.),
Learning Machine Translation. MIT Press.
K. Gimpel and N. A. Smith. 2008. Rich Source-Side
Context for Statistical Machine Translation. ... Association for Computational Linguistics, pages 834–843,
Uppsala, Sweden, 11-16 July 2010.
© 2010 Association for Computational Linguistics
Bilingual Sense Similarity for Statistical Machine Translation ...
units. Therefore, questions emerge: how good is
the sense similarity computed via VSM for two
units from parallel corpora? Is it useful for multi-
lingual applications, such as statistical machine...
...
[Figure: BLEU Score vs. Number of Foreign Words Annotated. Annotations: "the approx. 54,500 foreign words we selectively sampled for annotation, cost = $205.80"; "last approx. 700,000 foreign ..."]
... HNG, for short.
HNG solicits translations only for trigger n-grams
and not for entire sentences. We provide senten-
tial context, highlight the trigger n-gram that we
want translated, and ask for ... estimate the time required
for POS annotating. Kapoor et al. (2007) assign
costs for AL based on message length for a voice-
mail classification task. In contrast, we show for
SMT that annotation...
... 31–36,
Uppsala, Sweden, 13 July 2010.
© 2010 Association for Computational Linguistics
Unsupervised Search for The Optimal Segmentation for Statistical
Machine Translation
Coşkun Mermer¹,³
and Ahmet ... Foster, editors, Learning Machine Transla-
tion, chapter 5, pages 93–110. MIT Press.
Nizar Habash and Fatiha Sadat. 2006. Arabic prepro-
cessing schemes for statistical machine translation.
In ... Evan Herbst. 2007. Moses: Open
source toolkit for statistical machine translation. In
Proceedings of the 45th Annual Meeting of the Asso-
ciation for Computational Linguistics, Companion
Volume:...
... estimation for machine
translation. In The JHU Workshop Final Report, Balti-
more, Maryland, USA, April.
David Chiang, Kevin Knight, and Wei Wang. 2009. 11,001
new features for statistical machine ... estimation for machine translation. Computational
Linguistics, 33(1):9–40.
Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki
Isozaki. 2007. Online large-margin training for statisti-
cal machine ... the Association for Computational Linguistics, pages 211–219,
Portland, Oregon, June 19-24, 2011.
© 2011 Association for Computational Linguistics
Goodness: A Method for Measuring Machine Translation...
... Association for Computational Linguistics.
Philipp Koehn et al. 2007. Moses: Open source toolkit
for statistical machine translation. In Proceedings of
the 45th Annual Meeting of the Association for ... SSRs.
Notice that for each new sentence generated, we al-
low for application of only one substitution.
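The one-substitution constraint can be illustrated with a minimal sketch. It assumes the substitution rules are available as a mapping from a source phrase to its alternative phrases; the function name `expand_with_single_substitution` and this rule format are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Phrase = Tuple[str, ...]


def expand_with_single_substitution(
    tokens: Sequence[str],
    rules: Dict[Phrase, List[Phrase]],
) -> List[List[str]]:
    """Return every sentence obtained by applying exactly one substitution.

    `rules` maps a phrase (tuple of tokens) to its alternative phrases.
    Each generated sentence differs from the input in one span only.
    """
    expanded: List[List[str]] = []
    n = len(tokens)
    for start in range(n):
        for end in range(start + 1, n + 1):
            span = tuple(tokens[start:end])
            # Apply each alternative for this span to a fresh copy of the input,
            # so substitutions are never stacked within one new sentence.
            for replacement in rules.get(span, []):
                expanded.append(
                    list(tokens[:start]) + list(replacement) + list(tokens[end:])
                )
    return expanded
```

For the input `["the", "big", "house"]` with hypothetical rules replacing `big` by `large` and `house` by `home`, the sketch produces two new sentences, each containing one substitution, and never the doubly substituted `the large home`.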
Although the idea is straightforward, we face two
problems in practice. First, for frequent ... Association for Computational Linguistics:shortpapers, pages 294–298,
Portland, Oregon, June 19-24, 2011.
© 2011 Association for Computational Linguistics
Corpus Expansion for Statistical Machine...
... Decoder for
Phrase-based Statistical Machine Translation Models.
In Proc. of the Association for Machine Translation
in the Americas (AMTA).
P. Koehn. 2004b. Statistical Significance Tests for
Machine ... 2006.
© 2006 Association for Computational Linguistics
Combination of Arabic Preprocessing Schemes
for Statistical Machine Translation
Fatiha Sadat
Institute for Information Technology
National ... Spain.
Y. Lee. 2004. Morphological Analysis for Statistical
Machine Translation. In Proc. of NAACL, Boston,
MA.
Y. Lee. 2005. IBM Statistical Machine Translation for
Spoken Languages. In Proc. of International...
... rules.
5 Open-Source Machine Translation
Despite the recognized need for translation, there
is no widely used open-source machine translation
system. One of the major reasons for this lack of
success ... Boks 1080 Blindern; 0316 Oslo (Norway)
♠
Center for the Study of Language and Information, Stanford, CA 94305 (USA)
{jtl@ifi.uio.no | oe@csli.stanford.edu}
Abstract
The LOGON MT demonstrator assembles
independently ... professional transla-
tors. Three quarters of the material are available
for system development and also serve as training
data for machine learning approaches. Using the
discriminant-based Redwoods...
... 2005.
© 2005 Association for Computational Linguistics
Clause Restructuring for Statistical Machine Translation
Michael Collins
MIT CSAIL
mcollins@csail.mit.edu
Philipp Koehn
School of Informatics
University ... describe an approach for the use
of syntactic information within phrase-based SMT
systems. The approach constitutes a simple, direct
method for the incorporation of syntactic information
in a phrase-based ... this perhaps in the voting.
Particles
Before: Wir fordern das Praesidium auf,
After: Wir auf fordern das Praesidium,
English: We ask the Bureau,
Infinitives
Before: Ich werde der Sache nachgehen dann,
After:...
... of
support vector machines (SVM). However, Eq. 8 is more
suitable for non-separable problems (which is often the
case for SMT) since it directly models the conditional
probability for the candidate ... corresponding to la-
bel . The symbol is short-hand for the feature-
vector . This formulation is slightly differ-
ent from the standard maximum entropy formulation typ-
ically encountered in NLP applications, ... abstract maximum entropy training formulation:
(8)
In this formulation, is the weight vector which we want
to compute. The set consists of candidate labels for
the -th training instance, with...
... lexicon models lack context information
that can be extracted from the same parallel
corpus. This additional information could be:
Simple context information: information of
the words surrounding ... surrounding the word pair;
Syntactic information: part-of-speech in-
formation, syntactic constituent, sentence
mood;
Semantic information: disambiguation information (e.g. from WordNet),
current/previous ... fact that the algorithm for computing the n-best
lists is suboptimal.
Table 8: Preliminary translation results for the
Verbmobil Test-147 for different contextual infor-
mation and different...
... additional
parameter into the recursion formula for DP. In the
following, we will explain this method in detail.
2.3 Recursion Formula for DP
In the DP formalism, the search process is described ... little meaningful information or the information
is different from the input.
Examples for each category are given in Table
3. Table 4 shows the statistics of the translation
performance. When ... experimental results for a bilingual cor-
pus are reported.
1.1 Statistical Machine Translation
In statistical machine translation, the goal of the
search strategy can be formulated as follows:...
... LDC segmenter² and Stanford segmenter version 2006-05-11³.
Both ICTCLAS and Stanford segmenters
utilise machine learning techniques, with Hidden
Markov Models for ICT (Zhang et al., 2003) ... 2009.
© 2009 Association for Computational Linguistics
Bilingually Motivated Domain-Adapted Word Segmentation
for Statistical Machine Translation
Yanjun Ma Andy Way
National Centre for Language Technology
School ... to be large. We can
also observe that the ICT and Stanford segmenter
consistently outperform the LDC segmenter. Even
using 3M sentence pairs for training, the differ-
ences between them are still...
...
end for
12: end for
13: for each hypothesis do
14:   compute HM decoding features for
15:   add to
16: end for
17: for ... BTG-based HM Decoding
1: for each component model do
2:   output the search space for the input
3: end for
4: for to do
5:   for all s.t. do ... SCFG-based HM Decoding
1: for each component model do
2:   output the search space for the input
3: end for
4: for to do
5:   for all s.t. do...