... models into low-frequency word
pairs in bilingual sentences, and then improved the
word alignment performance. The SRH regards
all of the different words coupled with the same
word in the synonym pairs ... sen-
Figure 1: Graphical model of HM-BiTAM
alignment quality.
2 Bilingual Word Alignment Model
In this section, we review a conventional generative
word alignment model, HM-BiTAM (Zhao
and Xing, ... $(f_{jn} \mid E_n, a_{jn}, z_n; B)$: sample a
target word $f_{jn}$ given an aligned source
word and topic
where alignment $a_{jn} = i$ denotes that source word $e_i$
and target word $f_{jn}$ are aligned. α is a parameter...
... translations
by word alignment but also because of such interface
issues that aligning words manually has the reputa-
tion of being a very tedious task.
3 Yawat
Yawat (Yet Another Word Alignment Tool) ... Explorer.
Figure 3: Alignment visualization with Yawat. As the mouse is moved over a word, the word and all words linked
with it are highlighted. The highlighting is removed when the mouse leaves the word ... the term word alignment
to refer to any form of alignment that identifies words
or groups of words as...
1 Yawat was first presented at the 2007 Linguistic Annotation Workshop (Germann, 2007).
... model many-to-one word alignments,
where each source word is aligned with zero or
one target words, and therefore each target word
can be aligned with many source words. Each
source word is labelled ... one-to-many alignments, where each target
word is aligned with zero or more source words.
Many-to-many alignments are recoverable using
the standard techniques for superimposing pre-
dicted alignments ... null, denot-
ing no alignment. An example word alignment
is shown in Figure 1, where the hollow squares
and circles indicate the correct alignments. In this
example the French words une and autre...
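The labelling view of many-to-one alignment described above can be sketched as follows. This is a minimal illustration, not the authors' model: the function names are hypothetical, `None` stands in for the null label, and intersection and union are shown only as two common ways of superimposing the two directional predictions.

```python
from typing import List, Optional, Set, Tuple

Link = Tuple[int, int]  # (source index, target index)


def labels_to_links(labels: List[Optional[int]]) -> Set[Link]:
    """labels[i] is the target index assigned to source word i, or None for null."""
    return {(i, t) for i, t in enumerate(labels) if t is not None}


def superimpose(src_to_tgt: List[Optional[int]],
                tgt_to_src: List[Optional[int]]) -> Tuple[Set[Link], Set[Link]]:
    """Combine two directional labellings into (intersection, union) link sets."""
    forward = labels_to_links(src_to_tgt)
    # In the reverse direction each target word carries a source label.
    backward = {(s, j) for j, s in enumerate(tgt_to_src) if s is not None}
    return forward & backward, forward | backward


# Source words 1 and 2 both point at target word 1 (many-to-one);
# source word 3 is labelled null.
intersection, union = superimpose([0, 1, 1, None], [0, 2, 3])
```

The union can contain many-to-many links even though each directional labelling assigns at most one label per word.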
... automatic word alignment. Context vec-
tors are built from the alignments found in a paral-
lel corpus. Each aligned word type is a feature in
the vector of the target word under consideration.
The alignment ... for the automatic word
alignment described below.
5.2.2 Alignment Context
Context vectors are populated with the links to
words in other languages extracted from automatic
word alignment. We applied ... translational context based on word
alignment and the combination of both. For both
approaches, we used a cutoff n for each row in our
word-by-context matrix. A word is discarded if
the row marginal...
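A minimal sketch of how such alignment-based context vectors might be populated is given here. The function name and input format are hypothetical, and the discard rule (row marginal below the cutoff n) is an assumption, since the exact condition is not recoverable from the text.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_alignment_context(links: Iterable[Tuple[str, str]],
                            cutoff: int) -> Dict[str, Counter]:
    """Build a word-by-context matrix from (word, aligned_word) pairs.

    Each aligned word type is one feature of the row for `word`.
    Assumption: a row is kept only if its marginal (sum of counts) >= cutoff.
    """
    matrix: Dict[str, Counter] = defaultdict(Counter)
    for word, aligned in links:
        matrix[word][aligned] += 1
    return {w: row for w, row in matrix.items() if sum(row.values()) >= cutoff}


# Toy links, as if extracted from an automatically aligned parallel corpus.
toy_links = [("house", "huis"), ("house", "huis"),
             ("house", "woning"), ("rare", "zeldzaam")]
vectors = build_alignment_context(toy_links, cutoff=2)
```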
... language word similarity of the Chinese word $c$ and the Japanese
word $f$ given the English word $e$: $sim(c, f; e)$
Figure 1. Similarity Calculation
English word e. For the ambiguous English word
e, ... context word $e_j$. $ct_{ij} = 0$ if $e_j$ does not occur in Set $i$.
(4) Given the English word $e$, calculate the cross-language word similarity between the Chinese
word and the Japanese word ... one for head words and the
other for non-head words.
Distortion Probability for Head Words
The distortion probability for head
words represents the relative position of the head
word of the...
... as 1.
In building word alignment models, a special
“NULL” word is usually introduced to address tar-
get words that align to no source words. Since this
physically non-existing word is not in the ... $a_1^m$
specifies the indices of source words
that target words are aligned to.
In an HMM-based word alignment model, source
words are treated as Markov states while target
words are observations that are ... generative word
alignment models. Prior knowledge serves as soft
constraints that shall be placed on translation lexicon
to guide word alignment model training and disambiguation
during Viterbi alignment...
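To make the HMM view concrete, the sketch below decodes a Viterbi alignment with source words as states and target words as observations. It is an illustration under stated assumptions: `trans_prob` and `jump_prob` are hypothetical callables returning strictly positive probabilities, the initial state distribution is uniform, and the NULL word is left out for brevity.

```python
import math
from typing import Callable, List, Sequence


def viterbi_alignment(source: Sequence[str], target: Sequence[str],
                      trans_prob: Callable[[str, str], float],
                      jump_prob: Callable[[int], float]) -> List[int]:
    """Return a[j], the index of the source word (state) emitting target[j]."""
    n = len(source)
    # Assumption: uniform initial distribution over source positions.
    score = [math.log(1.0 / n) + math.log(trans_prob(target[0], s)) for s in source]
    back: List[List[int]] = []
    for f in target[1:]:
        new_score, pointers = [], []
        for i, s in enumerate(source):
            # Best previous state, given a transition that depends on the jump width.
            k = max(range(n), key=lambda p: score[p] + math.log(jump_prob(i - p)))
            pointers.append(k)
            new_score.append(score[k] + math.log(jump_prob(i - k))
                             + math.log(trans_prob(f, s)))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda p: score[p])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy lexicon and jump distribution (made-up numbers for illustration only).
lexicon = {("das", "the"): 0.9, ("das", "house"): 0.1,
           ("Haus", "the"): 0.1, ("Haus", "house"): 0.9}
alignment = viterbi_alignment(["the", "house"], ["das", "Haus"],
                              lambda f, e: lexicon[(f, e)],
                              lambda d: 0.6 if d == 1 else 0.2)
```

Prior knowledge of the kind described above would enter through the lexicon probabilities; here they are fixed by hand.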
... a family of word alignment.
Definition 1. The ITG alignment family is a set of
word alignments that has at least one BTG deriva-
tion.
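As a hedged illustration of Definition 1, the sketch below tests whether a one-to-one alignment without null links has a BTG derivation. It assumes the alignment is given as a permutation (`perm[i]` is the target position of source word i) and uses the standard observation that such a permutation is derivable when it can be split recursively into two adjacent blocks combined in straight or inverted order. The function name is hypothetical, and null links and many-to-one links are out of scope.

```python
from typing import Sequence


def has_btg_derivation(perm: Sequence[int]) -> bool:
    """True if the permutation can be built by straight/inverted binary rules."""
    n = len(perm)
    if n <= 1:
        return True
    for split in range(1, n):
        left, right = perm[:split], perm[split:]
        straight = max(left) < min(right)   # left block precedes right block
        inverted = min(left) > max(right)   # left block follows right block
        if (straight or inverted) and has_btg_derivation(left) \
                and has_btg_derivation(right):
            return True
    return False


# The "inside-out" pattern has no BTG derivation; a simple swap pattern does.
inside_out = has_btg_derivation([1, 3, 0, 2])
swapped = has_btg_derivation([1, 0, 3, 2])
```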
ITG alignment family is only a subset of word
alignments because ... ambiguity
in word alignment is the case where two or
more derivations $d_1, d_2, \ldots, d_k$ of G have the same
underlying word alignment A. A grammar G is non-spurious
if for any given word alignment, ... Null-word Attachment Ambiguity
Definition 4. For any given sentence pair (e, f) and
its alignment A, let (e', f') be the sentence pair
with all null-aligned words removed from (e, f).
The alignment...
... are
less than 20 percent.
2 1:n Word Alignment
Our discussion of uni-directional word
alignment is limited to IBM Model 4.
Definition 1 (Word alignment task) Let $e_i$ be
the i-th ... two word alignments
as an alignment point, 2) add new alignment points
that exist in the union with the constraint that a
new alignment point connects at least one previ-
ously unaligned word, ... mechanism to aug-
ment one source word into several source words
or delete a source word, while a NULL insertion
is a mechanism of generating several words from
blank words. Fertility uses a conditional...
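The two symmetrization steps listed above (start from the intersection, then add union points that connect at least one previously unaligned word) can be sketched as follows. This is a simplified reading of the text: the function name is hypothetical, candidates are visited in sorted order as an arbitrary assumption, and any further conditions elided from the excerpt are not modelled.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (index in e, index in f)


def symmetrize(e2f: Set[Link], f2e: Set[Link]) -> Set[Link]:
    """Combine two uni-directional alignments into one set of alignment points."""
    # Step 1: every point in the intersection is an alignment point.
    points = set(e2f & f2e)
    aligned_e = {i for i, _ in points}
    aligned_f = {j for _, j in points}
    # Step 2: add union points that touch at least one unaligned word.
    for i, j in sorted((e2f | f2e) - points):
        if i not in aligned_e or j not in aligned_f:
            points.add((i, j))
            aligned_e.add(i)
            aligned_f.add(j)
    return points


combined = symmetrize({(0, 0), (0, 1), (1, 1), (2, 1)},
                      {(0, 0), (1, 1), (1, 2)})
```

In this toy input the candidate (0, 1) is rejected because both of its words are already aligned after step 1.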
... sums, for each word w, the number of words
not linked to w that fall between the first and last
words linked to w. The other feature counts only
such words that are linked to some word other than
w. ... have
a function word not linked to anything, between
two words linked to the same word.
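The two counting features described above can be sketched as below. The sketch covers one direction only (w ranges over source positions and the intervening words are target positions); the other direction follows by swapping the link tuples. The function name and the representation of links as index pairs are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def gap_features(links: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (all_gaps, linked_gaps) summed over all source words w.

    all_gaps: target words not linked to w lying between the first and
        last target words linked to w.
    linked_gaps: the subset of those that are linked to some other word.
    """
    link_set = set(links)
    targets_of: Dict[int, Set[int]] = defaultdict(set)
    for s, t in link_set:
        targets_of[s].add(t)
    linked_targets = {t for _, t in link_set}
    all_gaps = linked_gaps = 0
    for targets in targets_of.values():
        for t in range(min(targets) + 1, max(targets)):
            if t not in targets:
                all_gaps += 1
                if t in linked_targets:
                    linked_gaps += 1
    return all_gaps, linked_gaps


# Source word 0 links to targets 0 and 3; target 1 belongs to another word
# and target 2 is unlinked, e.g. a function word.
features = gap_features({(0, 0), (0, 3), (1, 1)})
```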
exact match feature We have a feature that
sums the number of words linked to identical
words. This is motivated ... association with respect to a word in a
sentence pair to be the number of association types
(word-type to word-type) for that word that have
higher association scores, such that words of both
types occur...
... bilingual
word alignment finds word-to-word connections
across languages. Originally introduced as a
byproduct of training statistical translation models
in (Brown et al., 1993), word alignment ... improved
alignments.
2 Constrained Alignment
Let an alignment be the complete structure that
connects two parallel sentences, and a link be
one of the word-to-word connections that make
up an alignment. ... traditional word alignment techniques.
Otherwise, the features remain the same,
including distance features that measure
$\mathrm{abs}\left(\frac{j}{|E|} - \frac{k}{|F|}\right)$; orthographic features; word
frequencies; common-word...
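A small worked version of the distance feature is shown here, assuming j and k are the positions of the linked words in sentences E and F and that both sentences are non-empty. The function and parameter names are hypothetical.

```python
def distance_feature(j: int, k: int, len_e: int, len_f: int) -> float:
    """Absolute difference between the relative positions of two linked words."""
    return abs(j / len_e - k / len_f)


# A link between word 2 of a 4-word sentence and word 3 of a 6-word sentence
# sits at the same relative position in both, so the feature is 0.
on_diagonal = distance_feature(2, 3, 4, 6)
```

Links far from the diagonal of the alignment matrix receive larger values, which lets a model penalize long-distance reordering.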
... methods for word
alignment. In addition, we improve the
word alignment results by combining the
results of the two semi-supervised boost-
ing methods. Experimental results on
word alignment ... Statistical
Word Alignment. In Proc. of the 10th Machine
Translation Summit, pages 313-320.
Hua Wu, Haifeng Wang, and Zhanyi Liu. 2005.
Alignment Model Adaptation for Domain-Specific
Word Alignment. ...
train the alignment models with unlabeled data.
A question about word alignment is whether
we can further improve the performance of the
word aligners with available data and available
alignment...
... language word.
is expressed as follows: a word qualifies for clus-
tering if
As before, are all the target language words
that cooccur with source language word .
Similarly to the most frequent words, ... contain one word.
Then the similarity score of the
merged cluster will be the similarity score of
the word pair.
2. Merge a cluster that contains a single word
and a cluster that contains words
and ... the -word cluster, av-
eraged with the similarity scores between the
single word and all words in the cluster. This
means that the algorithm computes the similar-
ity score between the single word...
... optimal alignment.
Section 2 describes the clue alignment model
and ways of estimating parameters from associ-
ation scores. Section 3 introduces the alignment
approach which is based on word alignment ... an alignment clue for the
corresponding word pairs. The likelihood of each
translation alternative can be weighted, e.g., by
frequency (if available).
2.3 Clue Combinations
So far, word alignment ... therefore,
they can be dismissed in the alignment process.
3 Clue Alignment
Word alignment clues as described above can be
used to model the relations between words of
translated texts. Parameters...
... environment.
2 Alignment Spaces
Let an alignment be the entire structure that con-
nects a sentence pair, and let a link be the individual
word-to-word connections that make up
an alignment. An alignment ... concerned with the space of
alignments searched by word alignment
systems. We focus on situations where
word re-ordering is limited by syntax. We
present two new alignment spaces that
limit an ... comparison of five alignment spaces,
and show that limiting search with an ITG
reduces error rate by 10%, while a D-ITG
produces a 31% reduction.
1 Introduction
Bilingual word alignment finds word-level...
... problems
for word alignment models since, unlike English,
Czech words have a complex inflectional morphol-
ogy, and the syntax permits relatively free word or-
der. For this language pair, we evaluate alignment
error ... commen-
tary (3.1M words),
and an Urdu-English corpus
(2M words) provided by NIST for the 2009 Open
MT Evaluation. These pairs were selected since
each poses different alignment challenges (word or-
8
This ... alignments. One is the aver-
age alignment “fertility” of source words that occur
only a single time in the training data (so-called ha-
pax legomena). This assesses the impact of a typical
alignment...
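The hapax fertility measure described above can be sketched as follows, assuming fertility means the number of links attached to a source token. The function name and corpus format (a list of source token lists paired with sets of index links) are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

SentencePair = Tuple[Sequence[str], Set[Tuple[int, int]]]  # (source tokens, links)


def hapax_fertility(corpus: List[SentencePair]) -> float:
    """Average number of links per source token whose type occurs exactly once."""
    freq = Counter(tok for src, _ in corpus for tok in src)
    total_links = hapax_tokens = 0
    for src, links in corpus:
        fertility = Counter(i for i, _ in links)  # links per source position
        for i, tok in enumerate(src):
            if freq[tok] == 1:
                hapax_tokens += 1
                total_links += fertility[i]
    return total_links / hapax_tokens if hapax_tokens else 0.0


toy_corpus = [
    (["the", "zebra"], {(0, 0), (1, 1), (1, 2), (1, 3)}),
    (["the", "cat"], {(0, 0), (1, 1)}),
]
average = hapax_fertility(toy_corpus)
```

A value well above 1 would suggest that rare words are absorbing many target words, the "garbage collection" behaviour this kind of diagnostic is meant to expose.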