... models into low-frequency word
pairs in bilingual sentences, and thereby improved
word alignment performance. The SRH regards
all of the different words coupled with the same
word in the synonym pairs ... sen-
Figure 1: Graphical model of HM-BiTAM
alignment quality.
2 Bilingual Word Alignment Model
In this section, we review a conventional generative word alignment model, HM-BiTAM (Zhao
and Xing, ... p(f_j^n | E_n, a_j^n, z_n; B): sample a target word f_j^n given an aligned source word and topic, where alignment a_j^n = i denotes that source word e_i and target word f_j^n are aligned. α is a parameter...
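The sampling step above can be sketched in code; the nested-dict lexicon `B[z][e_i]` and the function name are illustrative assumptions rather than HM-BiTAM's actual data structures:

```python
import random

def sample_target_word(E, a_j, z, B, rng=random):
    # One generative step (illustrative): given alignment a_j = i and
    # topic z, sample target word f_j from the topic-specific
    # translation lexicon B[z][e_i].
    e_i = E[a_j]
    words, probs = zip(*B[z][e_i].items())
    return rng.choices(words, weights=probs, k=1)[0]
```

With a one-entry lexicon, `sample_target_word(["maison"], 0, 0, {0: {"maison": {"house": 1.0}}})` deterministically returns `"house"`.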
... translations by word alignment, but also because of interface issues: aligning words manually has the reputation of being a very tedious task.
3 Yawat
Yawat (Yet Another Word Alignment Tool) ... Explorer.
Figure 3: Alignment visualization with Yawat. As the mouse is moved over a word, the word and all words linked with it are highlighted. The highlighting is removed when the mouse leaves the word ... the term word alignment¹ to refer to any form of alignment that identifies words or groups of words as...
¹ Yawat was first presented at the 2007 Linguistic Annotation Workshop (Germann, 2007).
... improvements on word alignments
(Ayan et al., 2005; Moore, 2005; Ittycheriah and
Roukos, 2005; Taskar et al., 2005).
The standard technique for evaluating word
alignments is to represent alignments ... algorithms to generate word
alignments. However, evaluating word alignments
is difficult because even humans have difficulty
performing this task.
The state-of-the-art evaluation metric, alignment error ... pairs of words) and to compare the generated alignment against manual alignment of the same data at the level of links. Manual alignments are represented by two sets: Probable (P) alignments...
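The resulting metric can be sketched as follows, assuming the standard Och and Ney definition with a Sure set S contained in the Probable set P (the second set is elided in the text above):

```python
def aer(predicted, sure, probable):
    # Alignment Error Rate over sets of (source, target) links:
    # AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
    a, s, p = set(predicted), set(sure), set(probable)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
```

A perfect alignment against S and P yields an AER of 0.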
... model many-to-one word alignments,
where each source word is aligned with zero or
one target words, and therefore each target word
can be aligned with many source words. Each
source word is labelled ... one-to-many alignments, where each target
word is aligned with zero or more source words.
Many-to-many alignments are recoverable using the standard techniques for superimposing predicted alignments ... null, denoting no alignment. An example word alignment is shown in Figure 1, where the hollow squares and circles indicate the correct alignments. In this example the French words une and autre...
... automatic word alignment. Context vectors are built from the alignments found in a parallel corpus. Each aligned word type is a feature in the vector of the target word under consideration. The alignment ... for the automatic word alignment described below.
5.2.2 Alignment Context
Context vectors are populated with the links to
words in other languages extracted from automatic
word alignment. We applied ... the target word:
P(W) is the probability of seeing the word,
P(f) is the probability of seeing the feature, and
P(W, f) is the probability of seeing the word and the feature together.
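The text elides the association measure built from these three probabilities; pointwise mutual information is one common choice and can be sketched as:

```python
import math

def pmi(p_w, p_f, p_wf):
    # Pointwise mutual information between a word and a context feature,
    # computed from the quantities P(W), P(f), and P(W, f) defined above.
    return math.log(p_wf / (p_w * p_f))
```

Independence (P(W, f) = P(W) P(f)) gives a PMI of zero; positive values indicate association.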
3.3 Word Alignment
The...
... language word similarity sim(c, f; e) of the Chinese word c and the Japanese word f given the English word e.
Figure 1. Similarity Calculation
For the ambiguous English word
e, ... context word e_j: ct_ij = 0 if e_j does not occur in Set_i.
(4) Given the English word e, calculate the cross-language word similarity between the Chinese word and the Japanese word ... one for head words and the
other for non-head words.
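The similarity computation itself is not given here; a minimal sketch, assuming cosine similarity over English context vectors built for the two words (the helper names are hypothetical):

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse context vectors (dicts).
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def sim_c_f_given_e(ct_c, ct_f):
    # One plausible instantiation of the cross-language similarity:
    # compare the English context vectors built for the Chinese word c
    # and the Japanese word f around the pivot word e.
    return cosine(ct_c, ct_f)
```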
Distortion Probability for Head Words
The distortion probability for head
words represents the relative position of the head
word of the...
... as 1.
In building word alignment models, a special "NULL" word is usually introduced to address target words that align to no source words. Since this physically non-existing word is not in the ... a_1^m specifies the indices of source words that target words are aligned to.
In an HMM-based word alignment model, source words are treated as Markov states while target words are observations that are ... generative word alignment models. Prior knowledge serves as soft constraints that shall be placed on the translation lexicon to guide word alignment model training and disambiguation during Viterbi alignment...
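Viterbi alignment under such an HMM can be sketched as follows; the probability-table layout and the absence of an explicit NULL state are simplifying assumptions:

```python
def viterbi_align(src, tgt, trans, emit):
    # Viterbi decoding for an HMM word alignment model: source positions
    # are states, target words are observations. trans[i2][i] is the
    # jump probability between source positions, emit[e][f] the
    # translation probability (unseen pairs get a small floor).
    n = len(src)
    delta = [emit[src[i]].get(tgt[0], 1e-12) / n for i in range(n)]
    back = []
    for f in tgt[1:]:
        prev, ptr, delta = delta, [], []
        for i in range(n):
            best = max(range(n), key=lambda i2: prev[i2] * trans[i2][i])
            ptr.append(best)
            delta.append(prev[best] * trans[best][i] * emit[src[i]].get(f, 1e-12))
        back.append(ptr)
    a = [max(range(n), key=lambda i: delta[i])]
    for ptr in reversed(back):
        a.append(ptr[a[-1]])
    return list(reversed(a))  # a[j] = source index aligned to tgt[j]
```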
... a family of word alignment.
Definition 1. The ITG alignment family is the set of word alignments that have at least one BTG derivation.
The ITG alignment family is only a subset of all word alignments because ... ambiguity in word alignment is the case where two or more derivations d_1, d_2, ..., d_k of G have the same underlying word alignment A. A grammar G is non-spurious if for any given word alignment, ...
Null-word Attachment Ambiguity
Definition 4. For any given sentence pair (e, f) and its alignment A, let (e′, f′) be the sentence pair with all null-aligned words removed from (e, f).
The alignment...
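The construction in Definition 4 can be sketched as follows (an illustrative helper; the link re-indexing convention is assumed):

```python
def strip_null_aligned(e, f, links):
    # Drop all words of (e, f) that participate in no link, then
    # re-index the remaining links over the shortened sentences.
    ei = [i for i in range(len(e)) if any(i == a for a, _ in links)]
    fj = [j for j in range(len(f)) if any(j == b for _, b in links)]
    re_ = {i: k for k, i in enumerate(ei)}
    rf = {j: k for k, j in enumerate(fj)}
    return [e[i] for i in ei], [f[j] for j in fj], {(re_[a], rf[b]) for a, b in links}
```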
... are
less than 20 percent.
2 1:n Word Alignment
Our discussion of uni-directional word alignments is limited to IBM Model 4.
Definition 1 (Word alignment task) Let e_i be the i-th ... two word alignments
as an alignment point, 2) add new alignment points that exist in the union with the constraint that a new alignment point connects at least one previously unaligned word, ... mechanism to augment one source word into several source words or delete a source word, while a NULL insertion is a mechanism of generating several words from blank words. Fertility uses a conditional...
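The union-growing step described above (add union links that connect at least one previously unaligned word) can be sketched as:

```python
def grow(intersection, union):
    # Start from the intersection; iteratively add union links that
    # connect at least one previously unaligned word (cf. grow-diag).
    alignment = set(intersection)
    added = True
    while added:
        added = False
        for (i, j) in sorted(union - alignment):
            aligned_src = {i2 for (i2, _) in alignment}
            aligned_tgt = {j2 for (_, j2) in alignment}
            if i not in aligned_src or j not in aligned_tgt:
                alignment.add((i, j))
                added = True
    return alignment
```

Links whose source and target words are both already aligned are never added.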
... sums, for each word w, the number of words
not linked to w that fall between the first and last
words linked to w. The other feature counts only
such words that are linked to some word other than
w. ... have
a function word not linked to anything, between
two words linked to the same word.
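The first feature above can be sketched as follows, representing an alignment as a set of (source, target) index links (an assumed representation):

```python
def spanned_unlinked(links, w):
    # For a target word w, count source words not linked to w that fall
    # between the first and last source words linked to w.
    linked = sorted(i for (i, j) in links if j == w)
    if not linked:
        return 0
    inside = range(linked[0] + 1, linked[-1])
    return sum(1 for i in inside if (i, w) not in links)
```

The variant described in the text would additionally require the counted words to be linked to some word other than w.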
exact match feature We have a feature that
sums the number of words linked to identical
words. This is motivated ... association with respect to a word in a sentence pair to be the number of association types (word-type to word-type) for that word that have higher association scores, such that words of both types occur...
... bilingual word alignment finds word-to-word connections across languages. Originally introduced as a byproduct of training statistical translation models in (Brown et al., 1993), word alignment ... improved alignments.
2 Constrained Alignment
Let an alignment be the complete structure that connects two parallel sentences, and a link be one of the word-to-word connections that make up an alignment. ... traditional word alignment techniques. Otherwise, the features remain the same, including distance features that measure abs(j/|E| − k/|F|); orthographic features; word frequencies; common-word...
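A sketch of the distance feature, assuming the garbled expression reads abs(j/|E| − k/|F|), i.e. the difference in relative sentence position:

```python
def distance_feature(j, k, src_len, tgt_len):
    # Relative-position distance between source position j (in a
    # sentence of length |E| = src_len) and target position k
    # (in a sentence of length |F| = tgt_len).
    return abs(j / src_len - k / tgt_len)
```

Words in the same relative position get a feature value of 0.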
... methods for word alignment. In addition, we improve the word alignment results by combining the results of the two semi-supervised boosting methods. Experimental results on word alignment ... Statistical Word Alignment. In Proc. of the 10th Machine Translation Summit, pages 313-320.
Hua Wu, Haifeng Wang, and Zhanyi Liu. 2005. Alignment Model Adaptation for Domain-Specific Word Alignment. ...
train the alignment models with unlabeled data.
A question about word alignment is whether we can further improve the performance of the word aligners with available data and available alignment...
[Figure: example word alignment grid for the English sentence "The jobs are career oriented ." (POS tags include DT, NNS, AUX, VBN) and the French sentence "les emplois sont axés sur la carrière .". Legend: correct proposed word alignment consistent with human annotation; proposed word alignment error inconsistent with human annotation; word alignment constellation that renders ... word alignment.]
This dependence
runs deep; for example, Galley et al. (2006) require word alignments to project trees from the target language to the source, while Chiang (2005) requires alignments ... compatibility with the word alignment. For a constituent c of t, we consider the set of source words s_c that are aligned to c. If none of the source words in the linear closure s*_c (the words between...
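The linear closure s*_c can be sketched as the positions spanned by the aligned source words:

```python
def linear_closure(positions):
    # Linear closure s*_c of a set of source positions s_c: every
    # position between the leftmost and rightmost aligned word,
    # inclusive.
    return set(range(min(positions), max(positions) + 1))
```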
... language word.
is expressed as follows: a word qualifies for clustering if
As before, are all the target language words
that cooccur with source language word .
Similarly to the most frequent words, ... contain one word.
Then the similarity score of the
merged cluster will be the similarity score of
the word pair.
2. Merge a cluster that contains a single word
and a cluster that contains words
and ... the -word cluster, averaged with the similarity scores between the single word and all words in the cluster. This means that the algorithm computes the similarity score between the single word...
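Step 2's merge score can be sketched as a simple average, assuming `sim` is the pairwise word-similarity function (the paper's exact averaging is elided):

```python
def merge_similarity(single, cluster, sim):
    # Average of pairwise similarity scores between the single word and
    # every word already in the cluster (average-link agglomerative
    # merging, as sketched in step 2 above).
    return sum(sim(single, w) for w in cluster) / len(cluster)
```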
... optimal alignment.
Section 2 describes the clue alignment model and ways of estimating parameters from association scores. Section 3 introduces the alignment approach, which is based on word alignment ... therefore, they can be dismissed in the alignment process.
3 Clue Alignment
Word alignment clues as described above can be used to model the relations between words of translated texts. Parameters ... information. Word alignment approaches focus on the automatic identification of translation relations in translated texts. Alignments are usually represented as a set of links between words and...