... statistical word alignment
with limited labeled data and large amounts of
unlabeled data. In this algorithm, we built an
interpolated model by using both the labeled data
and the unlabeled ... each word aligner using
only the labeled data. Based on this
semi-supervised boosting algorithm, we
investigate two boosting methods for word
alignment. In addition, we improve the
word alignment ... incorporating the
unlabeled data. In this algorithm, we build
a word aligner by using both the labeled
data and the unlabeled data. Then we build
a pseudo reference set for the unlabeled data,...
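The interpolation step described above can be sketched as follows; the dictionary representation of a translation table and the weight name `lam` are my own assumptions, not the paper's notation:

```python
def interpolate(t_labeled, t_unlabeled, lam=0.7):
    """Linearly interpolate two translation tables:
    t(f|e) = lam * t_labeled[(f, e)] + (1 - lam) * t_unlabeled[(f, e)].
    Keys absent from one table are treated as probability 0."""
    keys = set(t_labeled) | set(t_unlabeled)
    return {k: lam * t_labeled.get(k, 0.0) + (1 - lam) * t_unlabeled.get(k, 0.0)
            for k in keys}
```

The same pattern applies to any parameter table shared by the labeled-data and unlabeled-data models.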
... automatic word alignment. Context vectors
are built from the alignments found in a
parallel corpus. Each aligned word type is a
feature in the vector of the target word under
consideration. The alignment ... for the
automatic word alignment described below.
5.2.2 Alignment Context
Context vectors are populated with the links to
words in other languages extracted from automatic
word alignment. We applied ... the target word
P(W) is the probability of seeing the word
P(f) is the probability of seeing the feature
P(W,f) is the probability of seeing the word and the feature
together.
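The excerpt defines P(W), P(f), and P(W, f) but elides how they are combined. One common association measure built from exactly these three quantities is pointwise mutual information; the sketch below is an assumption about the combining formula, not the paper's stated one:

```python
import math

def pmi(p_w, p_f, p_wf):
    """Pointwise mutual information: log( P(W, f) / (P(W) * P(f)) ).
    Zero under independence, positive when the word and the feature
    cooccur more often than chance."""
    return math.log(p_wf / (p_w * p_f))
```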
3.3 Word Alignment
The...
... language word similarity sim(c, f; e) of the Chinese word c
and the Japanese word f given the English word e.
Figure 1. Similarity Calculation
English word e. For the ambiguous English word
e, ... equation (7).
The word similarity of the Chinese word "河
岸 (bank2)" and the Japanese word "銀行
(bank1)" given the English word "bank" is
low. Thus, using the updated ... given English
word e and Japanese word f, and put the English
sentences in the pairs into Set 2.
(3) Construct the feature vectors of the given
English word e using all other words as
context...
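Once the context vectors are built, word similarity is typically computed by comparing the two vectors. The excerpt does not name the measure, so the cosine similarity used below is an assumption; sparse dicts map context words to weights:

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse context vectors
    (dicts mapping context word -> weight)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```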
... language word.
is expressed as follows: a word qualifies for clustering if
As before, are all the target language words
that cooccur with source language word .
Similarly to the most frequent words, ... assumedly
variants of the same word, then further refines the
clusters using a cooccurrence related measure. Word
variants are found via a stemmer or by clustering all
words that begin with the same ... contain one word.
Then the similarity score of the
merged cluster will be the similarity score of
the word pair.
2. Merge a cluster that contains a single word
and a cluster that contains words
and...
... (Pause.) Look at the words around the picture. (Pause.)
Find the biggest word. What is that word? (Students will say, "Table.") A line has been
drawn from that word to the TABLE to ... pattern of lining.
• Have one example word that is familiar to all students and is bigger than the rest of
the words.
Difficulty of Items
The task will contain words of varying degrees of difficulty, ... basis of the frequency of word use,
complexity of the graphic image, and the meaning of the word.
Response Attributes
• Response: Students will look at the picture and the words and then draw lines...
... system. For sub-tree alignment, we use the
above word alignment to learn the lexical/word
alignment feature, and train with the FBIS
training corpus (200) using the composite kernel
of Plain+dBTK-Root+iBTK-RdSTT. ... bilingual
phrases to be consistent with either word
alignment or sub-tree alignment (EWoS) instead
of being originally consistent with the word
alignment only. The method helps tailor the ...
refers to the word set for the external
span of the source sub-tree, while refers to
that of the target sub-tree.
Internal Word Alignment Features: The word
alignment links...
... a
method for using synonym information effectively
to improve word alignment quality.
In general, synonym relations are defined in
terms of word sense, not in terms of word form. In
other words, synonym ... of the word
alignment model by using synonym pairs including
such ambiguous synonym words.
Finally, we discuss the data set size used for un-
supervised training. As shown in Table 1, using
a ... sen-
Figure 1: Graphical model of HM-BiTAM
alignment quality.
2 Bilingual Word Alignment Model
In this section, we review a conventional
generative word alignment model, HM-BiTAM (Zhao
and Xing,...
... translations
by word alignment but also because of such interface
issues that aligning words manually has the reputation
of being a very tedious task.
3 Yawat
Yawat (Yet Another Word Alignment Tool) ... Explorer.
Figure 3: Alignment visualization with Yawat. As the mouse is moved over a word, the word and all words linked
with it are highlighted. The highlighting is removed when the mouse leaves the word ... the term word alignment
1
Yawat was first presented at the 2007 Linguistic Annota-
tion Workshop (Germann, 2007).
to refer to any form of alignment that identifies words
or groups of words as...
... model many-to-one word alignments,
where each source word is aligned with zero or
one target words, and therefore each target word
can be aligned with many source words. Each
source word is labelled ... one-to-many alignments, where each target
word is aligned with zero or more source words.
Many-to-many alignments are recoverable using
the standard techniques for superimposing
predicted alignments ... null, denoting
no alignment. An example word alignment
is shown in Figure 1, where the hollow squares
and circles indicate the correct alignments. In this
example the French words une and autre...
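Superimposing two directional one-to-many alignments into a single many-to-many alignment can be sketched as a set operation over links; the (source, target) tuple encoding and the function name are assumptions:

```python
def superimpose(src2tgt, tgt2src, how="union"):
    """Combine two directional one-to-many alignments into one
    many-to-many alignment. src2tgt holds (source, target) links;
    tgt2src holds (target, source) links from the reverse direction."""
    flipped = {(i, j) for (j, i) in tgt2src}
    return src2tgt | flipped if how == "union" else src2tgt & flipped
```

The union keeps every link predicted in either direction; the intersection keeps only links both directions agree on.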
... as 1.
In building word alignment models, a special
“NULL” word is usually introduced to address tar-
get words that align to no source words. Since this
physically non-existing word is not in the ...
a_1^m specifies the indices of source words
that target words are aligned to.
In an HMM-based word alignment model, source
words are treated as Markov states while target
words are observations that are ... generative word
alignment models. Prior knowledge serves as soft
constraints placed on the translation lexicon to
guide word alignment model training and
disambiguation during Viterbi alignment...
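A minimal Viterbi decoder for such an HMM alignment model (source positions as states, target words as observations) might look as follows; the NULL state is omitted, and the uniform initial distribution, the table names `emit`/`trans`, and the smoothing constant are all assumptions:

```python
def viterbi_align(src, tgt, emit, trans):
    """Viterbi decoding for an HMM word-alignment model.
    emit[i][f]  = p(target word f | source word src[i])
    trans[i][j] = p(state j | previous state i)
    Returns a_1..a_m: one source index per target word."""
    n = len(src)
    # Initialization: uniform prior over first state.
    delta = [emit[i].get(tgt[0], 1e-9) / n for i in range(n)]
    back = []
    for f in tgt[1:]:
        prev, back_t, delta = delta, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] * trans[i][j])
            delta.append(prev[best_i] * trans[best_i][j] * emit[j].get(f, 1e-9))
            back_t.append(best_i)
        back.append(back_t)
    # Backtrace the best state sequence.
    a = [max(range(n), key=lambda i: delta[i])]
    for back_t in reversed(back):
        a.append(back_t[a[-1]])
    return list(reversed(a))
```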
... a family of word alignment.
Definition 1. The ITG alignment family is a set of
word alignments that has at least one BTG deriva-
tion.
The ITG alignment family is only a subset of all
word alignments because ... ambiguity
in word alignment is the case where two or more
derivations d_1, d_2, ..., d_k of G have the same
underlying word alignment A. A grammar G is
non-spurious if for any given word alignment, ... Null-word Attachment Ambiguity
Definition 4. For any given sentence pair (e, f) and
its alignment A, let (e′, f′) be the sentence pair
with all null-aligned words removed from (e, f).
The alignment...
... are
less than 20 percent.
2 1:n Word Alignment
Our discussion of uni-directional word alignments
is limited to IBM Model 4.
Definition 1 (Word alignment task) Let e_i be
the i-th ... two word alignments
the i-th ... two word alignments
as an alignment point, 2) add new alignment points
that exist in the union with the constraint that a
new alignment point connects at least one
previously unaligned word, ... a mechanism to
augment one source word into several source
words or delete a source word, while a NULL
insertion is a mechanism for generating several
words from blank words. Fertility uses a conditional...
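The intersection-plus-growth heuristic quoted above (start from the intersected alignment points, then add union points that connect at least one previously unaligned word) can be sketched as:

```python
def grow(intersection, union):
    """Symmetrization sketch: keep all links in the intersection of the two
    directional alignments, then repeatedly add links from the union that
    touch at least one still-unaligned word on either side."""
    links = set(intersection)
    added = True
    while added:
        added = False
        for (i, j) in sorted(union - links):
            src_aligned = any(a == i for (a, b) in links)
            tgt_aligned = any(b == j for (a, b) in links)
            if not (src_aligned and tgt_aligned):
                links.add((i, j))
                added = True
    return links
```

The visiting order of candidate links (here, sorted) is an assumption; published heuristics differ on it.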
... sums, for each word w, the number of words
not linked to w that fall between the first and last
words linked to w. The other feature counts only
such words that are linked to some word other than
w. ... have
a function word not linked to anything, between
two words linked to the same word.
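The first feature described above can be sketched as a count over a set of links; the (position, position) link encoding and the function name are assumptions:

```python
def unlinked_between(links, w):
    """For word position w on one side, count positions on the other side
    that fall strictly between w's first and last linked positions but are
    not themselves linked to w."""
    linked = sorted(j for (i, j) in links if i == w)
    if len(linked) < 2:
        return 0
    return sum(1 for j in range(linked[0] + 1, linked[-1]) if j not in linked)
```

The second feature restricts this count to intervening words that are linked to some word other than w; the restriction is a straightforward extra membership test.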
exact match feature We have a feature that
sums the number of words linked to identical
words. This is motivated ... association with respect to a word in a
sentence pair to be the number of association types
(word-type to word-type) for that word that have
higher association scores, such that words of both
types occur...
... bilingual
word alignment finds word-to-word connections
across languages. Originally introduced as a
byproduct of training statistical translation models
in (Brown et al., 1993), word alignment ... improved
alignments.
2 Constrained Alignment
Let an alignment be the complete structure that
connects two parallel sentences, and a link be
one of the word-to-word connections that make
up an alignment. ... exercise
for word alignment. In HLT-NAACL Workshop on
Building and Using Parallel Texts, pages 1–10, Edmon-
ton, Canada.
R. Moore. 2005. A discriminative framework for bilingual
word alignment. ...
... optimal alignment.
Section 2 describes the clue alignment model
and ways of estimating parameters from association
scores. Section 3 introduces the alignment
approach which is based on word alignment ... therefore,
they can be dismissed in the alignment process.
3 Clue Alignment
Word alignment clues as described above can be
used to model the relations between words of
translated texts. Parameters ... short words.
Combining Clues for Word Alignment
Jörg Tiedemann
Department of Linguistics
Uppsala University
Box 527
SE-751 20 Uppsala, Sweden
joerg@stp.ling.uu.se
Abstract
In this paper, a word...