... 27 2012.
© 2012 Association for Computational Linguistics
Power-Law Distributions for Paraphrases Extracted from Bilingual
Corpora
Spyros Martzoukos Christof Monz
Informatics Institute, University ... and target
phrases) for: (a) Components extracted from P.
‘1-1’ components are not shown. (b) Components
extracted from the decomposition of P′.
In the components emer...
... do not perform nearly as
well as characters. In fact, the "words" variation
increases the number of errors dramatically (from
36 to 50 for English-French and from 19 to 35 for
English-German). ... genetic code sequences from different species,
speech sequences from different speakers, gas
chromatograph sequences from different
compounds, and geologic sequences f...
... learning approach,
outperforming them by as much as 4–7% on the
three data sets for one of the performance metrics.
2 Related Work
As mentioned before, our approach differs from the
standard approach ... ranker underperforms
the perfect ranker by about 5% for BNEWS
and 3% for both NPAPER and NWIRE in terms
of F-measure, suggesting that the supervised ranker
still has room for impr...
... be from
a mixture family of distributions. We will use x to
denote observable random variables, y to denote
hidden structure, and θ to denote the to-be-learned
parameters of the model (coming from ... steps, for 1 ≤ i ≤ r:
E-step: For each $i \in \{1, \ldots, r\}$, optimize the
bound given $\lambda$, $q_{i'}(y)|_{i' \in \{1, \ldots, r\} \setminus \{i\}}$,
and $q_{i'}(\theta)|_{i' \in \{1, \ldots, r\}}$ by selecting a new
distribution $q_i(y)$.
M...
... extensions for
more complex situations. For example, longer documents
might benefit from an analysis on the paragraph
level as well as the sentence and document
levels. One possible model for this ... a new value k. For each
document label, the k highest scoring labelings were
Figure 4: An extension to the model from Figure 1
incorporating paragraph level analysis.
extr...
...
(producing the text string) and formatting
(determining the formatting marks to
insert in the text string). Developing an application
to present the information for a given
domain is often ...
alization, and formatting. PRESENTOR is implemented
and is portable cross-platform and
cross-domain. It has been used with success in
several application domains including weather
fo...
... It
places no restrictions on the form of the fillers
for any slot in a gram node. The production
rules enforce categorial and ordering
restrictions. So, for example, the templates
reflect ...
verbs; it is also used to cover some other forms
of attachment to, and modification of, nouns, for
example by determiners (like "a") and even for
plural or singular number. I...
... of
commutativity or associativity are available for
testing logical equivalence.¹ One of the
¹ Strictly speaking, we test for a very strict form
of consistency. Two LFs are considered logically ...
arguments.
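% reduce/2: apply one transformation step, then keep reducing the result.
reduce(Sign0, Sign) :-
    transform(Sign0, Sign1),
    reduce(Sign1, Sign).
% transform/2: a daughter sign is rewritten to its mother by a unary rule.
transform(Daughter, Mother) :-
    unary_rule(Mother, Daughter).
transform(Sign0, Sign) :-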
path_value(...
... systems, for a given question, a
vector is formed consisting of the most frequent
co-occurring terms with the question target as the
question profile. Candidate answers extracted
from a given ... are for people (e.g.,
Aaron Copland), 10 are for organizations (e.g.,
Friends of the Earth) and 10 are for other entities
(e.g., Quasars). We employ Lemur⁶ to retrieve
relevant d...
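
The question profile described above, a vector of the terms that co-occur most frequently with the question target, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the systems discussed: the function name `question_profile`, the `top_n` and `stopwords` parameters, whitespace tokenization, a single-token target, and the use of a whole document as the co-occurrence window.

```python
from collections import Counter


def question_profile(target, documents, top_n=3, stopwords=frozenset()):
    """Return the top_n terms that most often share a document with target.

    Hypothetical helper: the co-occurrence window (a whole document) and the
    whitespace tokenization are illustrative assumptions.
    """
    target = target.lower()
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        if target in tokens:
            # Count each co-occurring term at most once per document.
            counts.update(set(tokens) - {target} - stopwords)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Toy documents invented for this example.
docs = [
    "quasars emit radio waves",
    "astronomers observe quasars with radio telescopes",
    "radio stations play music",
]
profile = question_profile("quasars", docs, top_n=2)
```

In this toy run only the first two documents mention the target, so "radio" is counted twice and every other term once. A realistic system would need multi-word targets (e.g., person names) and a proper tokenizer, which this sketch does not attempt.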
... unified treatment of the files used for
training and of those used for evaluation (which
are already annotated in XML format) and it is
also useful if the file submitted for analysis to
FDG already contains ... algorithms implemented for
the workbench enriches this set of data with
information relevant to its particular needs.
Kennedy and Boguraev (1996), for example,
need additional i...