A DOM Tree Alignment Model for Mining Parallel Data from the Web

Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web"

Uploaded: 08/03/2014, 02:21

... proposes a new web parallel data mining scheme. Given a pair of parallel web pages as seeds, the Document Object Model (DOM) is used to represent the web pages as a pair of DOM trees. Then a ... web pages manually labeled as parallel or non-parallel. The Iterative Scaling algorithm (Pietra, Pietra and Lafferty 1995) is used for the training. 7 Experimental Results: The DOM tree alignment ... discovered are regarded as anchors to new parallel data. This makes the mining scheme an iterative process. The new mining scheme has three advantages: (i) Mining coverage is increased. Parallel...

Scientific report: "Mining Parenthetical Translations from the Web by Word Alignment"

Uploaded: 17/03/2014, 02:20

... parenthetical translations are extremely valuable both as a stand-alone on-line dictionary and as training data for statistical machine translation systems. They provide fresh data (new words) and cover ... that Exact Match is a rather stringent criterion. Table 7 shows a random sample of extracted parenthetical translations that failed the Exact Match test. Only a small percentage of them are ... on a partially parallel corpus to extract translation pairs from the web. Treating the translation extraction problem as a word alignment problem allowed us to generalize across instances...

Scientific report: "A Tree Transducer Model for Synchronous Tree-Adjoining Grammars"

Uploaded: 17/03/2014, 00:20

... we assume that all adjunctions are mandatory; i.e., if an auxiliary tree can be adjoined, then we need to make an adjunction. Thus, a derivation starting from an initial tree to a derived tree ... 2010. © 2010 Association for Computational Linguistics. A Tree Transducer Model for Synchronous Tree-Adjoining Grammars. Andreas Maletti, Universitat Rovira i Virgili, Avinguda de Catalunya 25, 43002 Tarragona, ... auxiliary tree by a special marker. Traditionally, the root label A of an auxiliary tree is replaced by A_∅ once adjoined. Since we assume that there are no auxiliary trees with such a root label,...

Scientific report: "A Syntax-Driven Bracketing Model for Phrase-Based Translation"

Uploaded: 20/02/2014, 07:20

... to the word alignments, we define bracketable and unbracketable instances. For each of these instances, we automatically extract relevant syntactic features from the source parse tree as bracketing ... considered as a syntactic constraint. Therefore we can use thousands of syntactic constraints to guide phrase translation. • The SDB model maintains and protects the strength of the phrase-based approach ... in a better way than the CMVC does. It is able to reward non-syntactic translations by assigning an adequate probability to them if these translations are appropriate to particular syntactic...

Scientific report: "A Unified Syntactic Model for Parsing Fluent and Disfluent Speech"

Uploaded: 20/02/2014, 09:20

... modified for use in a special repair grammar, which not only reduces the amount of available training data, but violates our intuition that most reparanda are fluent up until the actual edit occurs. The ... is the fact that there is often a good deal of overlap in words between the reparandum and the alteration, as speakers may trace back several words when restarting after an error. For instance, ... Communication Research Centre, University of Edinburgh. John Hale, Izhak Shafran, Lisa Yung, Bonnie Dorr, Mary Harper, Anna Krasnyanskaya, Matthew Lease, Yang Liu, Brian Roark, Matthew Snover, and...

Scientific report: "A Phrase-based Statistical Model for SMS Text Normalization"

Uploaded: 20/02/2014, 12:20

... groups and domains can be modeled separately without accessing and adapting the language model of the MT system for each SMS application. Another advantage is that the normalization module can ... normalization as a translation problem from the SMS language to the English language and we propose to adapt a phrase-based statistical MT model for the task. Evaluation by 5-fold cross validation ... a consensus translation technique to bootstrap parallel data using off-the-shelf translation systems for training a hierarchical statistical translation model for general domain instant...

Scientific report: "GPSM: A GENERALIZED PROBABILISTIC SEMANTIC MODEL FOR AMBIGUITY RESOLUTION"

Uploaded: 20/02/2014, 21:20

... environment. The open test performance can be attributed to the small database size and the estimation error of the parameters thus introduced. Because the training database is small with respect ... sentences while using the remaining parts as the training set. The overall performance is then estimated as the average performance of the 10 iterations. The performance is evaluated in terms of ... under a uniform formulation. The semantic score measure shows substantial improvement in structural disambiguation over a syntax-based approach. 1. Introduction: In a large natural language...
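The evaluation described above (hold out one part of the sentences, train on the remaining parts, and average over 10 iterations) is k-fold cross-validation. Below is a minimal Python sketch of that loop, assuming a round-robin split into parts and a hypothetical caller-supplied `train_and_score(train, test)` callback; the snippet does not say how the parts were formed or what the metric is.

```python
from statistics import mean
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(data: Sequence[T], k: int = 10) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs; each of the k parts serves as the test set exactly once."""
    if not 1 < k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    # Round-robin assignment (an assumption): item i goes to part i % k.
    parts = [list(data[i::k]) for i in range(k)]
    for i in range(k):
        # Train on every part except the held-out one.
        train = [x for j, part in enumerate(parts) if j != i for x in part]
        yield train, parts[i]


def cross_validate(
    data: Sequence[T],
    train_and_score: Callable[[List[T], List[T]], float],
    k: int = 10,
) -> float:
    """Train on k-1 parts, score on the held-out part, and average the k scores."""
    return mean(train_and_score(train, test) for train, test in k_fold_splits(data, k))


if __name__ == "__main__":
    sentences = [f"sentence {n}" for n in range(25)]
    # Hypothetical scorer: reports the fraction of the data held out in each fold.
    score = cross_validate(sentences, lambda train, test: len(test) / len(sentences))
    print(f"average over 10 folds: {score:.3f}")
```

Averaging over folds uses every sentence for testing exactly once, which matters when, as the snippet notes, the training database is small.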

Scientific report: Trophoblast-like human choriocarcinoma cells serve as a suitable in vitro model for selective cholesteryl ester uptake from high density lipoproteins

Uploaded: 20/02/2014, 23:20

... island and cell columns, is formed which maintains the ability of proliferation and invasion. Choriocarcinoma is a malignant neoplasm that represents the early trophoblast of the attachment phase ... TOPO-TA cloning vector and sequenced. Northern blot analysis Total RNA was isolated from choriocarcinoma and human liver tissues (used as a positive control) by the RNA-easy kit (Qiagen) exactly as ... of lipoprotein-associated cholesterol across the placenta from the maternal circulation [4–8]. The fact that the placenta binds and internalizes maternal lipoproteins both in vivo and in vitro...

Scientific report: A mouse model for in vivo tracking of the major dust mite allergen Der p 2 after inhalation

Uploaded: 07/03/2014, 21:20

... band) that the clearance and further metabolism of the allergen was altered as a result of the inflammation in the lungs of sensitized animals. Up to now there are few data available on the fate of ... day 30 displayed an airway inflammation 18 h after treatment, of the same magnitude as in animals challenged with a third HDM aerosol on day 30 (data not shown). The animals were killed after ... visualized by autoradiography using a PhosphorImager with the ImageQuant software (both from Molecular Dynamics, Sunnyvale, CA, USA). As standards for SDS/PAGE autoradiography, ⁷⁵Se-labelled...

Scientific report: "A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness"

Uploaded: 08/03/2014, 21:20

... the M&C data set. It surpasses HR, though, on the R&G and 353-C data sets. The latter contains the word pairs of the M&C data set. To visualize the performance of our measure in a more comprehensible ... pattern as the human ratings, as closely as our measure of relatedness does (low y values for small x values and high y values for high x). The same pattern applies in the M&C and 353-C data ... standard (human judgements). The correlations for the three data sets show that SR performs better than any other measure of semantic relatedness, except in the case of HR on the M&C data...

Scientific report: "A Class-Based Agreement Model for Generating Accurately Inflected Translations"

Uploaded: 16/03/2014, 19:20

... 5. The inputs are a translation hypothesis e_1^I, an index n distinguishing the prefix from the attachment, and a flag indicating if their concatenation is a goal hypothesis. The beam search maintains ... from phrase-structure trees. Williams and Koehn (2011) annotated German trees, and extracted translation rules from them. They then specified manual unification rules, and applied a penalty according ... of the source. This large gap between the unigram recall of the actual translation output (top) and the lexical coverage of the phrase-based model (bottom) indicates that translation performance...

Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

Uploaded: 17/03/2014, 00:20

... predict the POS tag with positional information for each character. Each character can be assigned one of two possible boundary tags: “B” for a character that begins a word and “I” for a character ... readers to read the above paper for details. For parameter estimation, our work adopts the Passive-Aggressive (PA) framework (Crammer et al., 2006), a family of margin-based online learning algorithms. ... segmentation can also be formulated as a sequential classification problem to predict whether a character is located at the beginning of, inside or at the end of a word. This character-by-character...
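The character-level formulation above can be made concrete by attaching the POS tag to each "B"/"I" boundary tag, so that one label per character encodes both segmentation and tagging. The Python sketch below shows only that encoding and its inverse; `encode`, `decode`, and the `B-POS`/`I-POS` label format are hypothetical illustrations of a common convention, not the paper's stacked sub-word model or its exact tag set.

```python
from typing import List, Tuple


def encode(tagged_words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (word, pos) pairs to per-character labels such as 'B-NN' / 'I-NN'."""
    labels = []
    for word, pos in tagged_words:
        for i, ch in enumerate(word):
            boundary = "B" if i == 0 else "I"  # B = begins a word, I = inside it
            labels.append((ch, f"{boundary}-{pos}"))
    return labels


def decode(labels: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Rebuild (word, pos) pairs: a 'B' label (or the very first character) opens a new word.

    An 'I' character is appended to the current word, which keeps the POS of its first character.
    """
    words: List[List[str]] = []
    for ch, label in labels:
        boundary, pos = label.split("-", 1)
        if boundary == "B" or not words:
            words.append([ch, pos])
        else:
            words[-1][0] += ch
    return [(w, p) for w, p in words]


if __name__ == "__main__":
    sentence = [("中国", "NR"), ("人民", "NN"), ("很", "AD"), ("好", "VA")]
    labels = encode(sentence)
    print(labels)
    print(decode(labels))
```

Because each label carries both the boundary and the POS, a single sequence labeler trained on such labels predicts segmentation and tags jointly.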

Scientific report: "A Language-Independent Unsupervised Model for Morphological Segmentation"

Uploaded: 17/03/2014, 04:20

... making available the code of the RePortS algorithm, and Stefan Bordag and Delphine Bernhard for running their algorithms on the German data. Many thanks also to Matti Varjokallio for evaluating ... presented here have been shown to improve accuracy (Kurimo et al., 2006). Another motivation for evaluating the system on a task rather than on manually annotated data is that linguistically motivated morphological ... the most probable affixes from both ends of the word, all possible segmentations of the word are generated and ranked using the language model. The probabilities for the language model are learnt...
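The last sentence above describes generating every segmentation of a word and ranking the candidates with a language model. The Python sketch below illustrates only that "generate all, then rank" step, assuming a simple unigram model over morphs whose log-probabilities the caller supplies; the toy probabilities, the `unknown_logprob` penalty, and the function names are hypothetical, and the paper's affix-stripping step and actual model are not reproduced.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def all_segmentations(word: str) -> List[List[str]]:
    """Return every split of `word` into contiguous, non-empty pieces (2**(len-1) in total)."""
    n = len(word)
    segmentations = []
    for num_cuts in range(n):
        # Choose which of the n-1 internal positions become morph boundaries.
        for cuts in combinations(range(1, n), num_cuts):
            bounds = (0, *cuts, n)
            segmentations.append([word[a:b] for a, b in zip(bounds, bounds[1:])])
    return segmentations


def rank_segmentations(
    word: str,
    morph_logprob: Dict[str, float],
    unknown_logprob: float = -20.0,
) -> List[Tuple[float, List[str]]]:
    """Score each segmentation as the sum of unigram morph log-probabilities, best first.

    Morphs missing from `morph_logprob` get the (hypothetical) `unknown_logprob` penalty.
    """
    scored = [
        (sum(morph_logprob.get(m, unknown_logprob) for m in seg), seg)
        for seg in all_segmentations(word)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy probabilities, invented purely for illustration.
    toy_model = {"walk": math.log(0.01), "ing": math.log(0.05), "walking": math.log(0.0001)}
    for score, seg in rank_segmentations("walking", toy_model)[:3]:
        print(f"{score:8.2f}  {'+'.join(seg)}")
```

Exhaustive enumeration grows exponentially with word length; under a unigram model the single best segmentation could instead be found with dynamic programming, but the brute-force version mirrors the description more directly.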

Scientific report: "A Hierarchical Phrase-Based Model for Statistical Machine Translation"

Uploaded: 17/03/2014, 05:20

... build only partial translations using hierarchical phrases, and then combine them serially as in a standard phrase-based model. For a partial example of a synchronous CFG derivation, see Figure ... Pharaoh, a state-of-the-art phrase-based system. 1 Introduction: The alignment template translation model (Och and Ney, 2004) and related phrase-based models advanced the previous state of the art ... Yonggang Deng, and William Byrne. 2005. A weighted finite state transducer translation template model for statistical machine translation. Natural Language Engineering. To appear. Daniel Marcu and...

Scientific report: "A Generative Constituent-Context Model for Improved Grammar Induction"

Uploaded: 17/03/2014, 08:20

... performance or data-likelihood. However, parameter search methods have a potential advantage. By aggregating over only valid, complete parses of each sentence, they naturally incorporate the ... shown, but are modeled. Parameter search is also local; parameters which are locally optimal may be globally poor. A concrete example is the experiments from (Carroll and Charniak, 1992). They restricted ... Communication Papers for the 97th Meeting of the Acoustical Society of America, pages 547–550. Eric Brill. 1993. Automatic grammar induction and parsing free text: A transformation-based approach....

Scientific report: "The Best of Both Worlds – A Graph-based Completion Model for Transition-based Parsers"

Uploaded: 17/03/2014, 22:20

Scientific report: "An Alignment Method for Noisy Parallel Corpora based on Image Processing Techniques"

Uploaded: 17/03/2014, 23:20

Scientific report: "A Joint Rule Selection Model for Hierarchical Phrase-based Translation"

Uploaded: 23/03/2014, 16:20

Scientific report: "A Generative Entity-Mention Model for Linking Entities with Knowledge Base"

Uploaded: 23/03/2014, 16:20

Scientific report: "A Discriminative Latent Variable Model for Statistical Machine Translation"

Uploaded: 23/03/2014, 17:20
