unsupervised language model adaptation

Tài liệu Báo cáo khoa học: "Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information" doc

Tài liệu Báo cáo khoa học: "Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information" doc

Ngày tải lên : 19/02/2014, 19:20
... transferred across languages, to cross- lingual language modeling and translation lexicon adaptation. Recently, Gong and Zhou (2010) also applied topic modeling into domain adaptation in SMT. ... weblog. According to adaptation emphases, domain adap- tation in SMT can be classified into translation mod- el adaptation and language model adaptation. Here we focus on how to adapt a translation model, which is ... Vogel. 2006. Distributed Language Modeling for N-best List Re-ranking. In Proc. of EMNLP 2006, pages 216-223. Bing Zhao, Matthias Eck and Stephan Vogel. 2004. Language Model Adaptation for Statistical...
  • 10
  • 533
  • 0
Tài liệu Báo cáo khoa học: "Topic Models for Dynamic Translation Model Adaptation" pptx

Tài liệu Báo cáo khoa học: "Topic Models for Dynamic Translation Model Adaptation" pptx

Ngày tải lên : 19/02/2014, 19:20
... 2003). Topic modeling has received some use in SMT, for in- stance Bilingual LSA adaptation (Tam et al., 2007), and the BiTAM model (Zhao and Xing, 2006), which uses a bilingual topic model for ... Korea, 8-14 July 2012. c 2012 Association for Computational Linguistics Topic Models for Dynamic Translation Model Adaptation Vladimir Eidelman Computer Science and UMIACS University of Maryland College ... topic-specific contexts, where topics are induced in an unsupervised way using topic models; this can be thought of as inducing subcorpora for adaptation with- out any human annotation. We use these...
  • 5
  • 532
  • 0
Tài liệu Báo cáo khoa học: "A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation" doc

Tài liệu Báo cáo khoa học: "A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation" doc

Ngày tải lên : 20/02/2014, 04:20
... 5-gram/2-SLM+2-gram/4-SLM+5- gram/PLSA language model improves both signif- icantly. Bear in mind that Charniak et al. (2003) in- tegrated Charniak’s language model with the syntax- based translation model Yamada and ... Large language models in ma- chine translation. The 2007 Conference on Empirical Methods in Natural Language Processing (EMNLP), 858-867. E. Charniak. 2001. Immediate-head parsing for language models. ... Dis- tributed language modeling for N-best list re-ranking. The 2006 Conference on Empirical Methods in Natu- ral Language Processing (EMNLP), 216-223. Y. Zhang, 2008. Structured language models for...
  • 10
  • 567
  • 0
Tài liệu Báo cáo khoa học: "Smoothing a Tera-word Language Model" doc

Tài liệu Báo cáo khoa học: "Smoothing a Tera-word Language Model" doc

Ngày tải lên : 20/02/2014, 09:20
... Goodman. 2001. A bit of progress in language modeling. Computer Speech and Language. R. Kneser and H. Ney. 1995. Improved backing-off for m-gram language modeling. In International Confer- ence ... Bauman Peto. 1995. A hierarchical Dirichlet language model. Natural Lan- guage Engineering, 1(3):1–19. Y.W. Teh. 2006. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceed- ings ... sure the probabilities are normalized. The interpolated models always incorporate the lower or- der distribution Pr(c|b) whereas the back-off models consider it only when the n-gram abc has not been observed...
  • 4
  • 425
  • 1
Tài liệu Báo cáo khoa học: "A Succinct N-gram Language Model" ppt

Tài liệu Báo cáo khoa học: "A Succinct N-gram Language Model" ppt

Ngày tải lên : 20/02/2014, 09:20
... -gram language models are com- pressed into 10 GB, which is comparable to a lossy representation (Talbot and Brants, 2008). 2 N -gram Language Model We assume a back-off N-gram language model ... language model structure and word iden- tifiers. In Proc. of ICASSP 2003, volume 1. A. Stolcke. 1998. Entropy-based pruning of backoff language models. In Proc. of the ARPA Workshop on Human Language ... representation with block compression. N-gram language models of 42.65GB were compressed to 18.37GB. Finally, the 8-bit quantized N -gram language models are represented by 9.83GB of space. Table...
  • 4
  • 457
  • 0
Tài liệu Báo cáo khoa học: "A Phonotactic Language Model for Spoken Language Identification" pptx

Tài liệu Báo cáo khoa học: "A Phonotactic Language Model for Spoken Language Identification" pptx

Ngày tải lên : 20/02/2014, 15:20
... statistical language modeling, and language identification. A typical LID system is illustrated in Figure 1 (Zissman, 1996), where language dependent voice tokenizers (VT) and lan- guage models ... Tokenization at different resolutions 2.2 n-gram Language Model With the sequence of tokens, we are able to es- timate an n-gram language model (LM) from the statistics. It is generally agreed ... the 1996 NIST Language Recognition Evaluation database. 1 Introduction Spoken language and written language are similar in many ways. Therefore, much of the research in spoken language identification,...
  • 8
  • 436
  • 0
Tài liệu Báo cáo khoa học: "Japanese OCR Error Correction using Character Shape Similarity and Statistical Language Model " pptx

Tài liệu Báo cáo khoa học: "Japanese OCR Error Correction using Character Shape Similarity and Statistical Language Model " pptx

Ngày tải lên : 20/02/2014, 18:20
... efficient. 5 Experiments 5.1 Training Data for the Language Model We used the EDR Japanese Corpus Version 1.0 (EDR, 1991) to train the language model. It is a corpus of approximately 5.1 million ... dictionary. 3.2 Word Model for Unknown Words We defined a statistical word model to assign a rea- sonable word probability to an arbitrary substring in the input sentence. The word model is formally ... correction candidate is selected by the word segmentation algo- rithm using the OCR model and the language model. For simplicity, we will present the method as if it were for an isolated word...
  • 7
  • 472
  • 0
Tài liệu Báo cáo khoa học: "A Structured Language Model" ppt

Tài liệu Báo cáo khoa học: "A Structured Language Model" ppt

Ngày tải lên : 22/02/2014, 03:20
... Derivational Model. In Proceedings of the Human Language Technology Workshop, 272-277. ARPA. Raymond Lau, Ronald Rosenfeld, and Salim Roukos. 1993. Trigger-based language models: a maximum ... 2 P(wk/Wk-ITk 1) = P(wjho) models, re- ferred to as H and h, respectively; h0 is the previous exposed (headword, POS/non-term tag) pair; the parses used in this model were those assigned man- ... word wk and they were implemented using the Maximum Entropy Model- ing Toolkit 1 (Ristad97). The constraint templates in the {W,H} models were: 4 <= <*>_<*> <7>; P- <=...
  • 3
  • 342
  • 0
Báo cáo khoa học: "Intelligent Selection of Language Model Training Data" ppt

Báo cáo khoa học: "Intelligent Selection of Language Model Training Data" ppt

Ngày tải lên : 07/03/2014, 22:20
... domain- specific and non-domain-specifc language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, ... approach to statistical language modeling for Chinese. ACM Transactions on Asian Language Informa- tion Processing, 1(1):3–33. Dietrich Klakow. 2000. Selecting articles from the language model training ... Chien, Ker-Jiann Chen, and Lin-Shan Lee. 1997. Chinese language model adaptation based on document classification and multiple domain- specific language models. In EUROSPEECH- 1997, 1463–1466. Hermann...
  • 5
  • 348
  • 0
Báo cáo khoa học: "Optimizing Language Model Information Retrieval System with Expectation Maximization Algorithm" doc

Báo cáo khoa học: "Optimizing Language Model Information Retrieval System with Expectation Maximization Algorithm" doc

Ngày tải lên : 08/03/2014, 01:20
... its original parame- ters are given by the basic language modeling approach calculation. Figure 2. HMM model for EM IR We define our HMM model as a four-tuple, {S,A,B,π}, where S is a ... calculated with the simple language modeling approach. Even if the query term is not in the document, it will be assigned a small value according to the basic language modeling method. The rest ... states take much time for one run, we firstly apply a basic language modeling method to reduce our docu- ment set. This language modeling method will be detailed at Section 3.1. Based on the...
  • 9
  • 317
  • 1
Báo cáo khoa học: "A Discriminative Language Model with Pseudo-Negative Samples" pptx

Báo cáo khoa học: "A Discriminative Language Model with Pseudo-Negative Samples" pptx

Ngày tải lên : 08/03/2014, 02:21
... Discriminative Language Model with Pseudo-Negative samples We propose a novel discriminative language model; a Discriminative Language Model with Pseudo- Negative samples (DLM-PN). In this model, pseudo-negative ... propose a novel discrim- inative language model, which can be ap- plied quite generally. Compared to the well known N-gram language models, dis- criminative language models can achieve more accurate ... correctly. 2 Previous work Probabilistic language models (PLMs) estimate the probability of word strings or sentences. Among these models, N-gram language models (NLMs) are widely used. NLMs approximate...
  • 8
  • 315
  • 0
Báo cáo khoa học: "Language Model Based Arabic Word Segmentation" pdf

Báo cáo khoa học: "Language Model Based Arabic Word Segmentation" pdf

Ngày tải lên : 08/03/2014, 04:22
... involves (i) language model training on a morpheme- segmented corpus, (ii) segmentation of input text into a sequence of morphemes using the language model parameters, and (iii) unsupervised ... inferred. We would like to thank Martin Franz for discussions on language model building, and his help with the use of ViaVoice language model toolkit. References Beesley, K. 1996. Arabic Finite-State ... Prefix*-Stem-Suffix* 3 Morpheme Segmentation 3.1 Trigram Language Model Given an Arabic sentence, we use a trigram language model on morphemes to segment it into a sequence of morphemes...
  • 8
  • 189
  • 0
Báo cáo khoa học: "Automatic Acquisition of Language Model based on Head-Dependent Relation between Words" pdf

Báo cáo khoa học: "Automatic Acquisition of Language Model based on Head-Dependent Relation between Words" pdf

Ngày tải lên : 08/03/2014, 05:21
... Preliminary experiments We have experimented with three language models, tri-gram model (TRI), bi-gram model (BI), and the proposed model (DEP) on a raw corpus extracted from KAIST corpus ... useful than the naive word sequences of n-gram, for language modeling. We are planning to experiment the perfor- mance of the proposed language model for large corpus, for various domains, and ... Based n-gram Models of Natural Language& quot;. Computational Linguistics, 18(4):467-480. C. Chang and C. Chen. 1996. "Application Is- sues of SA-class Bigram Language Models"....
  • 5
  • 334
  • 0
Báo cáo khoa học: "A Preference-first Language Processor Integrating the Unification Grammar and Markov Language Model for Speech Recognition-ApplicationS" potx

Báo cáo khoa học: "A Preference-first Language Processor Integrating the Unification Grammar and Markov Language Model for Speech Recognition-ApplicationS" potx

Ngày tải lên : 08/03/2014, 07:20
... The language processor consists of a language model and a parser. The language model properly integrates the unification grammar and the Markov language model, while the parser is defined based ... language processor has been developed and tested on a small lexicon, a Markov language model, and a simple set of unification grammar rules for the Chinese language, although the present model ... summarized. The Laneua~e Model The goal of the language model is to participate in the selection of candidate constituents for a sentence to be identified. The proposed language model is composed...
  • 6
  • 392
  • 0
Báo cáo khoa học: "Combining a Statistical Language Model with Logistic Regression to Predict the Lexical and Syntactic Difficulty of Texts for FFL" potx

Báo cáo khoa học: "Combining a Statistical Language Model with Logistic Regression to Predict the Lexical and Syntactic Difficulty of Texts for FFL" potx

Ngày tải lên : 08/03/2014, 21:20
... a statistical language model and a measure of tense difficulty. 4.1 The language model The lexical difficulty of a text is quite an elaborate phenomenon to parameterise. The logistic regres- sion models ... linguistic unit to consider. The equations introduced above use 21 6.1 The language model: probabilities and smoothing For our language model, we need a list of French lemmas with their frequencies of ... B2 C1 C2 Mean PO model 91% 91% 67% 68% 53% 55% 56% 86% 68% 70% MLR model 93% 90% 69% 51% 59% 56% 64% 88% 73% 71% Table 2: Mean adjacent accuracy per level for PO model and MLR model (on the test...
  • 9
  • 514
  • 0