ngram language models based

Scientific report document: "Improved Smoothing for N-gram Language Models Based on Ordinary Counts" doc

Upload date: 20/02/2014, 09:20
... Kneser-Ney and those methods. 1 Introduction: Statistical language models are potentially useful for any language technology task that produces natural-language text as a final (or intermediate) output. ... best approach when language models based on ordinary counts are desired. References: Chen, Stanley F., and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. ... 349–352, Suntec, Singapore, 4 August 2009. © 2009 ACL and AFNLP. Improved Smoothing for N-gram Language Models Based on Ordinary Counts. Robert C. Moore, Chris Quirk, Microsoft Research, Redmond, WA 98052,...
Scientific report document: "Incremental Syntactic Language Models for Phrase-based Translation" pptx

Upload date: 20/02/2014, 04:20
... sequence models as language models. Modern phrase-based translation using large-scale n-gram language models generally performs well in terms of lexical choice, but still often produces ungrammatical ... LMs. Syntactic language models have also been explored with tree-based translation models. Charniak et al. (2003) use syntactic language models to rescore the output of a tree-based translation ... to incorporate large-scale n-gram language models in conjunction with incremental syntactic language models. The added decoding time cost of our syntactic language model is very high. By increasing...
Scientific report document: "The impact of language models and loss functions on repair disfluency detection" pptx

Upload date: 20/02/2014, 04:20
... language models trained from text or speech corpora of various genres and sizes. The largest available language models are based on written text: we investigate the effect of written text language models ... differences among the different language models when extended features are present are relatively small. We assume that much of the information expressed in the language models overlaps with the lexical ... f-score outperform state-of-the-art models re- ... mary corpus for our model. The language model part of the noisy channel model already uses a bigram language model based on Switchboard, but in the...
Scientific report document: "An Empirical Investigation of Discounting in Cross-Domain Language Models" ppt

Upload date: 20/02/2014, 04:20
... of English Bigrams. Computer Speech & Language, 5(1):19–54. Joshua Goodman. 2001. A Bit of Progress in Language Modeling. Computer Speech & Language, 15(4):403–434. Bo-June (Paul) Hsu ... pages 220–224, July. Robert C. Moore and Chris Quirk. 2009. Improved Smoothing for N-gram Language Models Based on Ordinary Counts. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, ... Approach to Adaptive Statistical Language Modeling. Computer Speech & Language, 10:187–228. Yee Whye Teh. 2006. A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes. In...
Scientific report document: "Reading Level Assessment Using Support Vector Machines and Statistical Language Models" pdf

Upload date: 20/02/2014, 15:20
... statistical language models. In this paper, we also use support vector machines to combine features from traditional reading level measures, statistical language models, and other language processing ... 40%. The curves for bigram and unigram models have similar shapes, but the trigram models outperform the lower-order models. Error rates for the bigram models range from 37–45% and the unigram ... use scores from language models as features in another classifier (e.g., an SVM). For example, perplexity ($PP$) is an information-theoretic measure often used to assess language models: $PP = 2^{H(t|c)}$,...
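The perplexity formula in the reading-level excerpt above can be illustrated with a minimal Python sketch. It assumes $H(t|c)$ is the per-word cross-entropy, in bits, of text $t$ under the language model for class $c$, and it takes the model's per-word probabilities as given instead of computing them; the function name `perplexity` is illustrative, not from the paper.

```python
import math

def perplexity(word_probs):
    """Return PP = 2 ** H, where H is the per-word cross-entropy in bits.

    word_probs: the probabilities a model assigned to each word of a text
    (assumed to be supplied by some already-trained language model).
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    # H = -(1/N) * sum_i log2 p(w_i | history)
    cross_entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** cross_entropy

# A model that gives every word probability 1/4 is as uncertain as a
# uniform choice among 4 words, so its perplexity is 4.
print(perplexity([0.25, 0.25, 0.25]))  # 4.0
```

Lower perplexity means the class model predicts the text better, which is why per-class perplexities can serve as features for a downstream classifier.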
Scientific report document: "Generating statistical language models from interpretation grammars in dialogue systems" potx

Upload date: 22/02/2014, 02:20
... comparison of in-grammar recognition performance. 3 Language modelling: To generate the different trigram language models we used the SRI language modelling toolkit (Stolcke, 2002) with Good-Turing ... Specific Language Models for Spoken Language Understanding. In Proceedings of SPECOM’97, Cluj-Napoca, Romania, pp. 51–56. Bangalore S. and Johnston M. 2004. Balancing Data-Driven And Rule-Based ... statistical language models (DM-SLMs) by using GF to generate all utterances that are specific to certain dialogue moves from our interpretation grammar. In this way we can produce models that...
Scientific report document: "Web augmentation of language models for continuous speech recognition of SMS text messages" docx

Upload date: 22/02/2014, 02:20
... Kneser-Ney smoothed n-gram models. IEEE Transactions on Audio, Speech and Language Processing, 15(5):1617–1624. A. Stolcke. 1998. Entropy-based pruning of backoff language models. In Proc. DARPA ... 2007. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning ... were selected for each language. The adaptation was thought to take place off-line on a server. 3.2.1 Data sets: For each language, the adaptation takes place on two baseline models, which are the...
Scientific report: "A Framework for Figurative Language Detection Based on Sense Differentiation" pptx

Upload date: 07/03/2014, 22:20
... examine one of these problems, the problem of automatic figurative language use detection. We propose a framework for figurative language detection based on the idea of sense differentiation. Then, we describe ... Conference on Empirical Methods in Natural Language Processing, pp. 315–323. Rada Mihalcea, Courtney Corley and Carlo Strapparava. 2006. Corpus-based and Knowledge-based Measures of Text Semantic Similarity. ... are developing do not work with this type of metaphor. Our approach to figurative language detection is based on the following idea: the fact that the sense of a word significantly differs from...
Scientific report: "The use of formal language models in the typology of the morphology of Amerindian languages" potx

Upload date: 07/03/2014, 22:20
... grammars for modeling agglutination in this language, but first we will present the former class of languages and its acceptor automata. 3.1 Linear context-free languages and two-taped nondeterministic ... 2010. © 2010 Association for Computational Linguistics. The use of formal language models in the typology of the morphology of Amerindian languages. Andrés Osvaldo Porta, Universidad de Buenos Aires, hugporta@yahoo.com.ar. Abstract: The ... natural representation in terms of linear context-free languages. 2 Quichua Santiagueño: The quichua santiagueño is a language of the Quechua language family. It is spoken in the Santiago del...
Scientific report: "Faster and Smaller N-Gram Language Models" pptx

Upload date: 07/03/2014, 22:20
... novel language model caching technique that improves the query speed of our language models (and SRILM) by up to 300%. 1 Introduction: For modern statistical machine translation systems, language models ... with two different language models. Our first language model, WMT2010, was a 5-gram Kneser-Ney language model which stores probability/back-off pairs as values. We trained this language model on ... value ranks for a given language model will vary – we will refer to this variable as v. 2.2 Trie-Based Language Models: The data structure of choice for the majority of modern language model implementations...
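The trie-based storage mentioned in the "Faster and Smaller N-Gram Language Models" excerpt can be sketched in Python. This is a plain, uncompressed pointer trie keyed on words in forward order, not the paper's compact implementation: each node optionally holds a (log10 probability, log10 back-off) pair, mirroring the "probability/back-off pairs" the excerpt mentions, and lookups back off in the usual ARPA-format style. The class and method names (`NgramTrie`, `insert`, `find`, `score`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Tuple

@dataclass
class TrieNode:
    # Child nodes keyed by the next word of the n-gram.
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # (log10 probability, log10 back-off weight) if a stored n-gram ends here.
    value: Optional[Tuple[float, float]] = None

class NgramTrie:
    """Hypothetical, uncompressed n-gram trie storing probability/back-off pairs."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, ngram: Sequence[str], logprob: float, backoff: float = 0.0) -> None:
        node = self.root
        for word in ngram:
            node = node.children.setdefault(word, TrieNode())
        node.value = (logprob, backoff)

    def find(self, ngram: Sequence[str]) -> Optional[Tuple[float, float]]:
        node = self.root
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return None
        return node.value

    def score(self, context: Sequence[str], word: str) -> float:
        """log10 P(word | context) with ARPA-style back-off."""
        context = tuple(context)
        entry = self.find(context + (word,))
        if entry is not None:
            return entry[0]
        if not context:
            raise KeyError(f"word not in vocabulary: {word}")
        # Unseen n-gram: add the context's back-off weight (0 if the context
        # itself is unseen) and retry with the oldest context word dropped.
        ctx_entry = self.find(context)
        backoff = ctx_entry[1] if ctx_entry is not None else 0.0
        return backoff + self.score(context[1:], word)

trie = NgramTrie()
trie.insert(["the"], -1.0, -0.5)
trie.insert(["cat"], -2.0)
trie.insert(["the", "cat"], -0.3)
print(trie.score(["the"], "cat"))  # -0.3 (bigram stored directly)
print(trie.score(["the"], "the"))  # -1.5 (back-off -0.5 plus unigram -1.0)
```

Sharing prefixes is what makes a trie attractive here: every n-gram that starts with the same words reuses the same path, and production implementations then compress the node arrays far beyond what a dictionary-per-node sketch like this can.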
Scientific report: "Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers" ppt

Upload date: 07/03/2014, 22:20
... explore a dependency language model to improve translation quality. To some extent, these syntactically-informed language models are consistent with syntax-based translation models in capturing ... scoring based on backward language models. In Proceedings of ICASSP, pages 221–224, Orlando, FL, April. Ahmad Emami, Kishore Papineni, and Jeffrey Sorensen. 2007. Large-scale distributed language ... maximum entropy based language model as features. The trigger pairs are selected according to their mutual information. Zhou (2004) also proposes an enhanced language model (MI-Ngram) which consists...
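Selecting trigger pairs by mutual information, as the excerpt above describes, can be illustrated with a small Python sketch. It is a stand-in under stated assumptions, not the paper's procedure: it scores ordered pairs (a, b), where a occurs at most `window` words before b in the same sentence, by pointwise mutual information estimated from sentence-level counts, then keeps the top-scoring pairs. The use of pointwise (rather than average) mutual information, the window size, and the function name `top_trigger_pairs` are all hypothetical choices.

```python
import math
from collections import Counter

def top_trigger_pairs(sentences, window=5, top_k=10):
    """Rank ordered word pairs (a, b) by pointwise mutual information.

    A pair co-occurs in a sentence if a appears at most `window` positions
    before b. P(x) is estimated as the fraction of sentences containing x.
    """
    n = len(sentences)
    word_df = Counter()  # number of sentences containing each word
    pair_df = Counter()  # number of sentences containing each ordered pair
    for sent in sentences:
        word_df.update(set(sent))
        pairs = set()
        for i, b in enumerate(sent):
            for a in sent[max(0, i - window):i]:
                if a != b:
                    pairs.add((a, b))
        pair_df.update(pairs)

    scores = {}
    for (a, b), count in pair_df.items():
        p_ab = count / n
        p_a = word_df[a] / n
        p_b = word_df[b] / n
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    # Highest PMI first; a real system would also require a minimum count.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

corpus = [
    ["the", "stock", "market", "rose"],
    ["the", "stock", "market", "fell"],
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
]
for pair, pmi in top_trigger_pairs(corpus, top_k=3):
    print(pair, round(pmi, 3))
```

In this toy corpus every pair that starts with "the" scores 0, because "the" occurs in every sentence and so carries no information about what follows, while content-word pairs such as ("stock", "market") score higher. Ties among the top pairs are broken arbitrarily.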
Scientific report: "Randomized Language Models via Perfect Hash Functions" pptx

Upload date: 08/03/2014, 01:20
... Entropy-based pruning of backoff language models. In Proc. DARPA Broadcast News Transcription and Understanding Workshop, pages 270–274. D. Talbot and M. Osborne. 2007a. Randomised language modelling ... processed using any conventional model reduction technique. 3 Perfect Hash-based Language Models: Our randomized LM is based on the Bloomier filter (Chazelle et al., 2004). We assume the n-grams ... 2007. Compressing trigram language models with Golomb coding. In Proceedings of EMNLP-CoNLL 2007, Prague, Czech Republic, June. P. Clarkson and R. Rosenfeld. 1997. Statistical language modeling using...
Scientific report: "Generalized Algorithms for Constructing Statistical Language Models" pdf

Upload date: 08/03/2014, 04:22
... Class-based models. In many applications, it is natural and convenient to construct class-based language models, that is, models based on classes of words (Brown et al., 1992). Such models ... model represented with failure transitions. 5 General class-based language modeling: Standard class-based or phrase-based language models are based on simple classes often reduced to a short list of ... Scalable backoff language models. In Proceedings of the International Conference on Spoken Language Processing (ICSLP). Andreas Stolcke. 1998. Entropy-based pruning of backoff language models. In...
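A class-based bigram model in the sense of Brown et al. (1992), cited in the excerpt above, factors $P(w_i \mid w_{i-1})$ as $P(w_i \mid c(w_i)) \cdot P(c(w_i) \mid c(w_{i-1}))$. The Python sketch below estimates both factors by maximum likelihood from a toy corpus, assuming the word-to-class mapping is given (Brown et al. learn it by clustering; that step is omitted here). The function name `train_class_bigram` and the toy classes are hypothetical, and no smoothing is applied.

```python
from collections import Counter

def train_class_bigram(sentences, word_class):
    """Maximum-likelihood class bigram: P(w | v) = P(w | c(w)) * P(c(w) | c(v)).

    word_class maps every word to its class; it is assumed to be given.
    """
    word_counts = Counter()      # count(w)
    class_counts = Counter()     # count(c) over all tokens
    class_bigrams = Counter()    # count(c_prev, c)
    class_histories = Counter()  # count(c_prev) as the left side of a bigram
    for sent in sentences:
        classes = [word_class[w] for w in sent]
        word_counts.update(sent)
        class_counts.update(classes)
        for prev, cur in zip(classes, classes[1:]):
            class_bigrams[(prev, cur)] += 1
            class_histories[prev] += 1

    def prob(prev_word, word):
        c_prev, c = word_class[prev_word], word_class[word]
        if class_histories[c_prev] == 0 or class_counts[c] == 0:
            return 0.0
        emission = word_counts[word] / class_counts[c]                      # P(w | c(w))
        transition = class_bigrams[(c_prev, c)] / class_histories[c_prev]  # P(c(w) | c(v))
        return emission * transition

    return prob

classes = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN",
           "runs": "VERB", "sleeps": "VERB"}
corpus = [["the", "cat", "runs"], ["a", "dog", "sleeps"]]
p = train_class_bigram(corpus, classes)
print(p("the", "dog"))   # 0.5, although "the dog" never occurs in the corpus
print(p("the", "runs"))  # 0.0, since DET is never followed by VERB here
```

The first query shows the point of class-based modeling: an unseen word bigram still receives probability mass because its class bigram was observed.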
Scientific report: "Language Model Based Arabic Word Segmentation" pdf

Upload date: 08/03/2014, 04:22
... applicable to various language families including agglutinative languages (Korean, Turkish, Finnish), highly inflected languages (Russian, Czech) as well as Semitic languages (Arabic, Hebrew). ... many highly inflected languages provided that one can create a small manually segmented corpus of the language of interest. 1 Introduction: Morphologically rich languages like Arabic ... inferred. We would like to thank Martin Franz for discussions on language model building, and his help with the use of the ViaVoice language model toolkit. References: Beesley, K. 1996. Arabic...
Scientific report: "Automatic Acquisition of Language Model based on Head-Dependent Relation between Words" pdf

Upload date: 08/03/2014, 05:21
... "Class-Based n-gram Models of Natural Language". Computational Linguistics, 18(4):467–480. C. Chang and C. Chen. 1996. "Application Issues of SA-class Bigram Language Models". ... Trigram Backoff Language Models". Technical Report CMU-CS-96-139, Carnegie Mellon University. S. Sneff. 1992. "TINA: A natural language system for spoken language applications". ... Acquisition of Language Models for Speech Recognition". Master's thesis, Massachusetts Institute of Technology. M. Meteer and J.R. Rohlicek. 1993. "Statistical Language Modeling...