character level language models

Document, scientific report: "Reading Level Assessment Using Support Vector Machines and Statistical Language Models" pdf

Scientific report

... statistical language models. In this paper, we also use support vector machines to combine features from traditional reading level measures, statistical language models, and other language processing ... use scores from language models as features in another classifier (e.g. an SVM). For example, perplexity (PP) is an information-theoretic measure often used to assess language models: $PP = 2^{H(t|c)}$, ... 40%. The curves for bigram and unigram models have similar shapes, but the trigram models outperform the lower-order models. Error rates for the bigram models range from 37-45% and the unigram...
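
The perplexity formula quoted in the snippet above can be illustrated with a minimal Python sketch. It assumes that H is the average negative log2 probability per token of a text under one class's language model; the function name `perplexity` and the sample probabilities are hypothetical and do not come from the paper.

```python
import math

def perplexity(token_probs):
    """Return 2**H, where H is the average negative log2 probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    entropy = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** entropy

# Hypothetical per-token probabilities assigned by some language model.
probs = [0.25, 0.5, 0.125, 0.25]
print(perplexity(probs))
```

A lower value means the model finds the text less surprising, which is why such scores can serve as features for a downstream classifier.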

Document, scientific report: "Incremental Syntactic Language Models for Phrase-based Translation" pptx

Scientific report

... to incorporate large-scale n-gram language models in conjunction with incremental syntactic language models. The added decoding time cost of our syntactic language model is very high. By increasing ... translation has effectively used n-gram word sequence models as language models. Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical ... use supertag n-gram LMs. Syntactic language models have also been explored with tree-based translation models. Charniak et al. (2003) use syntactic language models to rescore the output of a...

Document, scientific report: "The impact of language models and loss functions on repair disfluency detection" pptx

Scientific report

... language models trained from text or speech corpora of various genres and sizes. The largest available language models are based on written text: we investigate the effect of written text language models ... differences among the different language models when extended features are present are relatively small. We assume that much of the information expressed in the language models overlaps with the lexical ... information from the external language models by defining a reranker feature for each external language model. The value of this feature is the log probability assigned by the language model to the candidate...

Document, scientific report: "An Empirical Investigation of Discounting in Cross-Domain Language Models" ppt

Scientific report

... of English Bigrams. Computer Speech & Language, 5(1):19–54. Joshua Goodman. 2001. A Bit of Progress in Language Modeling. Computer Speech & Language, 15(4):403–434. Bo-June (Paul) Hsu ... techniques, from interpolation at either the count level or the model level (Bacchiani and Roark, 2003; Bacchiani et al., 2006) to using explicit models of syntax or semantics. Hsu and Glass (2008) ... Association for Computational Linguistics. An Empirical Investigation of Discounting in Cross-Domain Language Models. Greg Durrett and Dan Klein, Computer Science Division, University of California, Berkeley, {gdurrett,klein}@cs.berkeley.edu. Abstract: We...

Document, scientific report: "Improved Smoothing for N-gram Language Models Based on Ordinary Counts" doc

Scientific report

... Kneser-Ney and those methods. 1 Introduction. Statistical language models are potentially useful for any language technology task that produces natural-language text as a final (or intermediate) output. ... perplexity of any known method for estimating N-gram language models. Kneser-Ney smoothing, however, requires nonstandard N-gram counts for the lower-order models used to smooth the highest-order model. ... best approach when language models based on ordinary counts are desired. References. Chen, Stanley F., and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling....
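
To make the contrast between ordinary and nonstandard counts concrete, here is a small Python sketch. It assumes the nonstandard counts are the continuation counts commonly used in Kneser-Ney lower-order models (the number of distinct words that precede a word); the function names and the toy sentence are hypothetical, and no smoothing is actually performed.

```python
from collections import Counter

def ordinary_unigram_counts(tokens):
    """How many times each word occurs in the token sequence."""
    return Counter(tokens)

def continuation_counts(tokens):
    """For each word, the number of distinct words that immediately precede it."""
    bigram_types = set(zip(tokens, tokens[1:]))
    return Counter(word for _, word in bigram_types)

# Toy corpus chosen so the two kinds of count disagree.
tokens = "san francisco is in california and san francisco is foggy".split()
print(ordinary_unigram_counts(tokens)["francisco"])
print(continuation_counts(tokens)["francisco"])
```

In this toy corpus "francisco" occurs twice but only ever follows "san", so its continuation count is lower than its ordinary count.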

Document, scientific report: "Generating statistical language models from interpretation grammars in dialogue systems" potx

Scientific report

... comparison of in-grammar recognition performance. 3 Language modelling. To generate the different trigram language models we used the SRI language modelling toolkit (Stolcke, 2002) with Good-Turing ... decades of statistical language modeling: Where do we go from here? In Proceedings of IEEE:88(8). Rosenfeld R. 2000. Incorporating Linguistic Structure into Statistical Language Models. In Philosophical Transactions ... statistical language models (DM-SLMs) by using GF to generate all utterances that are specific to certain dialogue moves from our interpretation grammar. In this way we can produce models that...

Document, scientific report: "Web augmentation of language models for continuous speech recognition of SMS text messages" docx

Scientific report

... 2007. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning ... Kneser-Ney smoothed n-gram models. IEEE Transactions on Audio, Speech and Language Processing, 15(5):1617–1624. A. Stolcke. 1998. Entropy-based pruning of backoff language models. In Proc. DARPA ... 8 billion. 3 Speech Recognition Experiments. We have trained language models on the in-domain data together with web data, and these models have been used in speech recognition experiments....

Scientific report: "The use of formal language models in the typology of the morphology of Amerindian languages" potx

Scientific report

... grammars for modeling agglutination in this language, but first we will present the former class of languages and its acceptor automata. 3.1 Linear context free languages and two-taped nondeterministic ... example the Guarani language presents nasal harmony which expands from the root to both suffixes and prefixes (Krivoshein, 1994). This kind of characterization can have some value in language classification ... 2010. © 2010 Association for Computational Linguistics. The use of formal language models in the typology of the morphology of Amerindian languages. Andrés Osvaldo Porta, Universidad de Buenos Aires, hugporta@yahoo.com.ar. Abstract: The...

Scientific report: "Faster and Smaller N-Gram Language Models" pptx

Scientific report

... novel language model caching technique that improves the query speed of our language models (and SRILM) by up to 300%. 1 Introduction. For modern statistical machine translation systems, language models ... with two different language models. Our first language model, WMT2010, was a 5-gram Kneser-Ney language model which stores probability/back-off pairs as values. We trained this language model on ... and Smaller N-Gram Language Models. Adam Pauls, Dan Klein, Computer Science Division, University of California, Berkeley, {adpauls,klein}@cs.berkeley.edu. Abstract: N-gram language models are a major...

Scientific report: "Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers" ppt

Scientific report

... or even trillions of English words, huge language models are built in a distributed manner (Zhang et al., 2006; Brants et al., 2007). Such language models yield better translation results but at ... explore a dependency language model to improve translation quality. To some extent, these syntactically-informed language models are consistent with syntax-based translation models in capturing ... integrate backward n-grams and mutual information (MI) triggers into language models in SMT. In conventional n-gram language models, we look at the preceding n − 1 words when calculating the probability...
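
The difference between the conventional and the backward conditioning context can be sketched in a few lines of Python. This is only an illustration under the assumption that a backward n-gram conditions each word on the n − 1 words that follow it; the function names are hypothetical, and sentence-boundary padding and probability estimation are left out.

```python
def forward_events(tokens, n):
    """(context, word) pairs where the context is the preceding n-1 words."""
    return [(tuple(tokens[i - n + 1:i]), tokens[i])
            for i in range(n - 1, len(tokens))]

def backward_events(tokens, n):
    """(context, word) pairs where the context is the following n-1 words."""
    return [(tuple(tokens[i + 1:i + n]), tokens[i])
            for i in range(len(tokens) - n + 1)]

tokens = ["the", "cat", "sat", "down"]
print(forward_events(tokens, 3))
print(backward_events(tokens, 3))
```

Counting these events over a corpus would give the statistics from which either kind of model could be estimated.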

Scientific report: "Randomized Language Models via Perfect Hash Functions" pptx

Scientific report

... (lossless) language models and our randomized language model. Note that the standard practice of measuring perplexity is not meaningful here since (1) for efficient computation, the language model ... 2007. Compressing trigram language models with Golomb coding. In Proceedings of EMNLP-CoNLL 2007, Prague, Czech Republic, June. P. Clarkson and R. Rosenfeld. 1997. Statistical language modeling using ... pruning of back-off language models. In Proc. DARPA Broadcast News Transcription and Understanding Workshop, pages 270–274. D. Talbot and M. Osborne. 2007a. Randomised language modelling for...

Scientific report: "Generalized Algorithms for Constructing Statistical Language Models" pdf

Scientific report

... Class-based models. In many applications, it is natural and convenient to construct class-based language models, that is models based on classes of words (Brown et al., 1992). Such models are ... experimental results demonstrating its efficiency. Representation of language models by WFAs. Classical n-gram language models admit a natural representation by WFAs in which each state encodes ... related to the construction of language models. We present new and efficient algorithms to address these more general problems. Counting. Classical language models are constructed by deriving...

Scientific report: "Cutting the Long Tail: Hybrid Language Models for Translation Style Adaptation" doc

Scientific report

... different levels of language modeling: from a domain-generic word-level LM to a lexically anchored POS-level LM. 4.2 Handling morphology. Token frequency-based measures may not be suitable for languages ... Hoang. 2007. Factored translation models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), ... Monz. 2011. Statistical Machine Translation with Local Language Models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 869–879, Edinburgh, Scotland,...

Scientific report: "Deciphering Foreign Language by Combining Language Models and Context Vectors" pdf

Scientific report

... Computational Linguistics. Deciphering Foreign Language by Combining Language Models and Context Vectors. Malte Nuhn and Arne Mauser and Hermann Ney, Human Language Technology and Pattern Recognition ... to universally communicate in all languages. In these visions, even previously unknown languages can be learned automatically from analyzing foreign language input. In this work, we attempt ... to learn statistical translation models from only monolingual data in the source and target language. The reasoning behind this idea is that the elements of languages share statistical similarities...

Scientific report: "Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level" doc

Scientific report

... level metrics. For BLEU and TER, character-level metrics yield up to 6-9% improvement over word-level metrics. This means the character-level metrics reduce about ... System-level correlation on NIST'08 EC. 5 Analysis. We have analyzed the reasons why character-level metrics better correlate with human assessment than word-level metrics. Compared to word-level ... To summarize, character-level metrics can capture more synonym matches and the resulting segmentation into characters is guaranteed to be consistent, which makes character-level metrics more...
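
The segmentation argument in the snippet above can be illustrated with a small Python sketch. It computes a simplified clipped n-gram precision against a single reference, which is only one ingredient of BLEU and not the paper's evaluation setup; the function names and the example sentences are hypothetical.

```python
from collections import Counter

def ngrams(units, n):
    """All contiguous n-grams over a list of units (words or characters)."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

def ngram_precision(hyp_units, ref_units, n):
    """Clipped n-gram precision of a hypothesis against one reference."""
    hyp = Counter(ngrams(hyp_units, n))
    ref = Counter(ngrams(ref_units, n))
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total

def to_chars(words):
    """Flatten a word sequence into its individual characters."""
    return [ch for word in words for ch in word]

# Same sentence, segmented differently by two hypothetical word segmenters.
ref_words = ["我", "喜欢", "机器翻译"]
hyp_words = ["我", "喜欢", "机器", "翻译"]

print(ngram_precision(hyp_words, ref_words, 1))
print(ngram_precision(to_chars(hyp_words), to_chars(ref_words), 1))
```

At the word level the inconsistent segmentation of the last term costs matches, while at the character level the two strings are identical.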