discriminative n-gram language modeling

Scientific report: "A Succinct N-gram Language Model" (ppt)

Uploaded: 20/02/2014, 09:20
... N-gram compression tasks achieved a significant compression rate without any loss. 1 Introduction There has been an increase in available N-gram data and a large amount of web-scaled N-gram data ... Compressing trigram language models with Golomb coding. In Proc. of EMNLP-CoNLL 2007. O. Delpratt, N. Rahman, and R. Raman. 2006. Engineering the LOUDS succinct tree representation. In Proc. ... N-gram counts. By using 8-bit floating point quantization¹, N-gram language models are compressed into 10 GB, which is comparable to a lossy representation (Talbot and Brants, 2008). 2 N-gram...
Scientific report: "Improved Smoothing for N-gram Language Models Based on Ordinary Counts" (doc)

Uploaded: 20/02/2014, 09:20
... this makes Kneser-Ney smoothing inappropriate or inconvenient. In this paper, we introduce a new smoothing method based on ordinary counts that outperforms all of the previous ordinary-count methods ... new method eliminating most of the gap between Kneser-Ney and those methods. 1 Introduction Statistical language models are potentially useful for any language technology task that produces natural-language ... currently be the best approach when language models based on ordinary counts are desired. References Chen, Stanley F., and Joshua Goodman. 1998. An empirical study of smoothing techniques for language...
Scientific report: "Faster and Smaller N-Gram Language Models" (pptx)

Uploaded: 07/03/2014, 22:20
... Each node in the tree encodes a word, and paths in the tree correspond to n-grams in the collection. Tries ensure that each n-gram prefix is represented only once, and are very efficient when n-grams ... number of keys and values needed for n-gram language modeling, generic implementations do not work efficiently "out of the box." In this section, we will review existing techniques for encoding ... scalable decoder for parsing-based machine translation with equivalent language model state maintenance. In Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation. Zhifei Li,...
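The trie layout described in the excerpt above (one node per word, root-to-node paths spelling out n-grams, each prefix stored once) can be illustrated with a short Python sketch. It is a minimal illustration under our own assumptions, not the paper's data structure: the names `TrieNode` and `NgramTrie` are hypothetical, and nested dictionaries are exactly the kind of generic implementation the excerpt says does not scale well for language models.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Children keyed by the next word of the n-gram (hypothetical layout).
    children: dict[str, TrieNode] = field(default_factory=dict)
    # Count stored at the node where an n-gram ends; None if no n-gram ends here.
    count: int | None = None


class NgramTrie:
    """Each node encodes one word; a root-to-node path spells an n-gram,
    so an n-gram prefix shared by many n-grams is stored only once."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.num_nodes = 0  # nodes below the root

    def insert(self, ngram: tuple[str, ...], count: int) -> None:
        node = self.root
        for word in ngram:
            if word not in node.children:
                node.children[word] = TrieNode()
                self.num_nodes += 1
            node = node.children[word]
        node.count = count

    def lookup(self, ngram: tuple[str, ...]) -> int | None:
        node = self.root
        for word in ngram:
            child = node.children.get(word)
            if child is None:
                return None  # this path (n-gram) was never inserted
            node = child
        return node.count


trie = NgramTrie()
trie.insert(("the",), 10)
trie.insert(("the", "cat"), 3)
trie.insert(("the", "cat", "sat"), 2)
trie.insert(("the", "dog"), 1)
print(trie.lookup(("the", "cat")))  # 3
print(trie.num_nodes)  # 4 nodes: the, cat, sat, dog
```

Inserting `("the", "cat")` after `("the",)` adds only one node, because the shared prefix `the` is reused rather than stored again.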
Scientific report: "Discriminative Lexicon Adaptation for Improved Character Accuracy – A New Direction in Chinese Language Modeling" (pptx)

Uploaded: 20/02/2014, 07:20
... statistical language modeling for Chinese. ACM Transactions on Asian Language Information Processing, 1(1):3–33. Jianfeng Gao, Mu Li, Andi Wu, and Chang-Ning Huang. 2004. Chinese word segmentation: A ... characters in the lexicon and using the training data to alter the current lexicon in each iteration. This is also an interesting direction. References Maximilian Bisani and Hermann Ney. 2005. Open vocabulary ... 16th International Conference on Computational Linguistics, pages 200–203. Kae-Cherng Yang, Tai-Hsuan Ho, Lee-Feng Chien, and Lin-Shan Lee. 1998. Statistics-based segment pattern lexicon: A new...
Scientific report: "Discriminative Syntactic Language Modeling for Speech Recognition" (pdf)

Uploaded: 20/02/2014, 15:20
... linguistically motivated language model in conversational speech recognition. In Proc. ICASSP. Wen Wang. 2003. Statistical parsing and language modeling based on constraint dependency grammar. ... fields. In Proceedings of the Human Language Technology Conference and Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), Edmonton, Canada. Andreas ... 2001. Whole-sentence exponential language models: a vehicle for linguistic-statistical integration. In Computer Speech and Language. Fei Sha and Fernando Pereira. 2003. Shallow parsing with conditional random...
Scientific report: "Large-Scale Syntactic Language Modeling with Treelets" (docx)

Uploaded: 19/02/2014, 19:20
... positive data alone. We also show fluency improvements in a preliminary machine translation reranking experiment. 2 Treelet Language Modeling The common denominator of most n-gram language models ... despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment. 1 Introduction N-gram language models are a central component of all ... Google Inc. 2007. Large language models in machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Eugene Charniak and Mark Johnson. 2005....
Scientific report: "Creating Robust Supervised Classifiers via Web-Scale N-gram Data" (pdf)

Uploaded: 20/02/2014, 04:20
... Workshop on Natural Language Generation. Natalia N. Modjeska, Katja Markert, and Malvina Nissim. 2003. Using the Web in machine learning for other-anaphora resolution. In EMNLP. Preslav Nakov and ... Search engine statistics beyond the n-gram: Application to noun compound bracketing. In CoNLL. Preslav Ivanov Nakov. 2007. Using the Web as an Implicit Training Set: Application to Noun Compound Syntax ... ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful,...
Scientific report: "An ERP-based Brain-Computer Interface for text entry using Rapid Serial Visual Presentation and Language Modeling" (ppt)

Uploaded: 20/02/2014, 05:20
... Transactions on Rehabilitation Engineering, 8(2):216–219. B. Roark, J. de Villiers, C. Gibbons, and M. Fried-Oken. 2010. Scanning methods and language modeling for binary switch typing. In Proceedings ... brain-computer interface. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 13(1):89–98. M.S. Treder and B. Blankertz. 2010. (C)overt attention and visual speller design in an ERP-based ... methods integrating language modeling into grid scanning. 2 RSVP based BCI and ERP Classification RSVP is an experimental psychophysics technique in which visual stimulus sequences are displayed on a...
Scientific report: "A Unified Framework for Automatic Evaluation using N-gram Co-Occurrence Statistics" (pptx)

Uploaded: 20/02/2014, 16:20
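...
$$P(n) = \frac{\sum_{C \in \{\text{Candidates}\}} \sum_{\text{ngram} \in S(C,n)} \text{Count}_{\text{clip}}(\text{ngram})}{\sum_{C \in \{\text{Candidates}\}} \sum_{\text{ngram} \in S(C,n)} \text{Count}(\text{ngram})}$$
where Count(ngram) is the number of n-gram counts, and Count_clip(ngram) is the maximum number of co-occurrences of ngram ...
$$R(n) = \frac{\sum_{R \in \{\text{References}\}} \sum_{\text{ngram} \in S(R,n)} \text{Count}_{\text{clip}}(\text{ngram})}{\sum_{R \in \{\text{References}\}} \sum_{\text{ngram} \in S(R,n)} \text{Count}(\text{ngram})}$$
where, as before, Count(ngram) is the number of n-gram counts, and Count_clip(ngram) is the maximum number of co-occurrences of ngram in the reference answer and its corresponding ... using ST and eliminating the unigrams found in SW. We therefore define a recall score as:
$$R(n) = \frac{\sum_{R \in \{\text{References}\}} \sum_{\text{ngram} \in S(R,n)} \text{Count}_{\text{clip}}(\text{ngram})}{\sum_{R \in \{\text{References}\}} \sum_{\text{ngram} \in S(R,n)} \text{Count}(\text{ngram})}$$
...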
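The clipped precision P(n) and recall R(n) in the excerpt above can be sketched in Python by accumulating clipped and total n-gram counts over all candidate/reference pairs. This sketch assumes BLEU-style clipping (an n-gram is credited at most as often as it occurs in both the candidate and its reference) and exactly one reference per candidate; the excerpt is truncated, so the paper's exact clipping rule and its stemming and stop-word handling (the ST and SW sets) are not reproduced. The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list (empty if the list is too short)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_precision_recall(
    pairs: list[tuple[list[str], list[str]]], n: int
) -> tuple[float, float]:
    """Clipped n-gram precision and recall summed over (candidate, reference) pairs."""
    clipped_total = cand_total = ref_total = 0
    for candidate, reference in pairs:
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Assumed clipping: credit each n-gram at most min(count in candidate, count in reference).
        clipped_total += sum(min(c, ref[g]) for g, c in cand.items())
        cand_total += sum(cand.values())  # denominator of P(n)
        ref_total += sum(ref.values())    # denominator of R(n)
    precision = clipped_total / cand_total if cand_total else 0.0
    recall = clipped_total / ref_total if ref_total else 0.0
    return precision, recall


pairs = [("the the the cat".split(), "the cat sat".split())]
print(corpus_precision_recall(pairs, 1))  # (0.5, 0.6666666666666666)
print(corpus_precision_recall(pairs, 2))  # (0.3333333333333333, 0.5)
```

Clipping is what keeps the repeated "the" from inflating precision: the candidate uses it three times, but only one occurrence is matched by the reference.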
Scientific report: "A Scalable Probabilistic Classifier for Language Modeling" (pdf)

Uploaded: 07/03/2014, 22:20
... Proceedings of the 5th International Conference on Spoken Language Processing, pages 1694–1698, Sydney, Australia. R. Kneser and H. Ney. 1995. Improved Backing-off for M-Gram Language Modeling. In ... Categorization Research. Journal of Machine Learning Research, 5:361–397. A. Mnih and G. Hinton. 2008. A Scalable Hierarchical Distributed Language Model. In Advances in Neural Information Processing Systems ... 21. H. Ney, U. Essen, and R. Kneser. 1994. On Structuring Probabilistic Dependences in Stochastic Language Modeling. Computer, Speech and Language, 8:1–38. B. Roark, M. Saraclar, and M. Collins....
Scientific report: "An Efficient Indexer for Large N-Gram Corpora" (docx)

Uploaded: 07/03/2014, 22:20
... Melbourne, Australia. R. Kneser and H. Ney. 1995. Improved backing-off for n-gram language modeling. In Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on, ... definition, each internal node except the root can have any number of keys in the range [v, 2v], and the root must have at least one key. Finally, an internal node with k keys has k + 1 children. 4.2 ... Foundation CAREER award #0747340 and IIS awards #0917170 and #1018613. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not...
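The node constraints quoted in the excerpt above (a non-root internal node holds between v and 2v keys, the root holds at least one key, and an internal node with k keys has k + 1 children) can be checked with a small recursive validator. This is a hypothetical sketch: the `Node` class and `is_valid` function are ours, and key ordering and leaf occupancy are deliberately not checked because the truncated excerpt does not state those rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list[Node] = field(default_factory=list)  # empty list means a leaf


def is_valid(node: Node, v: int, is_root: bool = True) -> bool:
    """Check only the structural rules quoted in the excerpt, recursively."""
    k = len(node.keys)
    if is_root and k < 1:
        return False  # the root must have at least one key
    if node.children:  # internal node
        if not is_root and not (v <= k <= 2 * v):
            return False  # non-root internal nodes hold between v and 2v keys
        if len(node.children) != k + 1:
            return False  # k keys require exactly k + 1 children
        return all(is_valid(child, v, is_root=False) for child in node.children)
    return True  # leaf occupancy is not constrained by the quoted excerpt


ok = Node([10], [Node([5]), Node([20, 30])])
bad = Node([10], [Node([5])])  # one key but only one child
print(is_valid(ok, v=1), is_valid(bad, v=1))  # True False
```

The k + 1 rule is what lets a search descend unambiguously: the k keys split the key space into k + 1 intervals, one per child.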
Scientific report: "Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining" (ppt)

Uploaded: 16/03/2014, 19:20
... integrated syntactic language modeling. Ph.D. thesis, Brown University. L. Huang and K. Sagae. 2010. Dynamic Programming for Linear-Time Incremental Parsing. In Proceedings of ACL. Zhongqiang ... Discriminative syntactic language modeling for speech recognition. In ACL. Denis Filimonov and Mary Harper. 2009. A joint language model with fine-grain syntactic tags. In EMNLP. Yoav Goldberg and Michael ... tuning | dev04f BN data | 2.5 hrs; Supervised training: dep. parser, POS tagger | Ontonotes BN treebank + WSJ Penn treebank | 1.3m words, 59k sent.; Supervised training: constituent parser | Ontonotes BN...
Scientific report: "Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation" (pdf)

Uploaded: 17/03/2014, 00:20
... alignment in statistical translation. In Proceedings of COLING, pages 836–841. Ying Zhang and Stephan Vogel. 2004. Measuring confidence intervals for the machine translation evaluation metrics. ... translation performance significantly on a large-scale Arabic-to-English MT task. 1 Introduction Significant progress has been made in statistical machine translation (SMT) in recent years. Among ... metrics. In Proceedings of The 10th International Conference on Theoretical and Methodological Issues in Machine Translation. Bing Zhao and Shengyuan Chen. 2009. A simplex armijo downhill algorithm...
Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling" (doc)

Uploaded: 17/03/2014, 01:20
... token representing a sentence boundary in language model ... Mark Johnson and Sharon Goldwater. 2009. Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation ... arbitrary language, without any "word" indications. 1 Introduction "Word" is no trivial concept in many languages. Asian languages such as Chinese and Japanese have no explicit word boundaries, ... leverages dynamic programming for inference. In Section 5 we describe experiments on the standard datasets in Chinese and Japanese in addition to English phonetic transcripts, and semi-supervised...
Scientific report: "Grounded Language Modeling for Automatic Speech Recognition of Sports Video" (doc)

Uploaded: 17/03/2014, 02:20
... grounded language modeling, an extension of traditional language modeling in which the probability of a word is conditioned not only on the previous word(s) but also on the non-linguistic context ... International Conference on Knowledge Discovery and Data Mining. Seattle, Washington. Stolcke, A., (2002). SRILM - An Extensible Language Modeling Toolkit, in Proc. Intl. Conf. Spoken Language ... baseline comparisons) are generated with the SRI language modeling toolkit (Stolcke, 2002) using Chen and Goodman's modified Kneser-Ney discounting and interpolation (Chen and Goodman,...
