Chinese word segmentation and POS tagging

Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging" docx

Scientific report

... for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of ACL. Wenbin Jiang, Haitao Mi, and Qun Liu. 2008b. Word lattice reranking for Chinese word segmentation and part-of-speech ... ACL and AFNLP An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging Canasai Kruengkrai†‡ and Kiyotaka Uchimoto‡ and Jun’ichi Kazama‡ Yiou Wang‡ and ... discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words....
Scientific report document: "Joint Word Segmentation and POS Tagging using a Single Perceptron" docx

Scientific report

... UK {yue.zhang,stephen.clark}@comlab.ox.ac.uk Abstract: For Chinese POS tagging, word segmentation is a preliminary step. To avoid error propagation and improve segmentation by utilizing POS information, segmentation and tagging can be ... proposed a hybrid model for word segmentation and POS tagging using an HMM-based approach. Word information is used to process known-words, and character information is used for unknown words ... word segmentation and POS tagging are still performed separately, and exact inference for both is possible. However, the interaction between POS and segmentation is restricted by reranking: POS...
Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" pdf

Scientific report

... that segmentation and POS tagging task is to divide a character sequence into several subsequences and label each of them a POS tag. It is a better idea to perform segmentation and POS tagging ... When we derive a candidate result from a word-POS pair p and a candidate q at prior position of p, we calculate the scores of the word LM, the POS LM, the labelling probability and the generating ... each word-POS pair p (of length l) to the tail of each candidate result at the prior position of p (position i − l), and select for position i an N-best list of candidate results from all these candidates....
Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" potx

Scientific report

... model, joint word segmentation and POS tagging is decomposed into two steps: (1) coarse-grained word segmentation and tagging, and (2) fine-grained sub-word tagging. The workflow is shown in ... intermediate sub-word structure for joint segmentation and tagging. Since the sub-words are large enough in practice, the decoding for POS tagging over sub-words is efficient. Finally, the Chinese language ... in previous work (Zhang and Clark, 2010; Jiang et al., 2008b). In this paper, we present an effective and efficient solution for joint Chinese word segmentation and POS tagging. Our work is motivated...
Scientific report document: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data" pdf

Scientific report

... Abstract: Chinese word segmentation is the first step in any Chinese NLP system. This paper presents a new algorithm for segmenting Chinese texts without making use of any lexicon and hand-crafted ... Chinese word segmentation is therefore the first step for any Chinese information processing system [1]. Almost all methods for Chinese word segmentation developed so far, both statistical and...
Scientific report: "Exploring Deterministic Constraints: From a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation" ppt

Scientific report

... decoding. 3 Chinese Word Segmentation (CWS) 3.1 Word segmentation as character tagging. Considering the ambiguity problem that a Chinese character may appear in any relative position in a word and the ... beginning of a word and I all other positions; and 2) BMES: where B, M and E represent the beginning, middle and end of a multi-character word respectively, and S tags a single-character word. For ... Character- and word-based features of a possible word w_i over the input character sequence c. Suppose that w_i = c_i0 c_i1 c_i2, and its preceding and following characters are c_l and c_r respectively. parameter...
Scientific report document: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification" pdf

Scientific report

... Processing, pp. 147-173. Gao, J. and A. Wu and Mu Li and C N. Huang and H. Li and X. Xia and H. Qin. 2004. Adaptive Chinese Word Segmentation. In Proceedings of ACL-2004. Meng, H. and C. W. Ip. 1999. An ... N. 2003. Chinese Word Segmentation as Character Tagging. Computational Linguistics and Chinese Language Processing. 8(1): 29-48. Redington, M. and N. Chater and C. Huang and L. Chang and K. Chen. ... that Chinese word segmentation is the classification of a string of character-boundaries (CB’s) into either word-boundaries (WB’s) and non-word-boundaries. In Chinese, CB’s are delimited and...
Scientific report: "Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation" doc

Scientific report

... to integrate Chinese word segmentation, part-of-speech tagging and parsing (Wu and Zixin, 1998; Zhou and Su, 2003; Luo, 2003; Fung et al., 2004). However, in these research all words were considered ... Computational Linguistics. Wenbin Jiang, Liang Huang, and Qun Liu. 2009. Automatic adaptation of annotation standards: Chinese word segmentation and POS tagging – a case study. In Proceedings of the ... Jun’ichi Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. In Proceedings of the Joint Conference...
Scientific report: "Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese" potx

Scientific report

... sequence of POS tags. The joint approach to word segmentation and POS tagging has been reported to improve word segmentation and POS tagging accuracies by more than 1% in Chinese (Zhang and Clark, ... q−1 and q−2 respectively denote the last-shifted word and the word shifted before q−1. q.w and q.t respectively denote the (root) word form and POS tag of a subtree (word) q, and q.b and q.e ... boundary of the top word on the stack if the last action was A or SH(t). interaction between segmentation and POS tagging. 3 Model 3.1 Incremental Joint Segmentation, POS Tagging, and Dependency...
Scientific report document: "Unsupervized Word Segmentation: the case for Mandarin Chinese" doc

Scientific report

... systems based on Harris's hypothesis (see (Magistry and Sagot, 2011) and Jin (2007) for a longer discussion). Many errors are related to dates and Chinese numbers. This could and should be dealt ... len(w_i), where W is the segmentation corresponding to the sequence of words w_0 w_1 ... w_m, and len(w_i) is the length of a word w_i used here to be able to compare segmentations resulting ... 0.59–0.79. In a segmented Chinese text, most of the tokens are uni- and bigrams but most of the types are bi- and trigrams (as unigrams are often high frequency grammatical words and trigrams the result...
Scientific report: "SVD and Clustering for Unsupervised POS Tagging" docx

Scientific report

... counts all the cluster-j (j = 1 … k_1) words to the right of word i, and L_ij counts all the cluster-j words to the left of word i. The new matrices L and R have dimension N_types × k_1. ... three evaluation criteria of Gao and Johnson (2008): M-to-1, 1-to-1, and VI. M-to-1 and 1-to-1 are the tagging accuracies under the best many-to-one map and the greedy one-to-one map respectively; ... Tagging accuracy under the best M-to-1 map, the greedy 1-to-1 map, and VI, for the full PTB45 tagset and the reduced PTB17 tagset. HMM-EM, HMM-VB and HMM-GS show the best results from Gao and...
