joint Chinese word segmentation

Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" pdf

Scientific report

... obtain accuracy improvements on both segmentation and Joint S&T. 2 Segmentation and POS Tagging. Given a Chinese character sequence $C_{1:n} = C_1 C_2 \ldots C_n$, the segmentation result can be depicted ... segmentation only and joint segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on segmentation and 12% on joint segmentation and ... performing POS tagging following segmentation; or joint segmentation and POS tagging (Joint S&T). Since the typical approach of discriminative models treats segmentation as a labelling problem...

Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" potx

Scientific report

... solution for joint Chinese word segmentation and POS tagging. Our work is motivated by several characteristics of this problem. First of all, a majority of words are easy to identify in the segmentation problem. ... intermediate sub-word structure for joint segmentation and tagging. Since the sub-words are large enough in practice, the decoding for POS tagging over sub-words is efficient. Finally, the Chinese language ... data for sub-word tagging. 3 Method. 3.1 Architecture. In our stacked sub-word model, joint word segmentation and POS tagging is decomposed into two steps: (1) coarse-grained word segmentation...

Document - Scientific report: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification" pdf

Scientific report

... International Chinese Word Segmentation Bake-off. Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan, July 2003. Xue, N. 2003. Chinese Word Segmentation ... co-occurrence. Word based model. In this model, statistical data about word boundary frequencies for each character is retrieved word-wise. For example, in the case of a monosyllabic word only two word ... introduce is that Chinese word segmentation is the classification of a string of character-boundaries (CB’s) into either word-boundaries (WB’s) and non-word-boundaries. In Chinese, CB’s are delimited...

Document - Scientific report: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data" pdf

Scientific report

... as that in English. Chinese word segmentation is therefore the first step for any Chinese information processing system [1]. Almost all methods for Chinese word segmentation developed so far, ... Automatic Word Segmentation System for Written Chinese Texts", Journal of Chinese Information Processing, Vol. 1, No. 2, 1987 (in Chinese) [2] Fan C.K., Tsai W.H., "Automatic Word Identification ... of Hong Kong, Hong Kong. Abstract. Chinese word segmentation is the first step in any Chinese NLP system. This paper presents a new algorithm for segmenting Chinese texts without making use...

Scientific report: "Exploring Deterministic Constraints: From a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation" ppt

Scientific report

... decoding. 3 Chinese Word Segmentation (CWS). 3.1 Word segmentation as character tagging. Considering the ambiguity problem that a Chinese character may appear in any relative position in a word and ... Character- and word-based features. As studied in previous work, word-based feature templates usually include the word itself, sub-words contained in the word, contextual characters/words and so ... are incorporated into word-based CWS models, some word-based features are no longer of interest, such as the starting character of a word, sub-words contained in the word, contextual characters...

Scientific report: "Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation" doc

Scientific report

... error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint ... Generation of Words with Internal Structures. Words with rich internal structures can be described using a context-free grammar formalism as: word → root (3); word → word suffix (4); word → prefix word (5). Here ... trained with the Penn Chinese Treebank and actually is able to parse both word and phrase structures in a unified way. 1 Why Parse Word Structures? Research in Chinese word segmentation has progressed...

Scientific report: "Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese" potx

Scientific report

... 1996. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics, 22. Weiwei Sun. 2011. A stacked sub-word model for joint Chinese word segmentation and part-of-speech ... error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint ... sequence of POS tags. The joint approach to word segmentation and POS tagging has been reported to improve word segmentation and POS tagging accuracies by more than 1% in Chinese (Zhang and Clark,...

Document - Scientific report: "Unsupervized Word Segmentation: the case for Mandarin Chinese" doc

Scientific report

... $\mathrm{len}(w_i)$, where $W$ is the segmentation corresponding to the sequence of words $w_0 w_1 \ldots w_m$, and $\mathrm{len}(w_i)$ is the length of a word $w_i$, used here to be able to compare segmentations resulting ... redefine the sentence segmentation problem as the maximization of the autonomy measure of its words. For a character sequence $s$, if we call $\mathrm{Seg}(s)$ the set of all the possible segmentations, then ... against the corpora from the Second International Chinese Word Segmentation Bakeoff (Emerson, 2005). These corpora cover 4 different segmentation guidelines from various origins: Academia Sinica...

Document - Scientific report: "Joint Word Segmentation and POS Tagging using a Single Perceptron" docx

Scientific report

... specific to Chinese, are shown in Table 2. The word segmentation features are extracted from word bigrams, capturing word, word length and character information in the context. The word length ... information to improve word segmentation. For example, the POS-word pattern “number word” + “ (a common measure word)” can help in segmenting the character sequence “ ” into the word sequence ... is used by the joint model. However, because word segmentation and POS tagging are performed simultaneously, POS information participates in word segmentation. 3.1 Formulation of the joint model. We...

Document: Word Segmentation for Vietnamese Text Categorization: An online corpus approach pptx

College - University

... Vietnamese word segmentation is very problematic, especially without a manual segmentation test corpus. Therefore, we perform two experiments, one is done by human judgment for word segmentation ... ways of segmentation, i.e. the important words are segmented correctly while less important words may be segmented incorrectly. Table 6 represents the human judgment for our word segmentation ... inhomogeneous phenomenon in judgment word segmentation. However, the acceptable segmentation percentage is satisfactory. Nearly eighty percent of word segmentation outcome does not make the...

Document - Scientific report: "An Equivalent Pseudoword Solution to Chinese Word Sense Disambiguation" ppt

Scientific report

... monosemous word is usually synonymous to some polysemous words. For example the words "信守, 严守, 恪守, 遵照, 遵从, 遵循, 遵守" has similar meaning as one of the senses of the ambiguous word ... in Chinese, which can be used as a knowledge source for WSD. 3.1 Definition of Equivalent Pseudoword. If the ambiguous words in the corpus are replaced with its synonymous monosemous word, ... ambiguous word need to simulate the function of the real ambiguous word, and to acquire semantic knowledge as the real ambiguous word does. Thus, we call it an equivalent pseudoword (EP)...

Document - Scientific report: "Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation" pptx

Scientific report

... of the Chinese side of the training data, including the total vocabulary (Voc), number of character vocabulary (Char.voc) in Voc, and the running words (Run.words) when different word segmentations ... iterations). 4 Word Lattice Decoding. 4.1 Word Lattices. In the decoding stage, the various segmentation alternatives can be encoded into a compact representation of word lattices. A word lattice ... Given a Chinese sentence $c_1^J$ consisting of $J$ characters $\{c_1, \ldots, c_J\}$ and an English sentence $e_1^I$ consisting of $I$ words $\{e_1, \ldots, e_I\}$, $A_{C \to E}$ will denote a Chinese-to-English word...