... Computational Linguistics Unsupervised Discovery of Domain-Specific Knowledge from Text Dirk Hovy, Chunliang Zhang, Eduard Hovy Information Sciences Institute University of Southern California 4676 Admiralty ... Research with a Series of Reading Tasks. In Proceedings of LREC 2010. Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th international ... 2006. Unsupervised Learning of Verb Argument Structures. Computational Linguistics and Intelligent Text Processing, pages 59–70. Anselmo Peñas and Eduard Hovy. 2010. Semantic enrichment of text...
... characterization of HpSDH demonstrates its activity with kcat of 7.7 s⁻¹ and Km of 0.148 mM toward shikimate, kcat of 7.1 s⁻¹ and Km of 0.182 mM toward NADP, kcat of 5.2 s⁻¹ and Km of 2.9 mM ... a kcat of 7.7 ± 0.9 s⁻¹, Km of 0.148 ± 0.028 mM and kcat/Km of 5.2 × 10⁴ M⁻¹·s⁻¹ toward shikimate, and a kcat of 7.1 ± 0.7 s⁻¹, Km of 0.182 ± 0.027 mM and kcat/Km of 3.9 × ... Shen1,2 1 Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China 2 School of Pharmacy, East...
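The reported kcat/Km for shikimate can be cross-checked against the reported kcat and Km. The following is a minimal sketch, assuming Km is given in mM and must be converted to M; the function name `catalytic_efficiency` is hypothetical and not taken from the paper.

```python
def catalytic_efficiency(kcat_per_s: float, km_mM: float) -> float:
    """Return kcat/Km in M^-1 s^-1, given kcat in s^-1 and Km in mM."""
    km_M = km_mM * 1e-3  # convert mM to M (assumed unit of the reported Km)
    return kcat_per_s / km_M


# Values quoted in the snippet above (central estimates only, no error bars).
shikimate = catalytic_efficiency(7.7, 0.148)
nadp = catalytic_efficiency(7.1, 0.182)

print(f"{shikimate:.1e}")  # 5.2e+04
print(f"{nadp:.1e}")       # 3.9e+04
```

Both results agree with the leading digits of the kcat/Km values quoted in the snippet (5.2 and 3.9).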
... is an edited version of the public-domain portion of the corpus used by Sonderegger (2011), and consists of just under 12000 stanzas spanning a range of poets and dates from the 15th to 20th centuries. ... 2011. ©2011 Association for Computational Linguistics Unsupervised Discovery of Rhyme Schemes Sravana Reddy Department of Computer Science The University of Chicago Chicago, IL 60637 sravana@cs.uchicago.edu Kevin ... extremely useful for large-scale statistical analyses of poetic texts. • Historical Linguistics/Study of Dialects: Rhymes of a word in poetry of a given time period or dialect region provide clues...
... problem of automatic word sense induction. Proceedings of ACL (Companion Volume), Barcelona, 195-198. Schütze, Hinrich (1993). Part-of-speech induction from scratch. Proceedings of ACL, Columbus, ... clustering of global vectors is more adequate (see footnote 1). This finding is of interest when trying to understand the nature of syntax versus semantics if expressed in statistical terms. Acknowledgements ... assignment of the ambiguous words to clusters is not required at this stage, as this is taken care of in the next step. This step involves computing the differential vector of each word from the...
... is the left part of word, RP is the right part of it, Len(p) is the length of part P (number of characters), freq(p) is the frequency of part P in corpus, WN is the number of words (corpus ... definition of the word as a unit has been agreed upon. If effective methods can be devised for the unsupervised discovery of morphemes, they could aid the formulation of a linguistic theory of morphology ... results of a research on unsupervised Persian morpheme discovery. In this paper we present a method for discovering the morphemes of Persian language through automatic analysis of corpora....
... EMNLP ’06. Hearst, M., 1992. Automatic acquisition of hyponyms from large text corpora. COLING ’92. Lin, D., Pantel, P., 2002. Concept discovery from text. COLING 02. Moldovan, D., Badulescu, A., Tatu, ... unsupervised discovery of word categories using symmetric patterns and high frequency words. COLING-ACL ’06. Davidov, D., Rappoport, A. and Koppel, M., 2007. Fully unsupervised discovery of concept-specific ... corpus, the set of the contexts in which the word appears. Each context is a window containing W words or punctuation characters before and after the hook word. We avoid extracting text from clearly unformatted...
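The windowed context described in the snippet above can be sketched as follows. This is only an illustration under stated assumptions: the input is already tokenized with punctuation marks as separate tokens, and the function name `extract_contexts` and its parameters are hypothetical, not taken from the paper.

```python
from typing import List, Tuple


def extract_contexts(tokens: List[str], hook: str, w: int) -> List[Tuple[List[str], List[str]]]:
    """Collect, for every occurrence of `hook`, up to `w` tokens before and after it.

    Tokens may be words or punctuation characters; windows are truncated
    at the start and end of the token list.
    """
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == hook:
            left = tokens[max(0, i - w):i]    # up to w tokens before the hook
            right = tokens[i + 1:i + 1 + w]   # up to w tokens after the hook
            contexts.append((left, right))
    return contexts


tokens = "the cat , a pet , sat".split()
print(extract_contexts(tokens, "pet", 2))
# [([',', 'a'], [',', 'sat'])]
```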
... Mining From a technical point of view, Web usage mining is the application of data mining techniques to usage logs (secondary Web data) of large Web data repositories. The purpose of it is ... content mining focuses on the discovery/retrieval of the useful information from the Web contents/data/documents, while the Web structure mining emphasizes the discovery of how to model the underlying ... patterns and to extract the interesting rules or patterns from the output of the pattern discovery process. The output of Web mining algorithms is often not in a form suitable for direct human consumption,...
... we suggest to look at local contexts instead of global co-occurrence vectors. As can be seen from human performance, in almost all cases the local context of a syntactically ambiguous word ... part of speech. The core assumption underlying our approach, which in the context of cognition and child language has been proposed by Mintz (2003), is that words of a particular part of speech ... the part-of-speech distribution of syntactically ambiguous words without explicitly tagging the underlying text corpus. This is achieved by assuming that the word pair consisting of the left...
... about the text). Accuracy in identifying SF occurrences • Simplicity of design and speed Efficient use of the available text was not a high priority, since it was felt that plenty of text was ... nique based on the Case Filter of Rouvret and Vergnaud (1980). The completeness of the output list increases monotonically with the total number of occurrences of each verb in the corpus. False ... verbs occur in a sentence. Finding some of the verbs in a text reliably is hard enough; finding all of them reliably is well beyond the scope of this work. Finally, any system applied...
... context". The four syntactic links of LEXTER can be used to define this terminological context. For instance, the "expansion terminological context" (E-terminological context) ... the object LINE. This definition of the context is original compared to the classical context definitions used in Information Retrieval, where the context of a lexical unit is obtained by ... description of the concept "line" 5 Discussion • Evaluation of the quality of the clustering procedure • in the majority of the works using clustering methods, the evaluation of the...
... to stabilize the dimeric form of EUGO. The partially apo form of EUGO became fully flavinylated in vitro by the addition of FAD. The cofactor incorporation resulted in formation of holo dimeric EUGO, ... structure (black) and the modeled apo-EUGO structure (gray). His422 of VAO, linking the FAD cofactor, aligns with His390 of EUGO. Discovery of a eugenol oxidase J. Jin et al. 2314 FEBS Journal 274 (2007) ... were purchased from Invitrogen (Carlsbad, CA, USA). Expression and purification of recombinant EUGO DNA from Rhodococcus sp. strain RHA1 was a kind gift from R v.d. Geize (University of Groningen,...
... up of multiple words, rather than just using the head nouns of the noun phrases. 124 Automatic construction of a hypernym-labeled noun hierarchy from text Sharon A. Caraballo Dept. of Computer ... cluster of cities that because of sparse data was assigned a poor hypernym. Some of the suggestions in the following section might correct this problem. Of the 50 noise words, a few of them ... two groups of nouns, we define similarity as the average of the cosines between each pair of nouns made up of one noun from each of the two groups. $\mathrm{sim}(A,B) = \frac{\sum_{v,w} \cos(v,w)}{\mathrm{size}(A)\,\mathrm{size}(B)}$...
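The group similarity defined in the snippet above (the average of the cosines over all pairs with one noun from each group) can be sketched as follows. This is a minimal illustration, assuming each noun is represented as a plain list of numeric feature values; the function names and the zero-vector convention are assumptions, not details from the paper.

```python
import math
from typing import List, Sequence


def cosine(v: Sequence[float], w: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm else 0.0  # zero-vector handling is an assumption


def group_similarity(group_a: List[Sequence[float]], group_b: List[Sequence[float]]) -> float:
    """Average pairwise cosine between vectors of group_a and vectors of group_b."""
    total = sum(cosine(v, w) for v in group_a for w in group_b)
    return total / (len(group_a) * len(group_b))


a = [[1.0, 0.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
print(group_similarity(a, b))  # 0.5
```

In the example, the single vector in `a` has cosine 1 with the first vector of `b` and cosine 0 with the second, so the average over the two pairs is 0.5.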
... Unfortunately one of the current trends in IE is the progressive reduction of the size of training corpora: e.g., from the 1,000 texts of the MUC-5 (MUC-5, 1993) to the 100 texts in MUC-6 ... Abstract Lexicon definition is one of the main bottlenecks in the development of new applications in the field of Information Extraction from text. Generic resources (e.g., lexical ... also cover the generic knowledge of the world. The integration consists in marking parts of WordNet's hierarchy, i.e. some synsets, with semantic labels taken from the DDC. 4 The development...
... similarity of their contexts. Contexts are collected as patterns of two kinds: dependency patterns from dependency analysis of sentences in Wikipedia, and surface patterns generated from highly ... combinations of patterns: dependency patterns from dependency analysis of texts in Wikipedia, and surface patterns generated from highly redundant information related to the Web. Evaluations of the ... clustering pairs of co-occurring entities represented as vectors of context features. They used a simple representation of contexts; the features were words in sentences between the entities of the...