Scientific report: "MACHINE-READABLE COMPONENTS IN A VARIETY OF INFORMATION-SYSTEM APPLICATIONS"


Howard R. Webber
Reference Publishing Division
Houghton-Mifflin Company
2 Park Street
Boston, MA 02108

Components of the machine-readable dictionary can be applied in a number of information systems. The most direct applications of the kind are in wordprocessing or in "writing-support" systems built on a wordprocessing base. However, because a central function of any dictionary is in fact data verification, there are other proposed applications in communications and in data storage and retrieval systems. Moreover, the complete interrelational electronic dictionary is in some sense the model of the language; and there are, accordingly, additional implications for language-based information search and retrieval.

In regard to wordprocessing, the electronic lexicon can serve as the base for spelling verification (in which the computer detects many spelling or typographical errors) and spelling correction (in which the computer offers corrections to the errors it has identified). Because it is possible to develop algorithms that permit the computer to calculate the chances that the single best alternative it offers is actually correct, this substitution can in many cases be made automatically. At this point in the development of such systems, it is wise to flag such automatic corrections for inspection by the operator.

At the present time, these processes generally depend upon the application of strict frequency measures, which permit the lexicon to be reduced to small-machine proportions and thereby reduce the possibility of a false hit: the passing of a misspelled common word that happens to coincide in orthography with a legitimate but rare word. As our ability to draw cognitive information from text increases, and as available memory increases, such limits can be abandoned.

Truncation of the lexicon for other specific applications can be considered.
It is possible, for example, to shape the lexicon to reflect a children's vocabulary and thereby to develop spelling correction and other writing aids for the early educational years on a very small machine base. It is also possible to shape the lexicon to the needs of the educated adult user, for whom information about common words is unnecessary, and thereby to provide an exceptionally rich resource about "difficult" words within small-machine memory for on-line access to spelling, definition, and pronunciation. Configuring the lexicon pyramidally by frequency, including all words of high frequency, seems an inevitable model to us now, but it is of course a kind of historical accident.

As many of these comments already make clear, even if one resolves to work within the linguistic bounds of the ordinary print dictionary, there are differences between the demands placed upon the dictionary by print applications and those arising out of electronic applications. It is a matter of judgment or taste for the print lexicographer not to include geographic and biographic terms in the lexicon, but the electronic lexicographer does not have that latitude.

Access to on-line dictionaries can be by the standard alphabetic means, by well-developed phonetic algorithms (which solve the conundrum of needing to know spelling before being able to find spelling), or by definition (the reverse dictionary). Once electronic citation for words and senses is done on the basis of machine scans of print-composition tapes and even of voice scans, sensitive subject coding should permit the development of lexicons tailored to the user profile, with attendant benefits in comprehensiveness and economy of memory. One can conceive of dictionaries that monitor their own use and respond by offering only unknown information to the individual user.
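Phonetic access of the kind mentioned above can be illustrated with a minimal sketch. The abstract does not name a specific algorithm, so the Soundex-style consonant classes, the function names (`phonetic_key`, `build_index`, `lookup`), and the sample words are all assumptions made for illustration. The idea is only that words are indexed by a sound-based key, so a user who cannot spell a word can still reach it.

```python
from collections import defaultdict

# Soundex-style consonant classes (an assumption; the abstract names no algorithm).
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def phonetic_key(word):
    """Return a digit string describing the consonant sounds of `word`.

    Unlike classic Soundex, the first letter is encoded too, so that
    "ph..." and "f..." spellings can share a key.
    """
    key = []
    prev = ""
    for ch in word.lower():
        if not ch.isalpha():
            continue
        code = _CODES.get(ch, "")      # vowels, h, w, y carry no code
        if code and code != prev:      # collapse runs of the same sound class
            key.append(code)
        if ch not in "hw":             # h and w do not separate repeated codes
            prev = code
    return "".join(key)


def build_index(words):
    """Map each phonetic key to the lexicon words that share it."""
    index = defaultdict(list)
    for w in words:
        index[phonetic_key(w)].append(w)
    return dict(index)


def lookup(index, attempt):
    """Return lexicon words that sound like the user's attempted spelling."""
    return index.get(phonetic_key(attempt), [])
```

With an index built from `["phonetic", "necessary", "rhythm"]`, the attempts `"fonetic"`, `"nesesary"`, and `"rithm"` each reach the intended entry. A key this coarse will also group unrelated words in a full-size lexicon, so a practical system would need to rank or filter the matches.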
The dictionary that contains synonymy is a resource in the construction of electronic synonym generators, of which there is at least one model that returns synonyms in the inflections of the source words, including phrasal synonyms, taking precise account of all irregularities in doing so. Presentation of synonyms is useful for "knowledge workers" but not for clerical workers.

If usage information is included in the dictionary, then it is deliverable as a discrete electronic product. The most direct key to specific usage guidance is by "trigger" words or phrases that call up guidance information for the operator, but much more sophisticated implementations are possible when programming addresses grammar and syntax.

In large-system management, where accuracy of alpha data is a consideration, the machine dictionary can be the base or one of the bases for verification and correction of data streams in communication or of stored data.

What I have called the complete interrelational dictionary, fully coded to reflect the range of significant linguistic information, will serve as the base for retrieving information by meaning rather than mechanics.
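The verification-and-correction process described in this abstract (detect a word missing from the lexicon, propose alternatives, substitute automatically only when the best alternative is likely enough to be right, and flag the substitution for the operator) might be sketched as follows. This is not the author's algorithm: the single-edit candidate generation, the frequency-share confidence measure, the 0.9 threshold, and the made-up counts in `SAMPLE_LEXICON` are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, NamedTuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Illustrative frequency-truncated lexicon; the counts are invented.
SAMPLE_LEXICON: Dict[str, int] = {
    "the": 1000, "from": 400, "there": 250, "their": 200,
    "form": 80, "receive": 30,
}


class Result(NamedTuple):
    word: str               # word to emit (original or automatic correction)
    flagged: bool           # True when the operator should inspect it
    suggestions: List[str]  # candidate corrections, most frequent first


def edits1(word: str) -> set:
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    subs = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + swaps + subs + inserts)


def check_word(word: str, lexicon: Dict[str, int],
               auto_threshold: float = 0.9) -> Result:
    """Verify one word; auto-correct only when the best candidate dominates."""
    lower = word.lower()
    if lower in lexicon:
        return Result(word, False, [])          # verified, nothing to do
    candidates = sorted((w for w in edits1(lower) if w in lexicon),
                        key=lambda w: (-lexicon[w], w))
    if not candidates:
        return Result(word, True, [])           # unknown word, no suggestion
    # Assumed confidence: the best candidate's share of candidate frequency.
    confidence = lexicon[candidates[0]] / sum(lexicon[w] for w in candidates)
    if confidence >= auto_threshold:
        # Substitute automatically (in lowercase), but still flag for inspection.
        return Result(candidates[0], True, candidates)
    return Result(word, True, candidates)       # leave as typed, offer choices
```

Under these assumptions, `check_word("recieve", SAMPLE_LEXICON)` substitutes `"receive"` and flags it, while `check_word("fom", SAMPLE_LEXICON)` leaves the word as typed and offers `"from"` and `"form"`, because neither candidate is dominant enough. The same function could be mapped over the tokens of a stored record or a communication stream.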

Posted: 21/02/2014, 20:20
