Springer

Corpus Linguistics for Establishing The Natural Language Content of Digital Library Documents

Digital Libraries will hold huge amounts of text and other forms of information. For the collections to be maximally useful, they must be highly organized with useful indexes and intra- and inter-document linkages. This brings with it a demand for ever-better methods for automated analysis of text to build the indexes and links. It requires turning implicit information, "encrypted in natural language", into explicit information. We discuss approaches to the automation task built on the techniques of corpus linguistics. This paper focuses on word classification as an example of the utility of corpus methods. Results are presented for the syntactic and semantic classification of words from a biological corpus. The word classes identified can then be used for indexing, query expansion, syntactic analysis and for linking separate library collections by aligning word senses. The paper also discusses derivative objects, diagram analysis and authoring tools. Finally, we outline a new ...
Added 09 Aug 2010
Updated 09 Aug 2010
Type Conference
Year 1994
Where DL
Authors Robert P. Futrelle, Xiaolan Zhang 0002, Yumiko Sekiya