
ANLP
1994

Exploiting Sophisticated Representations for Document Retrieval

The use of NLP techniques for document classification has not produced significant performance improvements within the standard term-weighting statistical assignment paradigm (Fagan, 1987; Lewis, 1992b, 1992c; Buckley, 1993). This perplexing fact needs both an explanation and a solution if the power of recently developed NLP techniques is to be applied successfully in IR. A novel method for adding linguistic annotation to corpora is presented. It uses a statistical POS tagger in conjunction with unsupervised structure-finding methods to derive notions of "noun group", "verb group", and so on. The method is inherently extensible to more sophisticated annotation and does not require a pre-tagged corpus to fit. One of the distinguishing features of a more linguistically sophisticated representation of documents, compared with a word-set-based representation, is that linguistically sophisticated units are more frequently individually good predictors of document descrip...
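The abstract does not spell out the annotation algorithm. As a rough illustration of what a "noun group" unit adds to a word-set representation, the sketch below groups already POS-tagged tokens using a fixed, hand-written rule (maximal runs of adjectives and nouns that end in a noun). This is a swapped-in rule-based chunker, not the unsupervised structure-finding method the paper describes; the Penn Treebank tag names, the functions `noun_groups` and `represent`, and the sample sentence are all hypothetical choices made for this example.

```python
from typing import List, Set, Tuple

# Hypothetical Penn-Treebank-style tag sets (an assumption for this sketch).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = {"JJ", "JJR", "JJS"} | NOUN_TAGS


def noun_groups(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return maximal runs of adjective/noun tokens that end in a noun."""
    groups: List[str] = []
    run: List[Tuple[str, str]] = []  # current run of (word, tag) pairs
    # A sentinel tag flushes any run still open at the end of the input.
    for word, tag in tagged + [("", "END")]:
        if tag in MODIFIER_TAGS:
            run.append((word, tag))
            continue
        # Run ended: drop trailing adjectives so every group ends in a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if run:
            groups.append(" ".join(w.lower() for w, _ in run))
        run = []
    return groups


def represent(tagged: List[Tuple[str, str]]) -> Set[str]:
    """Word-set representation extended with noun-group units."""
    words = {w.lower() for w, _ in tagged if w.isalpha()}
    return words | set(noun_groups(tagged))


# Hypothetical pre-tagged input (tags supplied by hand, not by a real tagger).
sample = [
    ("A", "DT"), ("statistical", "JJ"), ("POS", "NNP"), ("tagger", "NN"),
    ("finds", "VBZ"), ("noun", "NN"), ("groups", "NNS"), (".", "."),
]
groups = noun_groups(sample)  # → ['statistical pos tagger', 'noun groups']
units = represent(sample)
```

Because a phrase such as "noun groups" is indexed as a single unit, it can serve as one predictor for a document category, whereas a pure word-set representation sees only "noun" and "groups" as separate features.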
Steven Finch
Type Conference
Year 1994
Where ANLP
Authors Steven Finch