HIS
2001

Linear Discriminant Text Classification in High Dimension

Abstract. Linear Discriminant (LD) techniques are typically used in pattern recognition tasks when there are many (n >> 10^4) datapoints in low-dimensional (d < 10^2) space. In this paper we argue on theoretical grounds that LD is in fact more appropriate when training data is sparse, and the dimension of the space is extremely high. To support this conclusion we present experimental results on a medical text classification problem of great practical importance, autocoding of adverse event reports. We trained and tested LD-based systems for a variety of classification schemes widely used in the clinical drug trial process (COSTART, WHOART, HARTS, and MedDRA) and obtained significant reduction in the rate of misclassification compared both to generic Bayesian machine-learning techniques and to the current generation of domain-specific autocoders based on string matching.
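The abstract does not spell out the authors' training procedure, so the following is only an illustrative sketch of what a linear discriminant over high-dimensional, sparse word features can look like. It swaps in a plain perceptron over bag-of-words counts as the learner. The toy reports, the two-class labels, and all function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def featurize(text):
    """Map a text to a sparse bag-of-words vector (word -> count).

    Each distinct word is one dimension, so the feature space is as large
    as the vocabulary, while each report touches only a few dimensions.
    """
    vec = defaultdict(float)
    for token in text.lower().split():
        vec[token] += 1.0
    return dict(vec)


def score(weights, bias, vec):
    """Linear discriminant value w . x + b, summed over non-zero features."""
    return bias + sum(weights.get(word, 0.0) * x for word, x in vec.items())


def train_perceptron(examples, epochs=20):
    """Fit a separating hyperplane with the classic perceptron rule.

    examples: list of (text, label) pairs with label in {+1, -1}.
    Assumption: a perceptron stands in for whatever LD training
    the paper actually uses.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        mistakes = 0
        for text, label in examples:
            vec = featurize(text)
            if label * score(weights, bias, vec) <= 0:
                # Misclassified (or on the boundary): move the hyperplane
                # toward the correct side for this example.
                for word, x in vec.items():
                    weights[word] = weights.get(word, 0.0) + label * x
                bias += label
                mistakes += 1
        if mistakes == 0:
            break  # every training example is on the correct side
    return weights, bias


def predict(weights, bias, text):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if score(weights, bias, featurize(text)) > 0 else -1


# Hypothetical adverse-event snippets: +1 = "headache" code, -1 = "rash" code.
TRAIN = [
    ("patient reported severe headache", 1),
    ("headache and dizziness after dose", 1),
    ("mild headache in the morning", 1),
    ("skin rash on both arms", -1),
    ("itchy rash after dose", -1),
    ("patient reported red rash", -1),
]

weights, bias = train_perceptron(TRAIN)
```

Calling `predict(weights, bias, "severe headache after dose")` then returns the class whose side of the hyperplane the new report falls on. A multi-class coding scheme such as those named in the abstract would need one such discriminant per code (or a comparable extension), which this sketch does not attempt.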
András Kornai, J. Michael Richards
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2001
Where HIS