Scalable association-based text classification

The Naïve Bayes (NB) classifier has long been considered a core methodology in text classification, mainly due to its simplicity and computational efficiency. There is an increasing need, however, for methods that can achieve higher classification accuracy while maintaining the ability to process large document collections. In this paper we examine text categorization methods from a perspective that considers the tradeoff between accuracy and scalability to large data sets and large feature sizes. We start from the observation that Support Vector Machines, one of the best text categorization methods, cannot scale up to handle the large document collections involved in many real-world problems. We then consider Bayesian extensions to NB that achieve higher accuracy by relaxing its strong independence assumptions. Our experimental results show that LB, an association-based lazy classifier, can achieve a good tradeoff between high classification accuracy and scalability to large document collecti...
Dimitris Meretakis, Dimitris Fragoudis, Hongjun Lu
Added 02 Aug 2010
Updated 02 Aug 2010
Type Conference
Year 2000
Where CIKM
Authors Dimitris Meretakis, Dimitris Fragoudis, Hongjun Lu, Spiros Likothanassis