Sciweavers

KDD
2008
ACM

Scaling up text classification for large file systems

We combine the speed and scalability of information retrieval with the generally superior classification accuracy offered by machine learning, yielding a two-phase text classifier that can scale to very large document corpora. We investigate the effect of different methods of formulating the query from the training set, as well as of varying the query size. In empirical tests on the Reuters RCV1 corpus of 806,000 documents, we find that runtime was easily reduced by a factor of 27, with a somewhat surprising gain in F-measure compared with traditional text classification.

External Posting Date: June 21, 2008. Approved for External Publication.
Internal Posting Date: June 21, 2008.
To be presented and published in the 14th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'08), August 2008.
© Copyright 2008 the 14th ACM SIGKDD International Conference
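The two-phase idea in the abstract (an inexpensive retrieval query first, a learned classifier second) can be illustrated with a minimal sketch. This is not the authors' implementation: the term-scoring rule in `formulate_query`, the choice of Naive Bayes for the second phase, the toy documents, and every function name below are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def build_index(docs):
    """Inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(tokenize(text)):
            index[term].add(doc_id)
    return index


def formulate_query(pos_docs, neg_docs, query_size):
    """Pick query terms from the training set.

    Hypothetical scoring: document frequency among positives minus
    document frequency among negatives. Ties are broken alphabetically.
    """
    pos = Counter(t for d in pos_docs for t in set(tokenize(d)))
    neg = Counter(t for d in neg_docs for t in set(tokenize(d)))
    score = {t: pos[t] / len(pos_docs) - neg[t] / len(neg_docs) for t in pos}
    return sorted(score, key=lambda t: (-score[t], t))[:query_size]


def retrieve(index, query):
    """Phase 1: union of the posting lists of the query terms."""
    hits = set()
    for term in query:
        hits |= index.get(term, set())
    return hits


def train_naive_bayes(pos_docs, neg_docs):
    """Phase 2 model: multinomial Naive Bayes with add-one smoothing."""
    counts = {True: Counter(), False: Counter()}
    for label, group in ((True, pos_docs), (False, neg_docs)):
        for d in group:
            counts[label].update(tokenize(d))
    vocab = set(counts[True]) | set(counts[False])
    n_docs = {True: len(pos_docs), False: len(neg_docs)}
    n_total = len(pos_docs) + len(neg_docs)

    def predict(text):
        log_prob = {}
        for label in (True, False):
            denom = sum(counts[label].values()) + len(vocab)
            lp = math.log(n_docs[label] / n_total)
            for t in tokenize(text):
                if t in vocab:  # ignore terms unseen in training
                    lp += math.log((counts[label][t] + 1) / denom)
            log_prob[label] = lp
        return log_prob[True] > log_prob[False]

    return predict


def two_phase_classify(docs, index, pos_docs, neg_docs, query_size):
    """Run the classifier only on documents returned by the query."""
    query = formulate_query(pos_docs, neg_docs, query_size)
    candidates = retrieve(index, query)
    predict = train_naive_bayes(pos_docs, neg_docs)
    return {doc_id for doc_id in candidates if predict(docs[doc_id])}


# Toy data, invented for this sketch.
pos_train = ["wheat harvest grain exports", "grain prices wheat crop"]
neg_train = ["stock market shares fall", "bank raises interest rates"]
corpus = {
    1: "wheat crop yields rise",
    2: "shares fall on market fears",
    3: "grain exports slow",
    4: "bank interest rates steady",
    5: "football match results",
}
index = build_index(corpus)
result = two_phase_classify(corpus, index, pos_train, neg_train, query_size=2)
print(sorted(result))  # prints [1, 3]
```

In this sketch the classifier is applied to only two of the five documents, which is where a runtime saving would come from on a large corpus. The `query_size` parameter corresponds loosely to the query size the abstract says was varied; a larger query retrieves more candidates at higher cost.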
George Forman, Shyamsundar Rajaram
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2008
Where KDD
Authors George Forman, Shyamsundar Rajaram