Sciweavers

TAL
2010
Springer

Summarization as Feature Selection for Document Categorization on Small Datasets

Abstract. The most common feature selection techniques for document categorization are supervised and require large amounts of training data to accurately capture the descriptive and discriminative information of the defined categories. Because training sets are extremely small in many classification tasks, this paper explores the use of unsupervised extractive summarization as a feature selection technique for document categorization. Our experiments with training sets of different sizes indicate that text summarization is a competitive approach to feature selection and show that it is well suited to situations with small training sets, where it could clearly outperform the traditional information gain technique.
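The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say which summarizer they used, so the sentence scoring here (mean in-document term frequency) is an assumption, and the names `summarize`, `select_features`, `vectorize`, and the `ratio` parameter are hypothetical. The sketch only shows the general shape: summarize each training document without using class labels, then keep as features only the terms that survive in the summaries.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, ratio=0.5):
    """Extractive summary: keep the top-scoring sentences in original order.

    Assumption: a sentence is scored by the mean frequency of its tokens
    within the same document. No class labels are used (unsupervised).
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


def select_features(documents, ratio=0.5):
    """Feature set = every term that appears in at least one document summary."""
    vocab = set()
    for doc in documents:
        for sentence in summarize(doc, ratio):
            vocab.update(tokenize(sentence))
    return vocab


def vectorize(text, vocab):
    """Bag-of-words counts restricted to the selected features."""
    return dict(Counter(t for t in tokenize(text) if t in vocab))
```

In use, `select_features` would be called once on the training documents, and `vectorize` would then map both training and test documents onto the reduced vocabulary before any classifier is trained. Because the selection step never looks at labels, it does not depend on how many labeled examples are available per category.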
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Where TAL
Authors Emmanuel Anguiano-Hernández, Luis Villaseñor Pineda, Manuel Montes-y-Gómez, Paolo Rosso