
DOCENG
2006
ACM

NEWPAR: an automatic feature selection and weighting schema for category ranking

Category ranking provides a way to classify plain-text documents into a predetermined set of categories. This work examines typical document collections and analyzes which measures and peculiarities can help represent documents so that the resulting features are as discriminative and representative as possible. Considerations such as selecting only nouns and adjectives, taking expressions rather than single words, and using measures such as term length are combined into a simple feature selection and weighting method that extracts, selects, and weights special n-grams. Several experiments with different data sets (Reuters and OHSUMED) and two different algorithms (SVM and a simple sum of weights) demonstrate the usefulness of the new schema. In the evaluation, the new approach outperforms some of the best-known and most widely used categorization methods. Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexi...
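The abstract combines three ideas: keep only nouns and adjectives, use multi-word expressions rather than single words, and favor longer terms, then rank categories with a simple sum of weights. The Python sketch below shows one way such a pipeline could fit together. It is an illustration under stated assumptions, not the NEWPAR method itself. It assumes the input is already POS-tagged as (word, tag) pairs with Penn Treebank-style tags, that expressions are contiguous noun/adjective runs of up to three words, and that a feature's weight is its frequency times its length in words. That weighting is a hypothetical stand-in for the paper's actual formula. The function names are also hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: Penn Treebank-style tags identify nouns and adjectives.
CONTENT_TAGS = {"NN", "NNS", "NNP", "NNPS", "JJ"}


def extract_expressions(tagged_tokens, max_n=3):
    """Hypothetical helper: return n-grams (1..max_n words) taken only from
    contiguous runs of noun/adjective tokens in a list of (word, tag) pairs."""
    runs, current = [], []
    for word, tag in tagged_tokens:
        if tag in CONTENT_TAGS:
            current.append(word.lower())
        elif current:  # a non-content word closes the current run
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    grams = []
    for run in runs:
        for n in range(1, max_n + 1):
            for i in range(len(run) - n + 1):
                grams.append(" ".join(run[i:i + n]))
    return grams


def weight_features(grams):
    """Hypothetical weighting: frequency times expression length in words,
    so longer expressions contribute more."""
    counts = Counter(grams)
    return {g: c * len(g.split()) for g, c in counts.items()}


def train_profiles(labelled_docs):
    """Build one weight profile per category by summing the feature weights
    of that category's training documents."""
    profiles = defaultdict(Counter)
    for tagged_tokens, category in labelled_docs:
        profiles[category].update(weight_features(extract_expressions(tagged_tokens)))
    return profiles


def rank_categories(tagged_tokens, profiles):
    """Simple sum of weights: score each category by adding the profile
    weights of the document's features, then sort categories best-first."""
    features = set(extract_expressions(tagged_tokens))
    scores = {cat: sum(prof[f] for f in features) for cat, prof in profiles.items()}
    return sorted(scores, key=lambda c: (-scores[c], c))
```

For example, after training on `[("crude", "JJ"), ("oil", "NN"), ("prices", "NNS"), ("rose", "VBD")]` labelled `crude` and `[("wheat", "NN"), ("exports", "NNS"), ("fell", "VBD")]` labelled `grain`, the document `[("oil", "NN"), ("prices", "NNS"), ("fell", "VBD")]` ranks `crude` above `grain`, because it shares the expressions "oil", "prices" and "oil prices" with the first profile. An SVM, the other algorithm mentioned in the abstract, could consume the same weighted features instead of the summed profiles.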
Added 13 Jun 2010
Updated 13 Jun 2010
Type Conference
Year 2006
Where DOCENG
Authors Fernando Ruiz-Rico, José Luis Vicedo González, María-Consuelo Rubio-Sánchez