In this paper, we report on our experience with the creation of an automated, human-assisted process to extract metadata from documents in a large (>100,000), dynamically growi...
Jianfeng Tang, Kurt Maly, Steven J. Zeil, Mohammad...
: Hypertext categorization is the automatic classification of web documents into predefined classes. It poses new challenges for automatic categorization because of the rich inform...
Abstract - We discuss an ensemble-of-classifiers based algorithm for the missing feature problem. The proposed approach is inspired in part by the random subspace method, and in pa...
Category ranking provides a way to classify plain text documents into a pre-determined set of categories. This work proposes to have a look at typical document collections and ana...
This paper describes a new method for the classification of a HTML document into a hierarchy of categories. The hierarchy of categories is involved in all phases of automated docum...