Abstract. This paper proposes a two-step method for Chinese text categorization (TC). In the first step, a Naïve Bayesian classifier is used to fix the fuzzy area between two cate...
Abstract. Previous researches on advanced representations for document retrieval have shown that statistical state-of-the-art models are not improved by a variety of different ling...
We present the problem of categorizing web services according to a shallow ontology for presentation on a specialist portal, using their WSDL and associated textual documents foun...
Taxonomies of the Web typically have hundreds of thousands of categories and skewed category distribution over documents. It is not clear whether existing text classification tech...
Tie-Yan Liu, Yiming Yang, Hao Wan, Qian Zhou, Bin ...
In this paper we study the effectiveness of using a phrase-based representation in e-mail classification, and the affect this approach has on a number of machine learning algorithm...