This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for ...
A machine-learning approach and a string-matching approach to automated subject classification of text were compared with respect to their performance, advantages, and downsides. The former approach ...
Document classification presents difficult challenges due to the sparsity and high dimensionality of text data and the complex semantics of natural language. The tradi...
This paper extends previous work on document retrieval and document type classification, addressing the problem of ‘typed search’. Specifically, given a query and a designated ...
Jun Xu, Yunbo Cao, Hang Li, Nick Craswell, Yalou H...
With the rapid emergence and proliferation of the Internet and the trend of globalization, a tremendous number of textual documents written in different languages are electronically ac...