Sciweavers

AUSDM
2008
Springer

Structure-Based Document Model with Discrete Wavelet Transforms and Its Application to Document Classification

Term signal is an existing text representation that depicts a term as a vector of its occurrence frequencies across a number of user-defined partitions of a document. Although term signal augments the traditional vector space model with patterns of term occurrences, its division of a document does not follow the document's actual logical structure. In this paper, we propose a novel document model, termed Structure-Based Document Model with Discrete Wavelet Transforms (SDMDWT), that exploits the structural information of documents and mathematical transforms for document representation. The proposed SDMDWT model enhances the existing term signal concept by additionally taking a document's structural information into consideration during document division. We evaluated the proposed model on standard data sets from two different domains, WebKB 4-Universities and TREC Genomics 2005, using Support Vector Machines binary classification. The experimental results show that using our SDMDWT mo...
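The term signal idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal-width partitioning, the list-of-sections input, and the choice of the Haar wavelet are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def term_signal(tokens, term, num_partitions):
    """Count occurrences of `term` in each of `num_partitions` equal-width
    slices of the token sequence (assumed partitioning scheme)."""
    signal = [0] * num_partitions
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == term:
            signal[i * num_partitions // n] += 1
    return signal


def structure_signal(sections, term):
    """Hypothetical structure-based variant: one count per logical section
    (e.g. title, abstract, body), each given as a list of tokens."""
    return [section.count(term) for section in sections]


def haar_dwt(signal):
    """Full orthonormal Haar decomposition (wavelet choice is an assumption).
    The signal length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        # Coarser detail coefficients go in front of finer ones.
        details = detail + details
    return approx + details


tokens = "wavelet a b c d e wavelet wavelet".split()
signal = term_signal(tokens, "wavelet", 4)
coeffs = haar_dwt(signal)
```

For the eight-token example, the term occurs once in the first quarter and twice in the last, so the signal is `[1, 0, 0, 2]`. The transformed coefficients could then serve as features for a classifier such as an SVM, which is the evaluation setting the abstract mentions.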
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2008
Where AUSDM
Authors Supphachai Thaicharoen, Tom Altman, Krzysztof J. Cios