This paper presents an efficient compression-oriented segmentation algorithm for computer-generated document images. In this algorithm, a document image is represented in a block-...
Language models used in current automatic speech recognition systems are trained on general-purpose corpora and are therefore not relevant to transcribe spoken documents dealing w...
Visual paradigms such as node-link diagrams are well suited to the representation of Semantic Web data encoded with the Resource Description Framework (RDF), whose data model can ...
Web spam can significantly deteriorate the quality of search engines. Early web spamming techniques mainly manipulate page content. Since linkage information is widely used in we...
In automated text categorization, given a small number of labeled documents, it is very challenging, if not impossible, to build a reliable classifier that is able to achieve high...
Zenglin Xu, Rong Jin, Kaizhu Huang, Michael R. Lyu...