Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link based ranking algorithms such ...
We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from...
In this paper, we propose an autonomous learning scheme to automatically build visual semantic concept models from the output data of Internet search engines without any manual la...
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
nt formal speci cations of a new abstraction, weak sets, which can be used to alleviate high latencies when retrieving data from a wide-area information system like the World Wide...