Broder et al.’s [3] shingling algorithm and Charikar’s [4] random-projection-based approach are considered “state-of-the-art” algorithms for finding near-duplicate web pag...
In this paper, we aim to show the advantages of the Conceptual Graph formalism for the Semantic Web through several real-world applications in the framework of Corporate Semantic We...
We detail the design of a search engine for archival finding aids based on an XML database system. The resulting system shows results--which can vary in granularity from individual...
One of the main interests in the Web Information Retrieval research area is identifying user interests and needs so that search engines and tools can help the users t...
This paper proposes an OCR post-processing approach based on multi-knowledge, which integrates language knowledge and candidate distance information given by the OCR engine. In thi...