Vector Space Model (VSM) has been at the core of information retrieval for the past decades. VSM considers the documents as vectors in high dimensional space. In such a vector spa...
We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with bandwidth consumption that has been ide...
Gleb Skobeltsyn, Toan Luu, Ivana Podnar Zarko, Mar...
Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback meth...
Document clustering has long been an important problem in information retrieval. In this paper, we present a new clustering algorithm ASI1, which uses explicitly modeling of the s...
This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is u...