On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
We present a static index pruning method, to be used in ad-hoc document retrieval tasks, that follows a documentcentric approach to decide whether a posting for a given term shoul...
This paper follows a word-document co-clustering model independently introduced in 2001 by several authors such as I.S. Dhillon, H. Zha and C. Ding. This model consists in creatin...
Opinion finding is a challenging retrieval task, where it has been shown that it is especially difficult to improve over a strongly performing topic-relevance baseline. In this pa...
Rodrygo L. T. Santos, Ben He, Craig Macdonald, Iad...
This paper describes features and methods for document image comparison and classification at the spatial layout level. The methods are useful for visual similarity based document...
Jianying Hu, Ramanujan S. Kashi, Gordon T. Wilfong