In this paper, we present the results of our work that seek to negotiate the gap between low-level features and high-level concepts in the domain of web document retrieval. This wo...
This paper explores the problem of computing pairwise similarity on document collections, focusing on the application of “more like this” queries in the life sciences domain. ...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
In this paper, we discuss how the focus in document analysis, generally speaking, and in graphics recognition more specifically, has moved from re-engineering problems to indexin...
In this paper, we propose the ontology-based semantic web search model to enhance efficiency and accuracy of information retrieval for unstructured and semi-structured documents. N...