We address the problem of measuring global quality metrics of search engines, like corpus size, index freshness, and density of duplicates in the corpus. The recently proposed est...
In this paper, we propose a new similarity measure to compute the pairwise similarity of text-based documents based on suffix tree document model. By applying the new suffix tree ...
The success of the Semantic Web depends on the availability of Web pages annotated with metadata. Free form metadata or tags, as used in social bookmarking and folksonomies, have ...
Figures in digital documents contain important information. Current digital libraries do not summarize and index information available within figures for document retrieval. We pr...
Xiaonan Lu, James Ze Wang, Prasenjit Mitra, C. Lee...
As part of a large effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly sup...