Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into acc...
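The snippet above refers to the language-modeling approach to retrieval. One common instantiation is query likelihood with Dirichlet smoothing, which mixes a document's own term distribution with collection-wide term statistics. The minimal sketch below is a generic illustration, not the method of the paper. The function name, the whitespace-free term-list input, and the value of `mu` are all assumptions.

```python
import math
from collections import Counter

def dirichlet_query_log_likelihood(query, doc, collection, mu=2000.0):
    """Generic sketch: log P(query | doc) under a unigram model with Dirichlet smoothing.

    query and doc are lists of terms; collection is a list of such documents.
    mu is a hypothetical smoothing parameter, not a value taken from the source.
    """
    coll_tf = Counter(t for d in collection for t in d)  # collection term counts
    coll_len = sum(coll_tf.values())
    doc_tf = Counter(doc)                                 # document term counts
    doc_len = len(doc)
    score = 0.0
    for t in query:
        p_coll = coll_tf[t] / coll_len
        if p_coll == 0:
            continue  # term unseen in the whole collection: skipped (a common simplification)
        # Smoothed estimate: (tf(t, d) + mu * P(t | C)) / (|d| + mu)
        score += math.log((doc_tf[t] + mu * p_coll) / (doc_len + mu))
    return score
```

Documents are ranked by this score in descending order; a larger `mu` leans more heavily on the collection model, which matters most for short documents.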
We describe a system for retrieving document images from collections stored in digital libraries on the basis of layout similarity. Layout regions are extracted and ...
Word-based Huffman coding has widespread use in information retrieval systems. Besides its compressing power, it also enables the implementation of both indexing and searching sch...
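To make "word-based" concrete, the sketch below builds a Huffman code whose source symbols are whole words rather than characters, so frequent words receive the shortest codewords. It is a simplified illustration under stated assumptions: it emits binary codewords as strings of "0"/"1" (systems described in the IR literature often use byte-oriented codes instead), tokenizes on whitespace, and rebuilds text with single spaces. The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def build_word_huffman_code(text):
    """Hypothetical helper: binary Huffman code over whitespace-separated words."""
    freqs = Counter(text.split())
    tie = count()  # unique tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct word still needs a non-empty codeword.
        return {w: "0" for w in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def encode(text, codes):
    return "".join(codes[w] for w in text.split())

def decode(bits, codes):
    inverse = {c: w for w, c in codes.items()}
    words, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:  # the code is prefix-free, so the first match is a whole word
            words.append(inverse[buf])
            buf = ""
    return " ".join(words)
```

Because every codeword corresponds to a whole word, a query word can be encoded once and its codeword searched for in the compressed text, which is the property that lets such codes support indexing and searching directly.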
We consider the problem of efficiently computing weighted proximity best-joins over multiple lists, with applications in information retrieval and extraction. We are given a multi-...
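The exact weighted objective is cut off in the snippet above, so the following is only a sketch of a simpler, unweighted relative of a proximity join: given several sorted position lists (for example, occurrences of each query term in a document), pick one position from each list so that the span between the smallest and largest chosen position is minimal. The heap-based sweep below is a standard technique for that unweighted problem and is not the algorithm of the paper. The function name is hypothetical.

```python
import heapq

def min_span_join(lists):
    """Hypothetical helper: choose one position per sorted list, minimizing max - min.

    Returns (span, chosen_positions) with chosen_positions[i] taken from lists[i],
    or None if any list is empty. Unweighted simplification only.
    """
    if not lists or any(not l for l in lists):
        return None
    # Heap entries: (position, list index, index within that list).
    heap = [(l[0], i, 0) for i, l in enumerate(lists)]
    heapq.heapify(heap)
    current_max = max(l[0] for l in lists)
    best = None
    while True:
        pos, i, j = heap[0]          # current minimum position across the lists
        span = current_max - pos
        if best is None or span < best[0]:
            choice = [0] * len(lists)
            for p, li, _ in heap:    # the heap holds exactly one position per list
                choice[li] = p
            best = (span, choice)
        if j + 1 == len(lists[i]):
            return best  # the list holding the minimum is exhausted; no tighter window remains
        nxt = lists[i][j + 1]
        heapq.heapreplace(heap, (nxt, i, j + 1))  # advance only the list with the minimum
        current_max = max(current_max, nxt)
```

Each step advances the list that currently holds the minimum, since moving any other list can only widen the window; this gives O(N log k) time for N total positions across k lists.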
AnHai Doan, Haixun Wang, Hao He, Jun Yang 0001, Ri...
The World Wide Web is growing at such a pace that even the biggest centralized search engines are able to index only a small part of the available documents on the Internet. The d...