There has been a tremendous growth in the amount of information and resources on the World Wide Web that are useful to researchers and practitioners in science domains. While the ...
Michael Chau, Zan Huang, Jialun Qin, Yilu Zhou, Hs...
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
With massive book digitization efforts underway, there is a need for developing effective book retrieval strategies. This paper explores the relative contribution of different par...
Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
Keyword searching while very successful in narrowing down the contents of the Web to the pertaining subset of information, has two primary drawbacks. First, the accuracy of the se...