Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task. In this paper a web-page clustering approach to...
Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
Abstract. We consider a collaboration of peers autonomously crawling the Web. A pivotal issue when designing a peer-to-peer (P2P) Web search engine in this environment is query rou...
Sebastian Michel, Matthias Bender, Peter Triantafi...
In the past few years there has been increased research interest in detecting previously unidentified events from Web resources. Our focus in this paper is to detect events from ...
Background: Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags...
Charu G. Kumar, Richard LeDuc, George Gong, Levan ...