This paper describes the design of a crawler devised to perform the periodic retrieval of Web documents for a search engine able to accept on-line updates in a concurrent manner. ...
The Service Oriented Architecture (SOA) provides a methodology for designing software systems by integrating loosely coupled services. Compared to traditional distributed object-o...
Ziyang Duan, Subhra Bose, Charles A. Shoniregun, P...
We describe the architecture of a hypertext resource discovery system using a relational database. Such a system can answer questions that combine page contents, metadata, and hyp...
Soumen Chakrabarti, Martin van den Berg, Byron Dom
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
This paper presents a system for enabling offline web use to satisfy the information needs of disconnected communities. We describe the design, implementation, evaluation, and pil...
Jay Chen, Russell Power, Lakshminarayanan Subraman...