This paper presents the architectural design and evaluation results of an efficient Web-crawling system. The design involves a fully distributed architecture, a URL allocating algor...
Today, large-scale web services run on complex systems, spanning multiple data centers and content distribution networks, with performance depending on diverse factors in end syst...
Zhichun Li, Ming Zhang, Zhaosheng Zhu, Yan Chen, A...
The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extracti...
Valter Crescenzi, Giansalvatore Mecca, Paolo Meria...
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the...
Soon many people will retrieve information from the Web using handheld, palm-sized or even smaller computers. Although these computers have dramatically increased in sophistication...
Matt Jones, Gary Marsden, Norliza Mohd-Nasir, Kevi...