The increasingly prominent new subset of Web pages, called `blogs' differs from traditional Web pages both in characteristics and potential to applications. We explore three ...
Understanding the source, data, and documentation files associated with legacy systems in preparation for maintenance or reengineering is an increasingly important problem for man...
The Web is a dynamic, ever changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different...
Eytan Adar, Jaime Teevan, Susan T. Dumais, Jonatha...
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
—Modern applications such as web knowledge base, network traffic monitoring and online social networks have made available an unprecedented amount of network data with rich type...