Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
WAIF is a new framework that gives Internet users easy access to relevant news items. WAIF supports new kinds of browsers, personalized filters, recommendation systems...
Dag Johansen, Robbert van Renesse, Fred B. Schneid...
We present Content Extraction via Tag Ratios (CETR), a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
The rise of email and instant messaging as important tools in the professional workplace has created changes in how we communicate. One such change is that these media tend to red...
Recent years have witnessed an explosion in the availability of news articles on the World Wide Web. Although search engines’ algorithms have made it easier to locate these docum...