This paper discusses generating document structure from annotated media repositories in a domain-independent manner. This approaches the vision of a universal RDF browser. We star...
Lloyd Rutledge, Jacco van Ossenbruggen, Lynda Hard...
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hier...
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
The rapid growth of the web has been noted and tracked extensively. Recent studies have however documented the dual phenomenon: web pages have small half lives, and thus the web e...
Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar, Andr...
Breaking news often contains timely definitions and descriptions of current terms, organizations and personalities. We utilize such web sources to construct definitions for such t...