A web mashup is a web application that integrates content from different providers to create a new service not offered by the content providers. As mashups grow in popularity, ...
We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-Converted-HTML files). The web page content ...
We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Dereferencing a URI returns a representation of the current state of the resource identified by that URI. But on the Web, representations of prior states of a resource are also av...
Herbert Van de Sompel, Robert Sanderson, Michael L...
An appreciation of the roles of genre and task is important in understanding how people browse the Web. Genre is characterized by content and form and is intimately linked to the ...
Carolyn R. Watters, Michael A. Shepherd, Forbes J....