Sciweavers

HT
2006
ACM

Just-in-time recovery of missing web pages

We present Opal, a lightweight framework for interactively locating missing web pages (HTTP status code 404). Opal is an example of “in vivo” preservation: harnessing the collective behavior of web archives, commercial search engines, and research projects for the purpose of preservation. Opal servers learn from their experiences and share their knowledge with other Opal servers through mutual harvesting using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Using cached copies found on the web, Opal creates lexical signatures, which are then used to search for similar versions of the web page. We present the architecture of the Opal framework, discuss a reference implementation, and present a quantitative analysis indicating that Opal could be deployed effectively.
Categories and Subject Descriptors: H.3.7 [Digital Libraries]: System Issues
General Terms: Algorithms, Design, Experimentation, Human Fac...
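The abstract does not say how Opal computes its lexical signatures. A common approach in the literature is to keep the few terms of a page that carry the highest TF-IDF weight and submit them as a search query. The following Python sketch illustrates that general idea only; the function names, the smoothed IDF formula, the default of five terms, and the small reference corpus are all assumptions made for illustration, not details taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF weight.

    corpus is a list of reference documents used to estimate how common
    each term is (hypothetical stand-in for a real document collection).
    """
    tf = Counter(tokenize(page_text))
    doc_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_sets)

    scores = {}
    for term, count in tf.items():
        df = sum(1 for terms in doc_sets if term in terms)
        # Smoothed IDF (an assumption): avoids division by zero
        # for terms that never occur in the reference corpus.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = count * idf

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def signature_query(signature):
    """Join the signature terms into a plain search-engine query string."""
    return " ".join(signature)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the bird flew over the house",
    ]
    cached_copy = "the opal opal server finds the missing page"
    print(signature_query(lexical_signature(cached_copy, corpus, k=3)))
```

In this sketch, the text of a cached copy of the missing page would be passed as page_text, and the resulting query string would be sent to a search engine to look for similar versions of the page. Terms that are rare in the reference corpus but frequent on the page rank highest, which is what makes the signature distinctive.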
Terry L. Harrison, Michael L. Nelson
Added 13 Jun 2010
Updated 13 Jun 2010
Type Conference
Year 2006
Where HT