Page Digest for Large-Scale Web Services

The rapid growth of the World Wide Web and the Internet has fueled interest in Web services and the Semantic Web, which are quickly becoming important parts of modern electronic commerce systems. An interesting segment of the Web services domain is the set of facilities for document manipulation, including Web search, information monitoring, data extraction, and page comparison. These services are built on common functional components that preprocess large numbers of Web pages, parsing them into internal storage and processing formats. If a Web service is to operate on the scale of the Web, it must handle this storage and processing efficiently. In this paper, we introduce Page Digest, a mechanism for efficient storage and processing of Web documents. The Page Digest design encourages a clean separation of the structural elements of Web documents from their content. Its encoding transformation produces many of the advantages of traditional string digest schemes yet remains invertible w...
Daniel Rocco, David Buttler, Ling Liu
Added 05 Jul 2010
Updated 05 Jul 2010
Type Conference
Year 2003