Sciweavers

SIGMOD
2010
ACM

Optimizing content freshness of relations extracted from the web using keyword search

An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data accesses. As the data on the Web evolves, it is critical that the local copy be kept up-to-date. Data freshness is one of the most important data quality issues, and has been extensively studied for various applications including web crawling. However, web crawling is focused on obtaining as many raw web pages as possible. Our applications, on the other hand, are interested in specific content from specific data sources. Knowing the content or the semantics of the data enables us to differentiate data items based on their importance and volatility, which are key factors that impact the design of the data synchronization strategy. In this work, we formulate the concept of content freshness, and present a novel approach that maintains content freshness with the least amount of web communication. Specifically, we assum...
Mohan Yang, Haixun Wang, Lipyeow Lim, Min Wang
Added 06 Dec 2010
Updated 06 Dec 2010
Type Conference
Year 2010
Where SIGMOD