Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
The term Semantic Web was coined by Tim Berners-Lee to describe his proposal for \a web of meaning," as opposed to the \web of links" that currently exists on the Intern...
In a distributed storage system, client caches managed on the basis of small granularity objects can provide better memory utilization then page-based caches. However, object serv...
The Web has become a ubiquitous tool for distributing knowledge and information and for conducting businesses. To exploit the huge potential of the Web as a global information rep...