The LOCKSS project has developed and deployed in a worldwide test a peer-to-peer system for preserving access to journals and other archival information published on the Web. It c...
Petros Maniatis, David S. H. Rosenthal, Mema Rouss...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
The large amount of available information on the Web makes it hard for users to locate resources about particular topics of interest. Traditional search tools, e.g., search engines...
Nowadays web spamming has emerged to take the economic advantage of high search rankings and threatened the accuracy and fairness of those rankings. Understanding spamming techniq...
There has been great interest in applying Semantic Web technologies to the tourism sector ever since Tim Berners-Lee introduced his vision. Unfortunately, there is a major obstacl...