As part of the Language Observatory Project [4], we have been crawling the entire web space since 2004. We have collected terabytes of data, mostly from Asian and African ccTLDs. In t...
Rizza Camus Caminero, Pavol Zavarsky, Yoshiki Mika...
Understanding the availability of site metadata on the Web is a foundation for any system or application that wants to work with the pages published by Web sites, and also wants to...
Purpose – This paper reports the findings of a major study examining the overlap among results retrieved by three major web search engines. The goal of the research was to: mea...
Amanda Spink, Bernard J. Jansen, Chris Blakely, Sh...
If one wants to have a scheme for identifying non-Web-accessible entities, should it be centralized or decentralized? Given a URI, how can one tell if it refers to a web ...
Online forums contain valuable human-generated information. End-users looking for information would like to find only those threads in forums where relevant information is present...