Large-scale individual-based simulations can benefit greatly from high-performance computing environments. The benefit that can be hoped for depends heavily on a good load distributi...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents on the Web. The static and dynamic studies involve the analysis of similar c...
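The two snippets above concern detecting duplicate and near-duplicate Web content. One widely used way to measure near-duplication is to compare documents by their sets of word k-grams ("shingles") using Jaccard similarity. The sketch below is illustrative only: the shingle size `k=3`, the `0.8` threshold, and the helper names `shingles`, `jaccard`, and `near_duplicate` are assumptions, not the methods of the works cited here.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous k-word shingles of a lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single (shorter) shingle, or nothing if empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate(doc1: str, doc2: str, k: int = 3, threshold: float = 0.8) -> bool:
    """Hypothetical helper: flag two documents whose shingle sets overlap enough."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox jumped over the lazy dog"
    print(jaccard(shingles(a), shingles(b)))  # 4 shared of 10 distinct shingles
    print(near_duplicate("The quick brown fox.", "the quick brown fox"))
```

A one-word change alters every shingle that contains that word, so `a` and `b` share only 4 of 10 distinct 3-shingles (similarity 0.4). Differences in case and punctuation, by contrast, vanish after tokenization. At Web scale, exact set comparison is usually replaced by sketches such as MinHash, which estimate the same Jaccard value.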
Abstract. Mapping mashups are emerging Web 2.0 applications in which data objects such as blogs, photos and videos from different sources are combined and marked on a map using AP...
Anthony K. H. Tung, Beng Chin Ooi, Dongxiang Zhang
Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where rel...