A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Many have speculated that classifying web pages can improve a search engine's ranking of results. Intuitively, results should be more relevant when they match the class of a q...
Paul N. Bennett, Krysta Marie Svore, Susan T. Duma...
We propose novel algorithms for organizing large image and video datasets using both the visual content and the associated side information, such as time, location, authorship, and...
"Web 2.0" is a term frequently mentioned in media - apparently, applications such as Wikipedia, Social Network Services, Online Shops with integrated recommende...
Recent years have seen a new generation of 'digital students' emerging in the developed world. Digital students are young adults who have grown up with digital technologies i...