Large volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...
Text reuse occurs in many different types of documents and for many different reasons. One form of reuse, duplicate or near-duplicate documents, has been a focus of researchers be...
Near-duplicate keyframes (NDK) play a unique role in large-scale video search, news topic detection and tracking. In this paper, we propose a novel NDK retrieval approach by explo...
In this paper, we study the problem of Web forum crawling. Web forum has now become an important data source of many Web applications; while forum crawling is still a challenging ...
Yida Wang, Jiang-Ming Yang, Wei Lai, Rui Cai, Lei ...
We consider the use of meta-data and/or video-domain methods to detect similar videos on the web. Meta-data is extracted from the textual and hyperlink information associated with...