— One of the most prominent data quality problems is the existence of duplicate records. Current data cleaning systems usually produce one clean instance (repair) of the input da...
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas...
—In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To thi...
Odysseas Papapetrou, Sukriti Ramesh, Stefan Siersd...
The phenomenal growth of video on the web and the increasing sparseness of meta information associated with it forces us to look for signals from the video content for search/info...
Ming Zhao 0003, Jay Yagnik, Hartwig Adam, David Ba...