: In this paper, we will propose PC-Filter (PC stands for Partition Comparison), a robust data filter for approximately duplicate record detection in large databases. PC-Filter dis...
Ji Zhang, Tok Wang Ling, Robert M. Bruckner, Han L...
The rapid growth of the number of videos in YouTube provides enormous potential for users to find content of interest to them. Unfortunately, given the difficulty of searching vid...
Shumeet Baluja, Rohan Seth, D. Sivakumar, Yushi Ji...
In this paper, we describe a new approach for mining concept associations from large text collections. The concepts are short sequences of words that occur frequently together acr...
—Subspaces offer convenient means of representing information in many pattern recognition, machine vision, and statistical learning applications. Contrary to the growing populari...
Abstract. Many real world applications rely on the discovery of maximal biclique subgraphs (complete bipartite subgraphs). However, existing algorithms for enumerating maximal bicl...