We consider the problem of finding duplicates in data streams, a task that arises in applications such as fraud detection. We develop a solu...
Maximizing only the relevance between queries and documents will not satisfy users who want the top search results to cover a wide range of topics through a few representative...
Yi Liu, Benyu Zhang, Zheng Chen, Michael R. Lyu, W...
In many Web search applications, similarities between objects of one type (say, queries) can be affected by the similarities between their interrelated objects of another type (sa...
Ordering and ranking items of different types are important tasks in various applications, such as query processing and scientific data mining. A total order for the items can be ...
Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of associa...