The expanding and dynamic nature of the Web poses enormous challenges to most data mining techniques that try to extract patterns from Web data, such as Web usage and Web content....
To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
The paper investigates the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reenginee...
Computing a suitable measure of consensus among several clusterings on the same data is an important problem that arises in several areas such as computational biology and data mi...
Piotr Berman, Bhaskar DasGupta, Ming-Yang Kao, Jie...
In this paper we propose a novel approach to content-based image retrieval with relevance feedback, which is based on the random walker algorithm introduced in the context of inte...