Assessing the similarity between objects is a prerequisite for many data mining techniques. This paper introduces a novel approach to learn distance functions that maximizes the c...
Christoph F. Eick, Alain Rouhana, Abraham Bagherje...
Many applications dealing with textual information require classification of words into semantic classes (or concepts). However, manually constructing semantic classes is a tediou...
We present an extension of the fuzzy c-Means algorithm, which operates simultaneously on different feature spaces—so-called parallel universes—and also incorporates noise det...
A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities. Understanding the n...
Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large dataset. In this paper we present a scalable sampling imple...