Address standardization is a very challenging task in data cleansing. To provide better customer relationship management and business intelligence for customer-oriented cooperates...
This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimil...
Luh Yen, Marco Saerens, Amin Mantrach, Masashi Shi...
: A broad variety of data is available in distinct heterogeneous sources, stored under different formats: database formats (in relational and object-oriented models), document form...
In this paper, we study the problem of constructing private classifiers using decision trees, within the framework of differential privacy. We first construct privacy-preserving ID...
Textual patterns have been used effectively to extract information from large text collections. However they rely heavily on textual redundancy in the sense that facts have to be m...