As the World Wide Web in China grows rapidly, mining knowledge in Chinese Web pages becomes more and more important. Mining Web information usually relies on the machine learning ...
Easy reuse and integration of declaratively described information in a distributed setting is one of the main motivations for building the Semantic Web. Despite of this claim, reu...
We prove lower bounds on the redundancy necessary to represent a set S of objects using a number of bits close to the information-theoretic minimum log2 |S|, while answering vario...
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs ...
Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "qu...