A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
The sharing of association rules is often beneficial in industry, but requires privacy safeguards. One may decide to disclose only part of the knowledge and conceal stra...
We introduce a novel data mining technique for the analysis of gene expression. Gene expression is the effective production of the protein that a gene encodes. We focus on the cha...
Aleksandar Icev, Carolina Ruiz, Elizabeth F. Ryder
In many application domains (e.g., WWW mining, molecular biology), large string datasets are available and yet under-exploited. The inductive database framework assumes that both s...
Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most of the existing t...