In recent years, the blogosphere has experienced a substantial increase in the number of posts published daily, forcing users to cope with information overload. The task of guidin...
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs ...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimil...
Luh Yen, Marco Saerens, Amin Mantrach, Masashi Shi...
Frequent patterns provide solutions to datasets that do not have well-structured feature vectors. However, frequent pattern mining is non-trivial since the number of unique patter...
Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Y...