The relationship between constraint-based mining and constraint programming is explored by showing how the typical constraints used in pattern mining can be formulated for use in ...
Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. All...
Hans-Peter Kriegel, Matthias Schubert, Arthur Zime...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Supervised classification methods have been shown to be very effective for a large number of applications. They require a training data set whose instances are labeled to indicate...
Efficient training of direct multi-class formulations of linear Support Vector Machines is very useful in applications such as text classification with a huge number examples as w...
S. Sathiya Keerthi, S. Sundararajan, Kai-Wei Chang...
Attributed graphs are increasingly more common in many application domains such as chemistry, biology and text processing. A central issue in graph mining is how to collect inform...
The BioJournalMonitor is a decision support system for the analysis of trends and topics in the biomedical literature. Its main goal is to identify potential diagnostic and therap...
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated la...
Victor S. Sheng, Foster J. Provost, Panagiotis G. ...
This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting ...
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zha...
Query suggestion plays an important role in improving the usability of search engines. Although some recently proposed methods can make meaningful query suggestions by mining quer...