We tackle the challenging problem of mining the simplest Boolean patterns from categorical datasets. Instead of complete enumeration, which is typically infeasible for this class ...
Algorithms based on simulating stochastic flows are a simple and natural solution for the problem of clustering graphs, but their widespread use has been hampered by their lack of...
Various real-life datasets can be viewed as a set of records consisting of attributes explaining the records and set of measures evaluating the records. In this paper, we address t...
Data perturbation is a popular technique for privacypreserving data mining. The major challenge of data perturbation is balancing privacy protection and data quality, which are no...
Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in...