Background: The traditional (unweighted) k-means is one of the most popular clustering methods for analyzing gene expression data. However, it suffers three major shortcomings. It...
Essentially all data mining algorithms assume that the datagenerating process is independent of the data miner's activities. However, in many domains, including spam detectio...
Nilesh N. Dalvi, Pedro Domingos, Mausam, Sumit K. ...
Procedures for collective inference make simultaneous statistical judgments about the same variables for a set of related data instances. For example, collective inference could b...
In the realm of data mining, several key issues exists in the traditional classification algorithms, such as low readability, large rule number, and low accuracy with information ...
Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...