As the size of available datasets in various domains is growing rapidly, there is an increasing need for scaling data mining implementations. Coupled with the current trends in co...
Abstract--Randomization is a general technique for evaluating the significance of data analysis results. In randomizationbased significance testing, a result is considered to be in...
This paper addresses Named Entity Mining (NEM), in which we mine knowledge about named entities such as movies, games, and books from a huge amount of data. NEM is potentially use...
: Geographic information (e.g., locations, networks, and nearest neighbors) are unique and different from other aspatial attributes (e.g., population, sales, or income). It is a ch...
Mining High Utility Itemsets from a transaction database is to find itemsests that have utility above a user-specified threshold. This problem is an extension of Frequent Itemset ...