Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
— Quantiles are very useful in characterizing the data distribution of an evolving dataset in the process of data mining or network monitoring. The method of Stochastic Approxima...
Abstract--Large high dimension datasets are of growing importance in many fields and it is important to be able to visualize them for understanding the results of data mining appro...
Jong Youl Choi, Seung-Hee Bae, Xiaohong Qiu, Geoff...
Most previously proposed frequent graph mining algorithms are intended to find the complete set of all frequent, closed subgraphs. However, in many cases only a subset of the freq...
Network, web, and disk I/O traffic are usually bursty, self-similar [9, 3, 5, 6] and therefore can not be modeled adequately with Poisson arrivals[9]. However, we do want to model...
Mengzhi Wang, Ngai Hang Chan, Spiros Papadimitriou...