—Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-le...
Detecting outliers which are grossly different from or inconsistent with the remaining dataset is a major challenge in real-world KDD applications. Existing outlier detection met...
This paper analyses the reliability of confidence intervals for Koza's computational effort statistic. First, we conclude that dependence between the observed minimum generat...
The practical realization of managing and executing large scale scientific computations efficiently and reliably is quite challenging. Scientific computations often invo...
Yong Zhao, Ioan Raicu, Ian T. Foster, Mihael Hateg...
We study the recurrence dynamics of queries in Web search by analysing a large real-world query log dataset. We find that query frequency is more useful in predicting collective ...