Sciweavers

» Managing large datasets with iRODS - a performance analyses
ISPASS
2010
IEEE
15 years 4 months ago
The Hadoop distributed filesystem: Balancing portability and performance
Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-le...
Jeffrey Shafer, Scott Rixner, Alan L. Cox
PAKDD
2009
ACM
15 years 2 months ago
A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data
Detecting outliers which are grossly different from or inconsistent with the remaining dataset is a major challenge in real-world KDD applications. Existing outlier detection met...
Ke Zhang, Marcus Hutter, Huidong Jin
GECCO
2007
Springer
15 years 1 months ago
The reliability of confidence intervals for computational effort comparisons
This paper analyses the reliability of confidence intervals for Koza's computational effort statistic. First, we conclude that dependence between the observed minimum generat...
Matthew Walker, Howard Edwards, Chris H. Messom
CORR
2008
Springer
14 years 9 months ago
Realizing Fast, Scalable and Reliable Scientific Computations in Grid Environments
The practical realization of managing and executing large scale scientific computations efficiently and reliably is quite challenging. Scientific computations often invo...
Yong Zhao, Ioan Raicu, Ian T. Foster, Mihael Hateg...
SIGIR
2009
ACM
15 years 4 months ago
What queries are likely to recur in web search?
We study the recurrence dynamics of queries in Web search by analysing a large real-world query log dataset. We find that query frequency is more useful in predicting collective ...
Dell Zhang, Jinsong Lu