Sciweavers

146 search results - page 29 / 30
» Transparent Checkpoint-Restart of Distributed Applications o...
Sort
View
SC
2009
ACM
14 years 3 days ago
FALCON: a system for reliable checkpoint recovery in shared grid environments
In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as the performance degradation is tolerable. For gu...
Tanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenm...
CLUSTER
2005
IEEE
13 years 11 months ago
Memory Management Support for Multi-Programmed Remote Direct Memory Access (RDMA) Systems
Current operating systems offer basic support for network interface controllers (NICs) supporting remote direct memory access (RDMA). Such support typically consists of a device d...
Kostas Magoutis
CCGRID
2002
IEEE
13 years 10 months ago
Overcoming the Problems Associated with the Existence of Too Many DSM APIs
Despite the large research efforts in the SW–DSM community, this technology has not yet been adapted widely for significant codes beyond benchmark suites. One of the reasons co...
Martin Schulz
KDD
2009
ACM
198views Data Mining» more  KDD 2009»
14 years 5 months ago
Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
ICS
2007
Tsinghua U.
13 years 11 months ago
Automatic nonblocking communication for partitioned global address space programs
Overlapping communication with computation is an important optimization on current cluster architectures; its importance is likely to increase as the doubling of processing power ...
Wei-Yu Chen, Dan Bonachea, Costin Iancu, Katherine...