Sciweavers

CLUSTER
2004
IEEE
13 years 8 months ago
FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
As high performance clusters continue to grow in size, the mean time between failure shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
Gengbin Zheng, Lixia Shi, Laxmikant V. Kalé
CLUSTER
2004
IEEE
13 years 8 months ago
Fault-tolerant grid services using primary-backup: feasibility and performance
The combination of Grid technology and web services has produced an attractive platform for deploying distributed applications: Grid services, as represented by the Open Grid Serv...
Xianan Zhang, Dmitrii Zagorodnov, Matti A. Hiltune...
CLUSTER
2004
IEEE
13 years 8 months ago
Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM
All-to-all broadcast is one of the common collective operations that involve dense communication between all processes in a parallel program. Previously, programmable Network Inte...
Weikuan Yu, Dhabaleswar K. Panda, Darius Buntinas
CLUSTER
2004
IEEE
13 years 8 months ago
An efficient end-host architecture for cluster communication
Cluster computing environments built from commodity hardware have provided a cost-effective solution for many scientific and high-performance applications. Likewise, middleware te...
Xin Qi, Gabriel Parmer, Richard West
CLUSTER
2004
IEEE
13 years 8 months ago
NIC-based offload of dynamic user-defined modules for Myrinet clusters
Many of the modern networks used to interconnect nodes in cluster-based computing systems provide network interface cards (NICs) that offer programmable processors. Substantial re...
Adam Wagner, Hyun-Wook Jin, Dhabaleswar K. Panda, ...
CLUSTER
2004
IEEE
13 years 8 months ago
Communicating efficiently on cluster based grids with MPICH-VMI
Emerging infrastructure of computational grids composed of Clusters-of-Clusters (CoC) interlinked through high throughput channels promises unprecedented raw compute power for ter...
Avneesh Pant, Hassan Jafri
CLUSTER
2004
IEEE
13 years 8 months ago
An evaluation of the close-to-files processor and data co-allocation policy in multiclusters
In multicluster systems, and more generally, in grids, jobs may require co-allocation, i.e., the simultaneous allocation of resources such as processors and input files in multipl...
Hashim H. Mohamed, Dick H. J. Epema
CLUSTER
2004
IEEE
13 years 8 months ago
A comparison of local and gang scheduling on a Beowulf cluster
Gang Scheduling and related techniques are widely believed to be necessary for efficient job scheduling on distributed memory parallel computers. This is because they minimize cont...
Peter E. Strazdins, John Uhlmann
CLUSTER
2004
IEEE
13 years 8 months ago
Towards informatic analysis of Syslogs
The complexity and cost of isolating the root cause of system problems in large parallel computers generally scales with the size of the system. Syslog messages provide a primary ...
John Stearley