Sciweavers

1166 search results - page 57 / 234
» Crash Management for Distributed Parallel Systems
CCGRID
2010
IEEE
14 years 3 months ago
WORKEM: Representing and Emulating Distributed Scientific Workflow Execution State
Scientific workflows have become an integral part of cyberinfrastructure as their computational complexity and data sizes have grown. However, the complexity of the distributed i...
Lavanya Ramakrishnan, Dennis Gannon, Beth Plale
CCGRID
2010
IEEE
15 years 28 days ago
Designing Accelerator-Based Distributed Systems for High Performance
Multi-core processors with accelerators are becoming commodity components for high-performance computing at scale. While accelerator-based processors have been studied in...
M. Mustafa Rafique, Ali Raza Butt, Dimitrios S. Ni...
ICDCS
2006
IEEE
15 years 5 months ago
SysProf: Online Distributed Behavior Diagnosis through Fine-grain System Monitoring
Runtime monitoring is key to the effective management of enterprise and high performance applications. To deal with the complex behaviors of today’s multi-tier applications runn...
Sandip Agarwala, Karsten Schwan
IPPS
2000
IEEE
15 years 4 months ago
Accommodating QoS Prediction in an Adaptive Resource Management Framework
Resource management for dynamic, distributed real-time systems requires handling of unknown arrival rates for data and events; additional desiderata include: accommodation of heter...
Eui-nam Huh, Lonnie R. Welch, Behrooz Shirazi, Bre...
HPDC
2007
IEEE
15 years 6 months ago
Direct-pNFS: scalable, transparent, and versatile access to parallel file systems
Grid computations require global access to massive data stores. To meet this need, the GridNFS project aims to provide scalable, high-performance, transparent, and secure wide-are...
Dean Hildebrand, Peter Honeyman