Sciweavers

» Crash Management for Distributed Parallel Systems
HPDC
1998
IEEE
15 years 4 months ago
Matchmaking: Distributed Resource Management for High Throughput Computing
Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not ada...
Rajesh Raman, Miron Livny, Marvin H. Solomon
IPPS
2005
IEEE
15 years 5 months ago
End-to-End Quality of Service Management for Distributed Real-Time Embedded Applications
Many of the world’s most critical systems are distributed real-time embedded (DRE) systems, with mission-critical quality of service (QoS) requirements. However, because of their...
Prakash Manghwani, Joseph P. Loyall, Praveen Kaush...
GRID
2000
Springer
15 years 3 months ago
MeSch - An Approach to Resource Management in a Distributed Environment
Resource management in the typical Grid environment based on multi-MPP systems or clusters today still is one of the challenging problems. We will present MeSch, a solution for the...
Gerd Quecke, Wolfgang Ziegler
CLUSTER
2007
IEEE
14 years 11 months ago
The computer as software component: A mechanism for developing and testing resource management software
In this paper, we present an architecture that encapsulates system hardware inside a software component used for job execution and status monitoring. The development of this in...
Narayan Desai, Theron Voran, Ewing L. Lusk, Andrew...
ICPPW
2008
IEEE
15 years 6 months ago
Simulating Failures on Large-Scale Systems
Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...
Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...