Sciweavers

Search results for: Crash Management for Distributed Parallel Systems
HPDC
1998
IEEE
Matchmaking: Distributed Resource Management for High Throughput Computing
Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not ada...
Rajesh Raman, Miron Livny, Marvin H. Solomon
IPPS
2005
IEEE
End-to-End Quality of Service Management for Distributed Real-Time Embedded Applications
Many of the world’s most critical systems are distributed real-time embedded (DRE) systems, with mission-critical quality of service (QoS) requirements. However, because of their...
Prakash Manghwani, Joseph P. Loyall, Praveen Kaush...
GRID
2000
Springer
MeSch - An Approach to Resource Management in a Distributed Environment
Resource management in a typical Grid environment based on multi-MPP systems or clusters is still one of the challenging problems today. We present MeSch, a solution for the...
Gerd Quecke, Wolfgang Ziegler
CLUSTER
2007
IEEE
The computer as software component: A mechanism for developing and testing resource management software
In this paper, we present an architecture that encapsulates system hardware inside a software component used for job execution and status monitoring. The development of this in...
Narayan Desai, Theron Voran, Ewing L. Lusk, Andrew...
ICPPW
2008
IEEE
Simulating Failures on Large-Scale Systems
Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...
Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...