Sciweavers

Search results for "Distributed error confinement"
MIDDLEWARE
2007
Springer
Using checkpointing to recover from poor multi-site parallel job scheduling decisions
Recent research in multi-site parallel job scheduling leverages user-provided estimates of job communication characteristics to effectively partition the job across multiple clus...
William M. Jones
DATE
2006
IEEE
Analyzing timing uncertainty in mesh-based clock architectures
Mesh architectures are used to distribute critical global signals on a chip, such as clock and power/ground. Redundancy created by mesh loops smooths out undesirable variations be...
Subodh M. Reddy, Gustavo R. Wilke, Rajeev Murgai
GLOBECOM
2006
IEEE
Service Differentiation by a Link Layer Protocol Based on SR ARQ over a Satellite Channel
This paper studies the case where multiple IP flows are aggregated over a single satellite channel and error recovery by retransmission is performed by Selective Repeat (SR...
Toshihiro Shikama, Takashi Watanabe, Tadanori Mizu...
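The abstract above only names Selective Repeat (SR) ARQ. As a rough illustration of the receiver side of that general technique (not the link layer protocol proposed in the paper), the following Python sketch buffers out-of-order frames that fall inside the receive window, acknowledges each frame individually, and passes payloads up in sequence order. The class and method names are hypothetical, and sequence-number wraparound is deliberately ignored to keep the sketch short.

```python
class SelectiveRepeatReceiver:
    """Hypothetical SR ARQ receiver (simplified: no sequence-number wraparound)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.expected = 0     # lowest sequence number not yet delivered
        self.buffer = {}      # out-of-order frames held until the gap is filled
        self.delivered = []   # payloads passed to the upper layer, in order

    def receive(self, seq, payload):
        """Handle one arriving frame; return the seq number to ACK, or None if discarded."""
        if self.expected <= seq < self.expected + self.window_size:
            # Inside the window: keep the first copy of this frame.
            self.buffer.setdefault(seq, payload)
            # Deliver the contiguous run that now starts at the expected number.
            while self.expected in self.buffer:
                self.delivered.append(self.buffer.pop(self.expected))
                self.expected += 1
            return seq
        if seq < self.expected:
            # Duplicate of an already-delivered frame (its ACK was probably lost):
            # ACK it again so the sender stops retransmitting it.
            return seq
        # Beyond the window: discard without an ACK.
        return None
```

For example, with a window of 4, frames 0, 2 and 3 are accepted and acknowledged even though frame 1 was lost; only frame 1 has to be retransmitted, and once it arrives frames 1 to 3 are delivered together. This selective retransmission is what distinguishes SR from Go-Back-N, which would resend every frame after the lost one.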
HPDC
2006
IEEE
ALPS: An Application-Level Proportional-Share Scheduler
ALPS is a per-application user-level proportional-share scheduler that operates with low overhead and without any special kernel support. ALPS is useful to a range of applications...
Travis Newhouse, Joseph Pasquale
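The abstract describes ALPS only as a user-level proportional-share scheduler; its internal mechanism is not given here. To illustrate what "proportional share" means in general, the sketch below uses stride scheduling, a different and well-known proportional-share technique, not the ALPS algorithm. Each client holds a number of tickets, its stride is inversely proportional to those tickets, and the client with the smallest accumulated "pass" value runs next. The function name, the `STRIDE1` constant, and the ticket values are hypothetical.

```python
import heapq

# Large constant so that integer strides keep reasonable precision.
STRIDE1 = 1 << 20


def stride_schedule(tickets, quanta):
    """Return the order in which clients receive `quanta` time slices.

    tickets: dict mapping a client name to its positive ticket count (its share).
    """
    heap = []
    for name, count in sorted(tickets.items()):
        stride = STRIDE1 // count          # more tickets -> smaller stride
        heapq.heappush(heap, (stride, name, stride))  # (pass, name, stride)

    order = []
    for _ in range(quanta):
        # The client with the lowest pass value runs for one quantum...
        pass_value, name, stride = heapq.heappop(heap)
        order.append(name)
        # ...and its pass advances by its stride.
        heapq.heappush(heap, (pass_value + stride, name, stride))
    return order
```

With 3 tickets for A and 1 for B, `stride_schedule({"A": 3, "B": 1}, 8)` yields A, A, A, B, A, A, A, B, which is a 3:1 split of the eight quanta. Ties on the pass value are broken by client name here, which is an arbitrary choice for the sketch.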
ICPP
2006
IEEE
A Performance Model of the Krak Hydrodynamics Application
We present an analytic performance model of a large-scale hydrodynamics code developed at Los Alamos National Laboratory. This modeling work is part of an ongoing effort to develop...
Kevin J. Barker, Scott Pakin, Darren J. Kerbyson