Super-Scalable Algorithms for Computing on 100,000 Processors

13 years 10 months ago

Download www.csm.ornl.gov

In the next five years, the number of processors in high-end systems for scientific computing is expected to rise to tens and even hundreds of thousands. For example, the IBM Blue Gene/L can have up to 128,000 processors, and delivery of the first system is scheduled for 2005. Existing deficiencies in the scalability and fault tolerance of scientific applications need to be addressed soon. If the number of processors grows by an order of magnitude while efficiency drops by an order of magnitude, the overall effective computing performance stays the same. Furthermore, the mean time to interrupt of high-end computer systems decreases with scale and complexity. In a 100,000-processor system, failures may occur every couple of minutes, and traditional checkpointing may no longer be feasible. In this paper, we summarize our recent research in super-scalable algorithms for computing on 100,000 processors. We introduce the algorithm properties of scale invariance and natural fault tolerance, and discuss how ...
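The two scaling arguments in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes effective performance is simply processors × efficiency × per-processor peak, and that node failures are independent, so the system mean time to interrupt is roughly the per-node mean time between failures divided by the node count. The function names, the 50% and 5% efficiency figures, and the one-year node MTBF are all hypothetical values chosen for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def effective_performance(processors, efficiency, peak_per_processor=1.0):
    """Effective performance in arbitrary units (assumed simple product model)."""
    return processors * efficiency * peak_per_processor


def system_mtti_minutes(processors, node_mtbf_years):
    """Approximate system mean time to interrupt, assuming independent node failures."""
    return node_mtbf_years * MINUTES_PER_YEAR / processors


# Hypothetical baseline: 10,000 processors at 50% efficiency.
base = effective_performance(10_000, 0.5)

# Ten times the processors at one tenth of the efficiency: no net gain.
scaled = effective_performance(100_000, 0.05)

# Hypothetical node MTBF of one year on a 100,000-processor system.
mtti = system_mtti_minutes(100_000, node_mtbf_years=1.0)

print(f"baseline effective performance: {base:.1f}")
print(f"scaled effective performance:   {scaled:.1f}")
print(f"system MTTI: {mtti:.2f} minutes")
```

Under these assumed numbers the scaled system delivers the same effective performance as the baseline, and the system is interrupted on the order of every few minutes, which is consistent with the qualitative claims above.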

Christian Engelmann, Al Geist
