High performance LU factorization for non-dedicated clusters


Download web.yl.is.s.u-tokyo.ac.jp

This paper describes an implementation of parallel LU factorization. The focus is on achieving high performance on non-dedicated clusters, where the number of available computing resources may be arbitrary and may even change dynamically. We accommodate joining and leaving processes by describing the algorithm in the Phoenix programming model. We achieve high performance in this setting through a combination of techniques, including latency-tolerant communication and a data partitioning scheme that achieves both load balance and small communication volume for an arbitrary and dynamically changing number of processors. We observed 130 GFlops with 128 processes on a 70-node dual 2.4 GHz Xeon cluster at a matrix size of 46,080. This performance is comparable to that of High Performance Linpack (HPL). When cluster nodes are loaded by background processes, our implementation surpasses HPL.

Toshio Endo, Kenji Kaneda, Kenjiro Taura, Akinori Yonezawa
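
To put the reported 130 GFlops at matrix size 46,080 in perspective, the short sketch below converts that rate into an implied wall-clock time. It assumes the standard leading-order operation count for dense LU factorization, (2/3)n^3, and ignores lower-order terms; the abstract does not state the run time or the exact flop-count convention used, so the function names and the resulting figure are illustrative only.

```python
def lu_flops(n: int) -> float:
    """Leading-order floating-point operation count for dense LU
    factorization of an n x n matrix: (2/3) * n^3.
    Assumption: lower-order terms are ignored."""
    return (2.0 / 3.0) * n ** 3


def implied_seconds(n: int, gflops: float) -> float:
    """Wall-clock time implied by a sustained rate given in GFlops."""
    return lu_flops(n) / (gflops * 1e9)


if __name__ == "__main__":
    n = 46_080      # matrix size reported in the abstract
    rate = 130.0    # GFlops reported in the abstract
    print(f"operations: {lu_flops(n):.3e}")
    print(f"implied time: {implied_seconds(n, rate):.0f} s")
```

Under these assumptions the factorization involves about 6.5e13 operations, which at 130 GFlops corresponds to roughly 500 seconds.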


CCGRID 2004 | Distributed And Parallel Computing | Implementation Surpasses HPL | Latency Tolerant Communication | Parallel LU Factorization


» Parallel sparse LU factorization on second-class message passing platforms

» Updating an LU Factorization with Pivoting

» A Supernodal Approach to Incomplete LU Factorization with Partial Pivoting

» Hypergraph-Based Unsymmetric Nested Dissection Ordering for Sparse LU Factorization

» Elimination Forest Guided 2D Sparse LU Factorization

» High-Performance and Parameterized Matrix Factorization on FPGAs

» Scalable Dense Factorizations for Heterogeneous Computational Clusters

» A high performance cluster JVM presenting a pure single system image

Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2004
Where	CCGRID
Authors	Toshio Endo, Kenji Kaneda, Kenjiro Taura, Akinori Yonezawa
