As the scale is expanding, node failure becomes a commonplace feature of large-scale cluster systems. As an important part of cluster operating system software, job scheduling tak...
Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bib...
The development of efficient parallel algorithms for large scale wildfire simulations is a challenging research problem because the factors that determine wildfire behavior are com...
Despite great efforts on the design of ultra-reliable components, the increase of system size and complexity has outpaced the improvement of component reliability. As a result, fa...
Jiexing Gu, Ziming Zheng, Zhiling Lan, John White,...
The High Performance Storage System (HPSS) provides scalable hierarchical storage management (HSM), archive, and file system services. Its design, implementation and current domin...
In programming high performance applications, shared address-space platforms are preferable for fine-grained computation, while distributed address-space platforms are more suita...