Improving memory performance at the software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...
We have designed, built, and analyzed a distributed parallel storage system that will supply image streams fast enough to permit multi-user, "real-time", video-like appl...
Brian Tierney, Jason Lee, Ling Tony Chen, Hanan He...
We present a new methodology for generating and adapting octree meshes for terascale applications. Our approach combines existing methods, such as parallel octree decomposition and...
Developing and debugging parallel programs, particularly for distributed memory architectures, is still a difficult task. The most popular approach to developing parallel programs f...
In this paper we propose a new approach for scheduling data parallel applications on the Grid using irregular array distributions. We implement the scheduler as a new case study fo...