Efficiently using the hardware capabilities of the Cell processor, a heterogeneous chip multiprocessor that uses several levels of parallelism to deliver high performance, and bei...
Tarik Saidani, Joel Falcou, Claude Tadonki, Lionel...
Traditional code optimizers have produced significant performance improvements over the past forty years. While promising avenues of research still exist, traditional static and p...
Jason Hiser, Naveen Kumar, Min Zhao, Shukang Zhou,...
One of the outcomes of DARPA’s HPCS program has been the creation of three new high productivity languages: Chapel, Fortress, and X10. While these languages have introduced impro...
Effective use of communication networks is critical to the performance and scalability of parallel applications. Partitioned Global Address Space languages like UPC bring the pro...
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...