Effective use of communication networks is critical to the performance and scalability of parallel applications. Partitioned Global Address Space languages like UPC bring the pro...
As the number of I/O-intensive MPI programs becomes increasingly large, many efforts have been made to improve I/O performance, on both software and architecture sides. On the sof...
We present Build Analyzer, a tool that helps developers optimize the build performance of huge systems written in C. Due to complex C header dependencies, even small code changes ...
Designers of concurrent programs are faced with many choices of synchronization mechanisms, among which clear functional trade-offs exist. Making synchronization customizable is h...
Using FPGAs to accelerate High Performance Computing (HPC) applications is attractive, but has a huge associated cost: the time spent, not for developing efficient FPGA code but fo...