Automatic vectorization of programs for partitioned-ALU SIMD (Single Instruction Multiple Data) processors has been difficult because of not only data dependency issues but also n...
This paper presents a novel technique to perform global optimization of communication and preprocessing calls in the presence of array accesses with arbitrary subscripts. Our sche...
To support network programming, we present Deluge, a reliable data dissemination protocol for propagating large data objects from one or more source nodes to many other nodes over...
This paper presents a new functional parallel language: Minimally Synchronous Parallel ML. The execution time can then be estimated and dead-locks and indeterminism are avoided. I...
Applications composed of multiple parallel libraries perform poorly when those libraries interfere with one another by obliviously using the same physical cores, leading to destru...