In this paper we describe a GPU parallelization of the 3D finite difference computation using CUDA. Data access redundancy is used as the metric to determine the optimal implement...
This paper presents Kato, a tool that implements a novel class of optimizations that are inspired by program slicing for imperative languages but are applicable to analyzable decl...
We develop an explicit two level system that allows programmers to reason about the behavior of effectful programs. The first level is an ordinary ML-style type system, which conf...
The problem of programming scalable multicore processors has renewed interest in message-passing languages and frameworks. Such languages and frameworks are typically actororiente...
The UTS benchmark is used to evaluate task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs parallel search of...