In this paper, we describe a compilation system that automates much of the process of performance tuning that is currently done manually by application programmers interested in h...
Nastaran Baradaran, Jacqueline Chame, Chun Chen, P...
We describe an object-oriented software integration frameccom, abstracted from our five years of experience in developing a complex, integrated code for rocket simulation. Roccom...
Xiangmin Jiao, Michael T. Campbell, Michael T. Hea...
Abstract. Finite volume numerical methods have been widely studied, implemented and parallelized on multiprocessor systems or on clusters. Modern graphics processing units (GPU) pr...
In contrast to the common belief that OpenMP requires data-parallel extensions to scale well on architectures with non-uniform memory access latency, recent work has shown that it ...
We present new architectural concepts for uniprocessor designs that conform to the data-driven computation paradigm. Usage of our D2 -CPU (Data-Driven processor) follows the natura...