—This paper proposes a parallelization scheme for parameter sweep (PS) applications using the compute unified device architecture (CUDA). Our scheme focuses on PS applications w...
— Applications raising in many scientific fields exhibit both data and task parallelism that have to be exploited efficiently. A classic approach is to structure those applica...
Distributing data is a fundamental problem in implementing efficient distributed-memory parallel programs. The problem becomes more difficult in environments where the participa...
D. Brent Weatherly, David K. Lowenthal, Mario Naka...
Recent digital signal processors (DSPs) show a homogeneous VLIW-like data path architecture, which allows C compilers to generate efficient code. However, still some special rest...
There is a class of sparse matrix computations, such as direct solvers of systems of linear equations, that change the fill-in (nonzero entries) of the coefficient matrix, and invo...