The thesis of this research is that the task of exposing the parallelism in a given application should be left to the algorithm designer, who has intimate knowledge of the applica...
— We have simulated the implementation of 16-bit floating point instructions on a Pentium4 and PowerPC G4 and G5 to evaluate the performance impact of these instructions in embed...
Lionel Lacassagne, Daniel Etiemble, S. A. Ould Kab...
1 This paper presents a simple but effective method to reduce on-chip access latency and improve core isolation in CMP Non-Uniform Cache Architectures (NUCA). The paper introduces ...
Javier Merino, Valentin Puente, Pablo Prieto, Jos&...
Parallel random access memory, or PRAM, is a now venerable model of parallel computation that that still retains its usefulness for the design and analysis of parallel algorithms....
We present an optimized parallelization scheme for molecular dynamics simulations of large biomolecular systems, implemented in the production-quality molecular dynamics program N...
Robert Brunner, James C. Phillips, Laxmikant V. Ka...