Sciweavers

5171 search results - page 544 / 1035
» Deterministic Parallel Processing
Sort
View
PVM
2010
Springer
15 years 3 months ago
Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues
Abstract. Designing and tuning parallel applications with MPI, particularly at large scale, requires understanding the performance implications of different choices of algorithms ...
Torsten Hoefler, William Gropp, Rajeev Thakur, Jes...
IPPS
2010
IEEE
15 years 2 months ago
Optimization of linked list prefix computations on multithreaded GPUs using CUDA
We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures ...
Zheng Wei, Joseph JáJá
SASP
2009
IEEE
291views Hardware» more  SASP 2009»
15 years 11 months ago
FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs
— As growing power dissipation and thermal effects disrupted the rising clock frequency trend and threatened to annul Moore’s law, the computing industry has switched its route...
Alexandros Papakonstantinou, Karthik Gururaj, John...
ICPP
2009
IEEE
15 years 11 months ago
Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems
—Clusters and applications continue to grow in size while their mean time between failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for lar...
Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabales...
IPPS
2009
IEEE
15 years 11 months ago
Accelerating HMMer on FPGAs using systolic array based architecture
HMMer is a widely-used bioinformatics software package that uses profile HMMs (Hidden Markov Models) to model the primary structure consensus of a family of protein or nucleic aci...
Yanteng Sun, Peng Li, Guochang Gu, Yuan Wen, Yuan ...