Sciweavers

914 search results - page 39 / 183
» Assessing the performance limits of parallelized near-thresh...
Sort
View
97
Voted
ASAP
2008
IEEE
82views Hardware» more  ASAP 2008»
15 years 7 months ago
Run-time thread sorting to expose data-level parallelism
We address the problem of data parallel processing for computational quantum chemistry (CQC). CQC is a computationally demanding tool to study the electronic structure of molecule...
Tirath Ramdas, Gregory K. Egan, David Abramson, Ki...
CCGRID
2010
IEEE
15 years 1 months ago
Designing Accelerator-Based Distributed Systems for High Performance
Abstract--Multi-core processors with accelerators are becoming commodity components for high-performance computing at scale. While accelerator-based processors have been studied in...
M. Mustafa Rafique, Ali Raza Butt, Dimitrios S. Ni...
90
Voted
IPPS
2010
IEEE
14 years 10 months ago
Large-scale multi-dimensional document clustering on GPU clusters
Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulat...
Yongpeng Zhang, Frank Mueller, Xiaohui Cui, Thomas...
ICPP
2007
IEEE
15 years 6 months ago
CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters
Performance and power are critical design constraints in today’s high-end computing systems. Reducing power consumption without impacting system performance is a challenge for t...
Rong Ge, Xizhou Feng, Wu-chun Feng, Kirk W. Camero...
ICPP
1999
IEEE
15 years 4 months ago
Impact on Performance of Fused Multiply-Add Units in Aggressive VLIW Architectures
Loops are the main time consuming part of programs based on floating point computations. The performance of the loops is limited either by recurrences in the computation or by the...
David López, Josep Llosa, Eduard Ayguad&eac...