Sciweavers

365 search results - page 70 / 73
» Automatic Performance Debugging of SPMD Parallel Programs
Sort
View
CODES
2005
IEEE
15 years 3 months ago
SOMA: a tool for synthesizing and optimizing memory accesses in ASICs
Arbitrary memory dependencies and variable latency memory systems are major obstacles to the synthesis of large-scale ASIC systems in high-level synthesis. This paper presents SOM...
Girish Venkataramani, Tiberiu Chelcea, Seth Copen ...
HPCA
2009
IEEE
15 years 4 months ago
Reconciling specialization and flexibility through compound circuits
While parallelism and multi-cores are receiving much attention as a major scalability path, customization is another, orthogonal and complementary, scalability path which can targ...
Sami Yehia, Sylvain Girbal, Hugues Berry, Olivier ...
HPCA
2011
IEEE
14 years 1 months ago
HAQu: Hardware-accelerated queueing for fine-grained threading on a chip multiprocessor
Queues are commonly used in multithreaded programs for synchronization and communication. However, because software queues tend to be too expensive to support finegrained paralle...
Sanghoon Lee, Devesh Tiwari, Yan Solihin, James Tu...
ISCA
2011
IEEE
313views Hardware» more  ISCA 2011»
14 years 1 months ago
FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template
A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently-design...
Niket Kumar Choudhary, Salil V. Wadhavkar, Tanmay ...
ICS
2009
Tsinghua U.
15 years 2 months ago
A translation system for enabling data mining applications on GPUs
Modern GPUs offer much computing power at a very modest cost. Even though CUDA and other related recent developments are accelerating the use of GPUs for general purpose applicati...
Wenjing Ma, Gagan Agrawal