Sciweavers

4198 search results - page 232 / 840
» Data Parallel Program Design
PPOPP
2010
ACM
16 years 26 days ago
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Most modern Chip Multiprocessors (CMP) feature shared cache on chip. For multithreaded applications, the sharing reduces communication latency among co-running threads, but also r...
Eddy Z. Zhang, Xipeng Shen, Yunlian Jiang
SPAA
2003
ACM
15 years 8 months ago
Throughput-centric routing algorithm design
The increasing application space of interconnection networks now encompasses several applications, such as packet routing and I/O interconnect, where the throughput of a routing a...
Brian Towles, William J. Dally, Stephen P. Boyd
FPL
2009
Springer
15 years 8 months ago
Exploiting fast carry-chains of FPGAs for designing compressor trees
Fast carry chains featuring dedicated adder circuitry are a distinctive feature of modern FPGAs. The carry chains bypass the general routing network and are embedded in the logic b...
Hadi Parandeh-Afshar, Philip Brisk, Paolo Ienne
IPPS
2006
IEEE
15 years 9 months ago
A framework to develop symbolic performance models of parallel applications
Performance and workload modeling has numerous uses at every stage of the high-end computing lifecycle: design, integration, procurement, installation and tuning. Despite the trem...
Sadaf R. Alam, Jeffrey S. Vetter
CGF
2010
15 years 3 months ago
Streaming-Enabled Parallel Dataflow Architecture for Multicore Systems
We propose a new framework design for exploiting multi-core architectures in the context of visualization dataflow systems. Recent hardware advancements have greatly increased the...
Huy T. Vo, Daniel K. Osmari, Brian Summa, Joã...