Sciweavers

4198 search results - page 232 / 840
» Data Parallel Program Design
Sort
View
PPOPP
2010
ACM
16 years 2 months ago
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Most modern Chip Multiprocessors (CMP) feature shared cache on chip. For multithreaded applications, the sharing reduces communication latency among co-running threads, but also r...
Eddy Z. Zhang, Xipeng Shen, Yunlian Jiang
SPAA
2003
ACM
15 years 10 months ago
Throughput-centric routing algorithm design
The increasing application space of interconnection networks now encompasses several applications, such as packet routing and I/O interconnect, where the throughput of a routing a...
Brian Towles, William J. Dally, Stephen P. Boyd
131
Voted
FPL
2009
Springer
99views Hardware» more  FPL 2009»
15 years 9 months ago
Exploiting fast carry-chains of FPGAs for designing compressor trees
Fast carry chains featuring dedicated adder circuitry is a distinctive feature of modern FPGAs. The carry chains bypass the general routing network and are embedded in the logic b...
Hadi Parandeh-Afshar, Philip Brisk, Paolo Ienne
IPPS
2006
IEEE
15 years 11 months ago
A framework to develop symbolic performance models of parallel applications
Performance and workload modeling has numerous uses at every stage of the high-end computing lifecycle: design, integration, procurement, installation and tuning. Despite the trem...
Sadaf R. Alam, Jeffrey S. Vetter
CGF
2010
105views more  CGF 2010»
15 years 5 months ago
Streaming-Enabled Parallel Dataflow Architecture for Multicore Systems
We propose a new framework design for exploiting multi-core architectures in the context of visualization dataflow systems. Recent hardware advancements have greatly increased the...
Huy T. Vo, Daniel K. Osmari, Brian Summa, Jo&atild...