Sciweavers

35 search results - page 6 / 7
» Loop Scheduling and Partitions for Hiding Memory Latencies
Sort
View
MICRO
1997
IEEE
116views Hardware» more  MICRO 1997»
13 years 9 months ago
Tuning Compiler Optimizations for Simultaneous Multithreading
Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-mem...
Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay ...
ISCA
2003
IEEE
110views Hardware» more  ISCA 2003»
13 years 10 months ago
Guided Region Prefetching: A Cooperative Hardware/Software Approach
Despite large caches, main-memory access latencies still cause significant performance losses in many applications. Numerous hardware and software prefetching schemes tolerate th...
Zhenlin Wang, Doug Burger, Steven K. Reinhardt, Ka...
PC
2010
190views Management» more  PC 2010»
13 years 3 months ago
High-performance cone beam reconstruction using CUDA compatible GPUs
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the nVIDIA graphics processing unit (GPU). This paper prese...
Yusuke Okitsu, Fumihiko Ino, Kenichi Hagihara
PPOPP
2006
ACM
13 years 11 months ago
High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor
IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing the longest-prefix match at 10Gbps speed or higher. IPv6 forwarding further ex...
Xianghui Hu, Xinan Tang, Bei Hua
JUCS
2000
120views more  JUCS 2000»
13 years 5 months ago
Execution and Cache Performance of the Scheduled Dataflow Architecture
: This paper presents an evaluation of our Scheduled Dataflow (SDF) Processor. Recent focus in the field of new processor architectures is mainly on VLIW (e.g. IA-64), superscalar ...
Krishna M. Kavi, Joseph Arul, Roberto Giorgi