Sciweavers

35 search results - page 4 / 7
» Loop Scheduling and Partitions for Hiding Memory Latencies
Sort
View
ICPP
1998
IEEE
13 years 10 months ago
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Yong Yan, Xiaodong Zhang, Zhao Zhang
ISCA
2010
IEEE
185views Hardware» more  ISCA 2010»
13 years 11 months ago
Dynamic warp subdivision for integrated branch and memory divergence tolerance
SIMD organizations amortize the area and power of fetch, decode, and issue logic across multiple processing units in order to maximize throughput for a given area and power budget...
Jiayuan Meng, David Tarjan, Kevin Skadron
ICS
2007
Tsinghua U.
14 years 9 days ago
Automatic nonblocking communication for partitioned global address space programs
Overlapping communication with computation is an important optimization on current cluster architectures; its importance is likely to increase as the doubling of processing power ...
Wei-Yu Chen, Dan Bonachea, Costin Iancu, Katherine...
MICRO
2002
IEEE
143views Hardware» more  MICRO 2002»
13 years 11 months ago
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor
Clustering is a common technique to overcome the wire delay problem incurred by the evolution of technology. Fully-distributed architectures, where the register file, the functio...
Enric Gibert, F. Jesús Sánchez, Anto...
DAC
2000
ACM
14 years 7 months ago
Memory aware compilation through accurate timing extraction
Memory delays represent a major bottleneck in embedded systems performance. Newer memory modules exhibiting efficient access modes (e.g., page-, burst-mode) partly alleviate this ...
Peter Grun, Nikil D. Dutt, Alexandru Nicolau