Sciweavers

48 search results - page 7 / 10
» Tolerating data access latency with register preloading
Sort
View
ISCA
1996
IEEE
99views Hardware» more  ISCA 1996»
15 years 2 months ago
Coherent Network Interfaces for Fine-Grain Communication
Historically, processor accesses to memory-mapped device registers have been marked uncachable to insure their visibility to the device. The ubiquity of snooping cache coherence, ...
Shubhendu S. Mukherjee, Babak Falsafi, Mark D. Hil...
WMPI
2004
ACM
15 years 3 months ago
Scalable cache memory design for large-scale SMT architectures
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...
Muhamed F. Mudawar
MICRO
2002
IEEE
143views Hardware» more  MICRO 2002»
15 years 2 months ago
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor
Clustering is a common technique to overcome the wire delay problem incurred by the evolution of technology. Fully-distributed architectures, where the register file, the functio...
Enric Gibert, F. Jesús Sánchez, Anto...
PCI
2005
Springer
15 years 3 months ago
Tuning Blocked Array Layouts to Exploit Memory Hierarchy in SMT Architectures
Cache misses form a major bottleneck for memory-intensive applications, due to the significant latency of main memory accesses. Loop tiling, in conjunction with other program tran...
Evangelia Athanasaki, Kornilios Kourtis, Nikos Ana...
IPPS
2009
IEEE
15 years 4 months ago
Implementing and evaluating multithreaded triad census algorithms on the Cray XMT
Commonly represented as directed graphs, social networks depict relationships and behaviors among social entities such as people, groups, and organizations. Social network analysi...
George Chin Jr., Andrès Márquez, Sut...