Sciweavers

CDES
2006
13 years 6 months ago
Autonomous Instruction Memory Equipped with Dynamic Branch Handling Capability
Memory accesses have always been a speed-limiting factor, and memory bandwidth has always been an intensively contended scarce resource. Nevertheless, with recent pervasive emergen...
Hui-Chin Yang, Chung-Ping Chung
ICPP
1991
IEEE
13 years 8 months ago
Two Techniques to Enhance the Performance of Memory Consistency Models
The memory consistency model supported by a multiprocessor directly affects its performance. Thus, several attempts have been made to relax the consistency models to allow for mor...
Kourosh Gharachorloo, Anoop Gupta, John L. Henness...
HICSS
1995
IEEE
13 years 8 months ago
The architecture of an optimistic CPU: the WarpEngine
The architecture for a shared memory CPU is described. The CPU allows for parallelism down to the level of single instructions and is tolerant of memory latency. All executable in...
John G. Cleary, Murray Pearson, Husam Kinawi
CLUSTER
2004
IEEE
13 years 8 months ago
Predicting memory-access cost based on data-access patterns
Improving memory performance at software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...
ASPLOS
1992
ACM
13 years 8 months ago
Access Normalization: Loop Restructuring for NUMA Compilers
In scalable parallel machines, processors can make local memory accesses much faster than they can make remote memory accesses. In addition, when a number of remote accesses must...
Wei Li, Keshav Pingali
EUROPAR
1997
Springer
13 years 8 months ago
Prefetching and Multithreading Performance in Bus-Based Multiprocessors with Petri Nets
The large latency of memory accesses is a major obstacle in obtaining high processor utilization in large scale shared-memory multiprocessors. Access to remote memory is likely to ...
Edward D. Moreno, Sergio Takeo Kofuji, Marcelo H. ...
SIGCOMM
1998
ACM
13 years 8 months ago
Fast and Scalable Layer Four Switching
In Layer Four switching, the route and resources allocated to a packet are determined by the destination address as well as other header fields of the packet such as source address,...
Venkatachary Srinivasan, George Varghese, Subhash ...
ISLPED
1999
ACM
13 years 8 months ago
Selective instruction compression for memory energy reduction in embedded systems
We propose a technique for reducing the energy required by firmware code to execute on embedded systems. The method is based on the idea of compressing the most commonly executed in...
Luca Benini, Alberto Macii, Enrico Macii, Massimo ...
IPPS
2000
IEEE
13 years 9 months ago
A Mechanism for Speculative Memory Accesses Following Synchronizing Operations
In order to reduce the overhead of synchronizing operations of shared memory multiprocessors, this paper proposes a mechanism, named specMEM, to execute memory accesses following ...
Takayuki Sato, Kazuhiko Ohno, Hiroshi Nakashima
ICC
2000
IEEE
13 years 9 months ago
A Fast IP Routing Lookup Scheme
A major issue in router design for the next generation Internet is the fast IP address lookup mechanism. The existing scheme by Huang et al. performs the IP address look...
Pi-Chung Wang, Yaw-Chung Chen, Chia-Tai Chan