Sciweavers

IEEEPACT
2005
IEEE
13 years 10 months ago
Characterization of TCC on Chip-Multiprocessors
Transactional Coherence and Consistency (TCC) is a novel coherence scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of p...
Austen McDonald, JaeWoong Chung, Hassan Chafi, Chi...
IEEEPACT
2005
IEEE
13 years 10 months ago
Memory Coloring: A Compiler Approach for Scratchpad Memory Management
Scratchpad memory (SPM), a fast software-managed on-chip SRAM, is now widely used in modern embedded processors. Compared to hardware-managed cache, it is more efficient in perfor...
Lian Li 0002, Lin Gao 0002, Jingling Xue
IEEEPACT
2005
IEEE
13 years 10 months ago
HUNTing the Overlap
Hiding communication latency is an important optimization for parallel programs. Programmers or compilers achieve this by using non-blocking communication primitives and overlappi...
Costin Iancu, Parry Husbands, Paul Hargrove
IEEEPACT
2005
IEEE
13 years 10 months ago
Communication Optimizations for Fine-Grained UPC Applications
Global address space languages like UPC exhibit high performance and portability on a broad class of shared and distributed memory parallel architectures. The most scalable applic...
Wei-Yu Chen, Costin Iancu, Katherine A. Yelick
IEEEPACT
2005
IEEE
13 years 10 months ago
Maximizing CMP Throughput with Mediocre Cores
In this paper we compare the performance of area-equivalent small, medium, and large-scale multithreaded chip multiprocessors (CMTs) using throughput-oriented applications. We use...
John D. Davis, James Laudon, Kunle Olukotun
IEEEPACT
2005
IEEE
13 years 10 months ago
Performance Analysis of System Overheads in TCP/IP Workloads
Current high-performance computer systems are unable to saturate the latest available high-bandwidth networks such as 10 Gigabit Ethernet. A key obstacle in achieving 10 gigabits ...
Nathan L. Binkert, Lisa R. Hsu, Ali G. Saidi, Rona...
IEEEPACT
2005
IEEE
13 years 10 months ago
A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction
The continual demand for greater performance and growing concerns about the power consumption in high-performance microprocessors make the branch predictor a critical component of ...
Gabriel H. Loh
IEEEPACT
2005
IEEE
13 years 10 months ago
Trace Cache Sampling Filter
This paper presents a new technique for efficient usage of small trace caches. A trace cache can significantly increase the performance of wide out-of-order processors, but to be e...
Michael Behar, Avi Mendelson, Avinoam Kolodny
IEEEPACT
2005
IEEE
13 years 10 months ago
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors
This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thread running on another core. Our approach simply executes a copy of all non-cont...
Ilya Ganusov, Martin Burtscher