Sciweavers

ASPLOS
2012
ACM
11 years 11 months ago
Scalable address spaces using RCU balanced trees
Software developers commonly exploit multicore processors by building multithreaded software in which all threads of an application share a single address space. This shared addre...
Austin T. Clements, M. Frans Kaashoek, Nickolai Ze...
CC
2002
Springer
131views System Software» more  CC 2002»
13 years 3 months ago
Global Variable Promotion: Using Registers to Reduce Cache Power Dissipation
Global variable promotion, i.e. allocating unaliased globals to registers, can significantly reduce the number of memory operations. This results in reduced cache activity and less...
Andrea G. M. Cilio, Henk Corporaal
CGF
2008
125views more  CGF 2008»
13 years 3 months ago
Interactive Visualization for Memory Reference Traces
We present the Memory Trace Visualizer (MTV), a tool that provides interactive visualization and analysis of the sequence of memory operations performed by a program as it runs. A...
A. N. M. Imroz Choudhury, Kristin C. Potter, Steve...
ISCA
1996
IEEE
130views Hardware» more  ISCA 1996»
13 years 7 months ago
Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to addres...
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, ...
ISCA
1997
IEEE
98views Hardware» more  ISCA 1997»
13 years 7 months ago
Prefetching Using Markov Predictors
Prefetching is one approach to reducing the latency of memory operations in modern computer systems. In this paper, we describe the Markov prefetcher. This prefetcher acts as an i...
Doug Joseph, Dirk Grunwald
ASPLOS
1998
ACM
13 years 7 months ago
Compiler-Controlled Memory
Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a reasonable level...
Keith D. Cooper, Timothy J. Harvey
ISCA
1999
IEEE
104views Hardware» more  ISCA 1999»
13 years 8 months ago
Is SC + ILP=RC?
Sequential consistency (SC) is the simplest programming interface for shared-memory systems but imposes program order among all memory operations, possibly precluding high perform...
Chris Gniady, Babak Falsafi, T. N. Vijaykumar
ICS
2009
Tsinghua U.
13 years 8 months ago
Single-particle 3d reconstruction from cryo-electron microscopy images on GPU
Single-particle 3D reconstruction from cryo-electron microscopy (cryo-EM) images is a kernel application of biological molecules analysis, as the computational requirement of whic...
Guangming Tan, Ziyu Guo, Mingyu Chen, Dan Meng
SC
2004
ACM
13 years 9 months ago
Analysis and Performance Results of a Molecular Modeling Application on Merrimac
The Merrimac supercomputer uses stream processors and a highradix network to achieve high performance at low cost and low power. The stream architecture matches the capabilities o...
Mattan Erez, Jung Ho Ahn, Ankit Garg, William J. D...
IEEEPACT
2006
IEEE
13 years 9 months ago
Compiling for stream processing
This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses...
Abhishek Das, William J. Dally, Peter R. Mattson