Sciweavers

14 search results - page 2 / 3
» Tradeoff between data-, instruction-, and thread-level paral...
ISPASS
2009
IEEE
14 years 4 days ago
Analyzing CUDA workloads using a detailed GPU simulator
Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow’s manyco...
Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, He...
MICRO
2003
IEEE
13 years 10 months ago
Fast Secure Processor for Inhibiting Software Piracy and Tampering
Due to the widespread software piracy and virus attacks, significant efforts have been made to improve security for computer systems. For stand-alone computers, a key observation...
Jun Yang 0002, Youtao Zhang, Lan Gao
LCTRTS
2005
Springer
13 years 10 months ago
Cache aware optimization of stream programs
Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of streaming applications, which is increasingly preval...
Janis Sermulins, William Thies, Rodric M. Rabbah, ...
SPDP
1991
IEEE
13 years 8 months ago
Local vs. global memory in the IBM RP3: experiments and performance modelling
A number of experiments regarding the placement of instructions, private data and shared data in the Non-Uniform-Memory-Access multiprocessor RP3 have been performed. Three Scient...
Mats Brorsson
ISCA
1994
IEEE
13 years 9 months ago
Impact of Sharing-Based Thread Placement on Multithreaded Architectures
Multithreaded architectures context switch between instruction streams to hide memory access latency. Although this improves processor utilization, it can increase cache interfere...
Radhika Thekkath, Susan J. Eggers