Sciweavers

ISPASS
2010
IEEE
Weak execution ordering - exploiting iterative methods on many-core GPUs
On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When fine granularity of data communication and synchronization is requ...
Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir,...
ISPASS
2010
IEEE
Understanding transactional memory performance
Transactional memory promises to generalize transactional programming to mainstream languages and data structures. The purported benefit of transactions is that they ar...
Donald E. Porter, Emmett Witchel
ISPASS
2010
IEEE
PEBIL: Efficient static binary instrumentation for Linux
Binary instrumentation facilitates the insertion of additional code into an executable in order to observe or modify the executable's behavior. There are two main approaches t...
Michael Laurenzano, Mustafa M. Tikir, Laura Carrin...
ISPASS
2010
IEEE
StatStack: Efficient modeling of LRU caches
The identification of the memory gap in terms of the relatively slow memory accesses put a focus on cache...
David Eklov, Erik Hagersten
ISPASS
2010
IEEE
A study of hardware assisted IP over InfiniBand and its impact on enterprise data center performance
High-performance sockets implementations such as the Sockets Direct Protocol (SDP) have traditionally shown major performance advantages compared to the TCP/IP stack over Infi...
Ryan E. Grant, Pavan Balaji, Ahmad Afsahi
ISPASS
2010
IEEE
Runahead execution vs. conventional data prefetching in the IBM POWER6 microprocessor
After many years of prefetching research, most commercially available systems support only two types of prefetching: software-directed prefetching and hardware-based prefetchers u...
Harold W. Cain, Priya Nagpurkar
ISPASS
2010
IEEE
Performance-effective operation below Vcc-min
Continuous circuit miniaturization and increased process variability point to a future with diminishing returns from dynamic voltage scaling. Operation below Vcc-min has been prop...
Nikolas Ladas, Yiannakis Sazeides, Veerle Desmet
ISPASS
2010
IEEE
The Hadoop distributed filesystem: Balancing portability and performance
Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-le...
Jeffrey Shafer, Scott Rixner, Alan L. Cox
ISPASS
2010
IEEE
Hardware prediction of OS run-length for fine-grained resource customization
In the past ten years, computer architecture has seen a paradigm shift from emphasizing single-thread performance to energy-efficient, throughput-oriented chip multiprocessors...
David Nellans, Kshitij Sudan, Rajeev Balasubramoni...
ISPASS
2010
IEEE
Demystifying GPU microarchitecture through microbenchmarking
Graphics processors (GPUs) offer the promise of more than an order of magnitude speedup over conventional processors for certain non-graphics computations. Because the often prese...
Henry Wong, Misel-Myrto Papadopoulou, Maryam Sadoo...