Sciweavers

165 search results - page 32 / 33
» Thread Cluster Memory Scheduling
PPOPP
2011
ACM
14 years 2 months ago
GRace: a low-overhead mechanism for detecting data races in GPU programs
In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. Many application developers, including those with no prior parallel program...
Mai Zheng, Vignesh T. Ravi, Feng Qin, Gagan Agrawa...
CGO
2008
IEEE
15 years 6 months ago
Latency-tolerant software pipelining in a production compiler
In this paper we investigate the benefit of scheduling non-critical loads for a higher latency during software pipelining. "Noncritical" denotes those loads that have s...
Sebastian Winkel, Rakesh Krishnaiyer, Robyn Sampso...
KDD
2009
ACM
16 years 4 days ago
Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
ISCA
2008
IEEE
14 years 11 months ago
The Design and Performance of a Bare PC Web Server
There is an increasing need for new Web server architectures that are application-centric, simple, small, and pervasive in nature. In this paper, we present a novel architecture f...
Long He, Ramesh K. Karne, Alexander L. Wijesinha
PDCAT
2009
Springer
15 years 6 months ago
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications
In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. As existing checkpoint/restart implementations do not supp...
Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu,...