ICS
2007
Tsinghua U.
13 years 11 months ago
A study of process arrival patterns for MPI collective operations
The process arrival pattern, which denotes when different processes arrive at an MPI collective operation, can have a significant impact on the performance of the operatio...
Ahmad Faraj, Pitch Patarasuk, Xin Yuan
ICS
2007
Tsinghua U.
13 years 11 months ago
Active memory operations
Zhen Fang, Lixin Zhang, John B. Carter, Ali Ibrahi...
ICS
2007
Tsinghua U.
13 years 11 months ago
Adaptive Strassen's matrix multiplication
Strassen’s matrix multiplication (MM) has benefits with respect to any (highly tuned) implementation of MM because Strassen’s algorithm reduces the total number of operations. Strasse...
Paolo D'Alberto, Alexandru Nicolau
ICS
2007
Tsinghua U.
13 years 11 months ago
Scalability analysis of SPMD codes using expectations
Cristian Coarfa, John M. Mellor-Crummey, Nathan Fr...
ICS
2007
Tsinghua U.
13 years 11 months ago
Characteristics of workloads used in high performance and technical computing
Razvan Cheveresan, Matthew Ramsay, Chris Feucht, I...
ICS
2007
Tsinghua U.
13 years 11 months ago
Automatic nonblocking communication for partitioned global address space programs
Overlapping communication with computation is an important optimization on current cluster architectures; its importance is likely to increase as the doubling of processing power ...
Wei-Yu Chen, Dan Bonachea, Costin Iancu, Katherine...
ICS
2007
Tsinghua U.
13 years 11 months ago
Cooperative cache partitioning for chip multiprocessors
This paper presents Cooperative Cache Partitioning (CCP) to allocate cache resources among threads concurrently running on CMPs. Unlike cache partitioning schemes that use a singl...
Jichuan Chang, Gurindar S. Sohi
ICS
2007
Tsinghua U.
13 years 11 months ago
Performance driven data cache prefetching in a dynamic software optimization system
Software or hardware data cache prefetching is an efficient way to hide cache miss latency. However, the effectiveness of the issued prefetches has to be monitored in order to maximi...
Jean Christophe Beyler, Philippe Clauss
ICS
2007
Tsinghua U.
13 years 11 months ago
GridRod: a dynamic runtime scheduler for grid workflows
Grid Workflows are emerging as practical programming models for solving large e-scientific problems on the Grid. However, it is typically assumed that the workflow components eith...
Shahaan Ayyub, David Abramson
ICS
2007
Tsinghua U.
13 years 11 months ago
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors
This paper explores the scalability of the Stream Processor architecture along the instruction-, data-, and thread-level parallelism dimensions. We develop detailed VLSI-cost and ...
Jung Ho Ahn, Mattan Erez, William J. Dally