Sciweavers

ICS
2005
Tsinghua U.
13 years 9 months ago
System noise, OS clock ticks, and fine-grained parallel applications
As parallel jobs get bigger in size and finer in granularity, “system noise” is increasingly becoming a problem. In fact, fine-grained jobs on clusters with thousands of SMP...
Dan Tsafrir, Yoav Etsion, Dror G. Feitelson, Scott...
ICS
2005
Tsinghua U.
13 years 9 months ago
Disk layout optimization for reducing energy consumption
Excessive power consumption is becoming a major barrier to extracting the maximum performance from high-performance parallel systems. Therefore, techniques oriented towards reduci...
Seung Woo Son, Guangyu Chen, Mahmut T. Kandemir
ICS
2005
Tsinghua U.
13 years 9 months ago
Lightweight reference affinity analysis
Previous studies have shown that array regrouping and structure splitting significantly improve data locality. The most effective technique relies on profiling every access to eve...
Xipeng Shen, Yaoqing Gao, Chen Ding, Roch Archamba...
ICS
2005
Tsinghua U.
13 years 9 months ago
Parallel sparse LU factorization on second-class message passing platforms
Several message passing-based parallel solvers have been developed for general (non-symmetric) sparse LU factorization with partial pivoting. Due to the fine-grain synchronizatio...
Kai Shen
ICS
2005
Tsinghua U.
13 years 9 months ago
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation
Chip Multiprocessors (CMPs) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must explo...
Jose Renau, James Tuck, Wei Liu, Luis Ceze, Karin ...
ICS
2005
Tsinghua U.
13 years 9 months ago
A heterogeneously segmented cache architecture for a packet forwarding engine
As network traffic continues to increase and with the requirement to process packets at line rates, high performance routers need to forward millions of packets every second. Eve...
Kaushik Rajan, Ramaswamy Govindarajan
ICS
2005
Tsinghua U.
13 years 9 months ago
The implications of working set analysis on supercomputing memory hierarchy design
Supercomputer architects strive to maximize the performance of scientific applications. Unfortunately, the large, unwieldy nature of most scientific applications has lead to the...
Richard C. Murphy, Arun Rodrigues, Peter M. Kogge,...
ICS
2005
Tsinghua U.
13 years 9 months ago
Power-aware resource allocation in high-end systems via online simulation
Traditionally, scheduling in high-end parallel systems focuses on how to minimize the average job waiting time and on how to maximize the overall system utilization. Despite the d...
Barry Lawson, Evgenia Smirni
ICS
2005
Tsinghua U.
13 years 9 months ago
A NUCA substrate for flexible CMP cache sharing
We propose an organization for the on-chip memory system of a chip multiprocessor, in which 16 processors share a 16MB pool of 256 L2 cache banks. The L2 cache is organized as a n...
Jaehyuk Huh, Changkyu Kim, Hazim Shafi, Lixin Zhan...