In this paper, we study the instruction cache miss behavior of four modern commercial applications (a database workload, TPC-W, SPECjAppServer2002 and SPECweb99). These applicatio...
The Sort operation is a core part of many critical applications. Despite the large efforts to parallelize it, the fact that it suffers from high data-dependencies vastly limits it...
Layali K. Rashid, Wessam Hassanein, Moustafa A. Ha...
Driven by continuing scaling of Moore's law, chip multiprocessors and systems-on-a-chip are expected to grow the core count from dozens today to hundreds in the near future. ...
Boris Grot, Joel Hestness, Stephen W. Keckler, Onu...
Intuitively, aggressive work sharing among concurrent queries in a database system should always improve performance by eliminating redundant computation or data accesses. We show...
Ryan Johnson, Nikos Hardavellas, Ippokratis Pandis...
The efficient solution of large systems of linear equations represented by sparse matrices appears in many tasks. LU factorization followed by backward and forward substitutions i...