In a previous paper we show how the FLAME methods and tools provide a solution to compute dense dense linear algebra operations on a multi-GPU platform with reasonable performance...
We present an active switch architecture to improve the performance of systems connected via system area networks. Our programmable active switches not only flexibly route packets...
Joins or chords is a concurrency construct that seems to fit well with the object oriented paradigm. Chorded languages are presented with implicit assumptions regarding the fair t...
Abstract—Dynamic runtimes can simplify parallel programming by automatically managing concurrency and locality without further burdening the programmer. Nevertheless, implementin...
Richard M. Yoo, Anthony Romano, Christos Kozyrakis
Although processors become massively multicore and therefore new programming models mix message passing and multi-threading, the effects of threads on communication libraries rema...