Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is...
Software or hardware data cache prefetching is an efficient way to hide cache miss latency. However effectiveness of the issued prefetches have to be monitored in order to maximi...
Process arrival pattern, which denotes the timing when different processes arrive at an MPI collective operation, can have a significant impact on the performance of the operatio...
The development of high-level programming environments is essential if wireless sensor networks are to be accessible to nonexperts. In this paper, we present the Regiment system, ...
Previous studies have shown that array regrouping and structure splitting significantly improve data locality. The most effective technique relies on profiling every access to eve...