Phase-decoupled methods for code generation are the state of the art in compilers for standard processors but generally produce code of poor quality for irregular target architect...
Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a reasonable level...
We propose a practical parallel on-the-fly algorithm for enumerative LTL model-checking. The algorithm is designed for a cluster of workstations communicating via MPI. The detect...
Quadtree matrices using Morton-order storage provide natural blocking on every level of a memory hierarchy. Writing the natural recursive algorithms to take advantage of this bloc...
The automatic parallelization of C has always been frustrated by pointer arithmetic, irregular control flow and complicated data aggregation. Each of these problems is similar to f...