Abstract. Parallel processing is based on utilizing a group of processors to efficiently solve large problems faster than is possible on a single processor. To accomplish this, the...
Parallel programs that modify shared data in a cachecoherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisiti...
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Hierarchical agglomerative clustering (HAC) is a common clustering method that outputs a dendrogram showing all N levels of agglomerations where N is the number of objects in the d...
Manoranjan Dash, Simona Petrutiu, Peter Scheuerman...
Dependencies between iterations of loop structures cannot always be determined at compile-time because they may depend on input data which is known only at run-time. A prime examp...
V. Prasad Krothapalli, Thulasiraman Jeyaraman, Mar...