We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). The most recent GPU features include support for wr...
Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. ...
Remote atomic memory operations are critical for achieving high-performance synchronization in tightly-coupled systems. Previous approaches to implementing atomic memory operati...
Keith D. Underwood, Michael Levenhagen, K. Scott H...
There is a strong need now for compilers of embedded systems to find effective ways of optimizing series of loop-nests, wherein the majority of the memory references occur in the fo...
Javed Absar, Min Li, Praveen Raghavan, Andy Lambre...
This paper presents a high-performance Distributed Shared Memory system called VODCA, which supports a novel View-Oriented Parallel Programming on cluster computers. One advantage...
Zhiyi Huang, Wenguang Chen, Martin K. Purvis, Weim...
XML transformations are most naturally defined as recursive functions on trees. Their direct implementation, however, causes inefficient memory usage because the input XML tree is...