The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs...
Long Chen, Oreste Villa, Sriram Krishnamoorthy, Gu...
To fully exploit multicore processors, applications are expected to provide a large degree of thread-level parallelism. While adequate for low core counts and their typical worklo...
We describe a load-balancing mechanism for assigning elements to servers in a distributed data structure that supports range queries. The mechanism ensures both load-balancing wit...
James Aspnes, Jonathan Kirsch, Arvind Krishnamurth...
We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, incoming data and intermediate results ma...
Kirsten Hildrum, Fred Douglis, Joel L. Wolf, Phili...
Abstract. Domain decomposition for regular meshes on parallel computers has traditionally been performed by attempting to exactly partition the work among the available processors ...