Sciweavers

420 search results - page 71 / 84
» Scalable Parallel Programming with CUDA
CCGRID
2011
IEEE
14 years 1 month ago
Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems
One-sided communication is important to enable asynchronous communication and data movement for Global Address Space (GAS) programming models. Such communication is typically re...
Xinyu Que, Weikuan Yu, Vinod Tipparaju, Jeffrey S....
PPOPP
2009
ACM
15 years 10 months ago
A compiler-directed data prefetching scheme for chip multiprocessors
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...
VLDB
2007
ACM
15 years 9 months ago
Executing Stream Joins on the Cell Processor
Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capa...
Bugra Gedik, Philip S. Yu, Rajesh Bordawekar
IPPS
2008
IEEE
15 years 3 months ago
Overcoming scaling challenges in biomolecular simulations across multiple platforms
NAMD is a portable parallel application for biomolecular simulations. NAMD pioneered the use of hybrid spatial and force decomposition, a technique now used by most scalable pr...
Abhinav Bhatele, Sameer Kumar, Chao Mei, James C. ...
HPCC
2009
Springer
15 years 1 months ago
Dynamically Filtering Thread-Local Variables in Lazy-Lazy Hardware Transactional Memory
Transactional Memory (TM) is an emerging technology which promises to make parallel programming easier. However, to be efficient, the underlying TM system should protect only...
Sutirtha Sanyal, Sourav Roy, Adrián Cristal...