Sciweavers

782 search results - page 121 / 157
» Dag-Consistent Distributed Shared Memory
IPPS
2007
IEEE
15 years 10 months ago
Optimizing Inter-Nest Data Locality Using Loop Splitting and Reordering
With the increasing gap between processor speed and memory latency, the performance of data-dominated programs is becoming more reliant on fast data access, which can be improved...
Sofiane Naci
HIPC
2007
Springer
15 years 10 months ago
Optimization of Collective Communication in Intra-cell MPI
The Cell is a heterogeneous multi-core processor, which has eight co-processors, called SPEs. The SPEs can access a common shared main memory through DMA, and each SPE can direct...
M. K. Velamati, Arun Kumar, Naresh Jayam, Ganapath...
ICPPW
2006
IEEE
15 years 10 months ago
Retargeting Image-Processing Algorithms to Varying Processor Grain Sizes
Embedded computing architectures can be designed to meet a variety of application-specific requirements. However, optimized hardware can require compiler support to realize the po...
Sam Sander, Linda M. Wills
IPPS
2006
IEEE
15 years 10 months ago
Parallel implementation and performance characterization of MUSCLE
Multiple sequence alignment is a fundamental and very computationally intensive task in molecular biology. MUSCLE, a new algorithm for creating multiple alignments of protein sequ...
Xi Deng, Eric Li, Jiulong Shan, Wenguang Chen
SPAA
2006
ACM
15 years 10 months ago
The cache complexity of multithreaded cache oblivious algorithms
We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a...
Matteo Frigo, Volker Strumpen