Sciweavers

» Implementing Signatures for C
ASPLOS
2009
ACM
15 years 10 months ago
Performance analysis of accelerated image registration using GPGPU
This paper presents a performance analysis of an accelerated 2-D rigid image registration implementation that employs the Compute Unified Device Architecture (CUDA) programming e...
Peter Bui, Jay B. Brockman
ICS
2009
Tsinghua U.
15 years 10 months ago
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi’s iterative method for the 2-D Poisson equation on a structured grid, in both single- and double-preci...
Sundaresan Venkatasubramanian, Richard W. Vuduc
NOCS
2009
IEEE
15 years 10 months ago
Exploring concentration and channel slicing in on-chip network router
Sharing on-chip network resources efficiently is critical in the design of a cost-efficient network on-chip (NoC). Concentration has been proposed for on-chip networks but the t...
Prabhat Kumar, Yan Pan, John Kim, Gokhan Memik, Al...
PPOPP
2006
ACM
15 years 9 months ago
High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor
IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing the longest-prefix match at 10 Gbps speed or higher. IPv6 forwarding further ex...
Xianghui Hu, Xinan Tang, Bei Hua
ICCS
2005
Springer
15 years 9 months ago
Fast Expression Templates
Expression templates (ET) can significantly reduce the implementation effort of mathematical software. For some compilers, especially for those of supercomputers, it ca...
Jochen Härdtlein, Alexander Linke, Christoph ...