Sciweavers

High Performance Architectures and Compilers
ICFP 2012, ACM
Nested data-parallelism on the GPU
Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs but, because of their Single-Instruction-Multiple...
Lars Bergstrom, John H. Reppy
DATE 2010, IEEE
A reconfigurable hardware for one bit transform based multiple reference frame Motion Estimation
Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. One bit transform (1BT) based ME algorithms have low comput...
Abdulkadir Akin, G. Sayilar, Ilker Hamzaoglu
SIGCOMM 2009, ACM
Optimizing the BSD routing system for parallel processing
The routing architecture of the original 4.4BSD [3] kernel has been deployed successfully without major design modification for over 15 years. In the unified routing architectur...
Qing Li, Kip Macy
ARITH 2009, IEEE
Challenges in Automatic Optimization of Arithmetic Circuits
Despite the impressive progress of logic synthesis in the past decade, finding the best architecture for a given circuit still remains an open and largely unsolved problem, espec...
Ajay K. Verma, Philip Brisk, Paolo Ienne
INFOCOM 2009, IEEE
The Crosspoint-Queued Switch
This paper calls for rethinking packet-switch architectures by cutting all dependencies between the switch fabric and the linecards. Most single-stage packet-switch arch...
Josef Kanizo, David Hay, Isaac Keslassy