Sciweavers

581 search results - page 107 / 117
» Implementing the Best Processor Cores
Sort
View
PPOPP
2010
ACM
15 years 9 months ago
Scaling LAPACK panel operations using parallel cache assignment
In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
Anthony M. Castaldo, R. Clint Whaley
ASPLOS
2010
ACM
15 years 6 months ago
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
Instruction-grain lifeguards monitor the events of a running application at the level of individual instructions in order to identify and help mitigate application bugs and securi...
Evangelos Vlachos, Michelle L. Goodstein, Michael ...
VEE
2009
ACM
146views Virtualization» more  VEE 2009»
15 years 6 months ago
Demystifying magic: high-level low-level programming
r of high-level languages lies in their abstraction over hardware and software complexity, leading to greater security, better reliability, and lower development costs. However, o...
Daniel Frampton, Stephen M. Blackburn, Perry Cheng...
SIGCOMM
2009
ACM
15 years 6 months ago
Optimizing the BSD routing system for parallel processing
The routing architecture of the original 4.4BSD [3] kernel has been deployed successfully without major design modification for over 15 years. In the unified routing architectur...
Qing Li, Kip Macy
ISPASS
2007
IEEE
15 years 6 months ago
CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications
This paper proposes a specialized memory structure called CA-RAM (Content Addressable Random Access Memory) to accelerate search operations present in many important real-world ap...
Sangyeun Cho, Joel R. Martin, Ruibin Xu, Mohammad ...