Sciweavers

IPPS
2010
IEEE
14 years 10 months ago
Inter-block GPU communication via fast barrier synchronization
The graphics processing unit (GPU) has evolved from a fixed-function processor with programmable stages to a programmable processor with many fixed-function components that deliver...
Shucai Xiao, Wu-chun Feng
VLSID
2003
IEEE
16 years 26 days ago
Synthesis of Real-Time Embedded Software by Timed Quasi-Static Scheduling
A formal synthesis method for complex real-time embedded software is proposed in this work. Compared to previous work, our method not only synthesizes embedded software with compl...
Pao-Ann Hsiung, Feng-Shi Su
PLDI
2012
ACM
13 years 2 months ago
Effective parallelization of loops in the presence of I/O operations
Software-based thread-level parallelization has been widely studied for exploiting data parallelism in purely computational loops to improve program performance on multiprocessors...
Min Feng, Rajiv Gupta, Iulian Neamtiu
ICS
1998
Tsinghua U.
15 years 4 months ago
Load Execution Latency Reduction
In order to achieve high performance, contemporary microprocessors must effectively process the four major instruction types: ALU, branch, load, and store instructions. This paper...
Bryan Black, Brian Mueller, Stephanie Postal, Ryan...
GI
2004
Springer
15 years 5 months ago
Reliability study of an embedded operating system for industrial applications
Critical industrial applications and fault-tolerant applications need operating systems (OS) that guarantee correct and safe behaviour despite the appearance of errors. In ...
Juan Pardo, José Carlos Campelo, Juan José...