Most image processing algorithms can be parallelized by splitting parallel loops and by using very few communication patterns. Code parallelization using MPI still involves much p...
Due to the strong increase of processing units available to the end user, expressing parallelism of an algorithm is a major challenge for many researchers. Parallel applications ar...
Using traditional software profiling to optimize embedded software in an MPSoC design is not reliable. With multiple processors running concurrently and programs interacting, trad...
Embedded systems consisting of the application program ROM, RAM, the embedded processor core and any custom hardware on a single wafer are becoming increasingly common in areas suc...
To execute a shared memory program efficiently, we have to manage memory consistency with low overheads, and have to utilize communication bandwidth of the platform as much as pos...