Sciweavers

PVLDB
2008

Efficient implementation of sorting on multi-core SIMD CPU architecture

13 years 4 months ago
Efficient implementation of sorting on multi-core SIMD CPU architecture
Sorting a list of input numbers is one of the most fundamental problems in the field of computer science in general and high-throughput database applications in particular. Although literature abounds with various flavors of sorting algorithms, different architectures call for customized implementations to achieve faster sorting times. This paper presents an efficient implementation and detailed analysis of MergeSort on current CPU architectures. Our SIMD implementation with 128-bit SSE is 3.3X faster than the scalar version. In addition, our algorithm performs an efficient multiway merge, and is not constrained by the memory bandwidth. Our multi-threaded, SIMD implementation sorts 64 million floating point numbers in less than 0.5 seconds on a commodity 4-core Intel processor. This measured performance compares favorably with all previously published results. Additionally, the paper demonstrates performance scalability of the proposed sorting algorithm with respect to certain salient...
Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee,
Added 28 Dec 2010
Updated 28 Dec 2010
Type Journal
Year 2008
Where PVLDB
Authors Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, Pradeep Dubey
Comments (0)