Llama: leveraging columnar storage for scalable join processing in the MapReduce framework



To achieve high reliability and scalability, most large-scale data warehouse systems have adopted a cluster-based architecture. In this paper, we propose the design of a new cluster-based data warehouse system, Llama, a hybrid data management system that combines the features of row-wise and column-wise database systems. In Llama, columns are formed into correlation groups to provide the basis for the vertical partitioning of tables. Llama employs a distributed file system (DFS) to disseminate data among cluster nodes. Above the DFS, a MapReduce-based query engine is supported. We design a new join algorithm to facilitate fast join processing. We present a performance study on the TPC-H dataset and compare Llama with Hive, a data warehouse infrastructure built on top of Hadoop. The experiment is conducted on EC2. The results show that Llama has excellent load performance and its query performance is significantly better than the traditional MapReduce framework based on row-wise st...

Yuting Lin, Divyakant Agrawal, Chun Chen, Beng Chin Ooi, Sai Wu


Data Management System | Data Warehouse Systems | Database | Database Management Systems | SIGMOD 2011 |


Added	17 Sep 2011
Updated	17 Sep 2011
Type	Journal
Year	2011
Where	SIGMOD
Authors	Yuting Lin, Divyakant Agrawal, Chun Chen, Beng Chin Ooi, Sai Wu

Sciweavers
