Sciweavers

ICDE
2010
IEEE

Hive - a petabyte scale data warehouse using Hadoop

13 years 11 months ago
Hive - a petabyte scale data warehouse using Hadoop
— The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [1] is a popular open-source map-reduce implementation which is being used in companies like Yahoo, Facebook etc. to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in a SQL-like declarative language - HiveQL, which are compiled into mapreduce jobs that are executed using Hadoop. In addition, HiveQL enables users to plug in custom map-reduce scripts into queries. The language includes a type system with support for tables containing primitive types, collections like arrays and maps, and nested compositions of ...
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zhen
Added 17 May 2010
Updated 17 May 2010
Type Conference
Year 2010
Where ICDE
Authors Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang 0002, Suresh Anthony, Hao Liu, Raghotham Murthy
Comments (0)