A Database-Hadoop Hybrid Approach to Scalable Machine Learning

There are two popular schools of thought for performing large-scale machine learning on data that does not fit into memory. One is to run machine learning within a relational database management system, and the other is to push analytical functions into MapReduce. As each approach has its own set of pros and cons, we propose a database-Hadoop hybrid approach to scalable machine learning where batch-learning is performed on the Hadoop platform, while incremental learning is performed on PostgreSQL. We propose a purely relational approach that removes the scalability limitation of previous approaches based on user-defined aggregates and also discuss issues and resolutions in applying the proposed approach to Hadoop/Hive. Experimental evaluations of classification performance and training speed were conducted using a commercial advertisement dataset provided in the KDD Cup 2012, Track 2. The experimental results show that our scheme has competitive classification performance and superior tr...
Makoto Yui, Isao Kojima
Added 27 Apr 2014
Updated 27 Apr 2014
Type Journal
Year 2013