A Database-Hadoop Hybrid Approach to Scalable Machine Learning

9 years 12 months ago


There are two popular schools of thought for performing large-scale machine learning that does not fit into memory. One is to run machine learning within a relational database management system, and the other is to push analytical functions into MapReduce. As each approach has its own set of pros and cons, we propose a database-Hadoop hybrid approach to scalable machine learning where batch learning is performed on the Hadoop platform, while incremental learning is performed on PostgreSQL. We propose a purely relational approach that removes the scalability limitation of previous approaches based on user-defined aggregates and also discuss issues and resolutions in applying the proposed approach to Hadoop/Hive. Experimental evaluations of classification performance and training speed were conducted using a commercial advertisement dataset provided in the KDD Cup 2012, Track 2. The experimental results show that our scheme has competitive classification performance and superior tr...

Makoto Yui, Isao Kojima
