
ICDE
2012
IEEE

Scalable and Numerically Stable Descriptive Statistics in SystemML

With the exponential growth in the amount of data generated in recent years, there is a pressing need to apply machine learning algorithms to large data sets. SystemML is a framework that takes a declarative approach to large-scale data analytics. In SystemML, machine learning algorithms are expressed as scripts in a high-level language called DML, whose syntax is similar to R. DML scripts are compiled, optimized, and executed in the SystemML runtime, which is built on top of MapReduce. As the basis of virtually every quantitative analysis, descriptive statistics provide powerful tools for exploring data in SystemML. In this paper, we describe our experience implementing descriptive statistics in SystemML. In particular, we elaborate on how to overcome two major challenges: (1) achieving numerical stability while operating on large data sets in the distributed setting of MapReduce; and (2) designing scalable algorithms to compute order statistics in MapR...
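The abstract does not give the formulas behind challenge (1). One widely used way to keep mean and variance numerically stable in a MapReduce setting is to have each map task keep a (count, mean, M2) triple updated with Welford's single-pass method, and to merge those triples pairwise with the combination rule of Chan, Golub, and LeVeque. The sketch below assumes that technique, which may differ from what the paper actually implements. The names `Moments`, `partial`, and `combine` are hypothetical and are not SystemML APIs.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Moments:
    """Hypothetical partial aggregate for one data partition (e.g. one map task).

    Illustrative sketch only; not taken from SystemML.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def add(self, x: float) -> "Moments":
        # Welford's update: avoids the cancellation-prone sum(x^2) - n*mean^2
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other: "Moments") -> "Moments":
        # Chan et al. pairwise combination of two partial aggregates
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self) -> float:
        # Sample variance (n - 1 denominator); undefined for fewer than 2 values
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


def partial(xs: Iterable[float]) -> Moments:
    """Map side: fold one partition's values into a Moments triple."""
    m = Moments()
    for x in xs:
        m.add(x)
    return m


def combine(parts: List[Moments]) -> Moments:
    """Reduce side: merge the per-partition triples."""
    total = Moments()
    for p in parts:
        total = total.merge(p)
    return total
```

For example, splitting `[1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]` into two partitions and merging them should give a mean of `1e9 + 10` and a sample variance of 30.0. The textbook formula would instead subtract two quantities near 4e18, where adjacent doubles are 512 apart, so the true sum of squared deviations (90) would be lost to rounding. Because the merge rule is associative up to rounding, partial results can be combined in any order, which suits a combiner/reducer pipeline.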
Yuanyuan Tian, Shirish Tatikonda, Berthold Reinwald
Added 28 Sep 2012
Updated 28 Sep 2012
Type Conference
Year 2012
Where ICDE
Authors Yuanyuan Tian, Shirish Tatikonda, Berthold Reinwald