Splash: ad-hoc querying of data and statistical models

Data mining is increasingly performed by people who are not computer scientists or professional programmers. It is often done as an iterative process involving multiple ad-hoc tasks, as well as data pre- and post-processing, all of which must be executed over large databases. In order to make data mining more accessible, it is critical to provide a simple, easy-to-use language that allows the user to specify ad-hoc data processing, model construction, and model manipulation. Simultaneously, it is necessary for the underlying system to scale up to large datasets. Unfortunately, while each of these requirements can be satisfied individually by existing systems, no system fully satisfies all criteria. In this paper, we present a system called Splash to fill this void. Splash supports an extended relational data model and SQL query language, which allows for the natural integration of statistical modeling and ad-hoc data processing. It also supports a novel representatives operator to he...
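This record does not show Splash's own SQL extensions, so the following is only a rough, hypothetical illustration of the general idea of expressing model construction and model application in SQL. It uses plain SQLite through Python's standard `sqlite3` module, not Splash. The sketch fits a one-variable least-squares line using SQL aggregates, stores the fitted model as an ordinary table, and then applies it to new rows with a join. The table names `obs`, `model`, and `new_x` and the toy data are invented for this example.

```python
import sqlite3

# Plain SQLite, NOT Splash syntax; tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL, y REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", [(1, 3), (2, 5), (3, 7), (4, 9)])

# Model construction: fit y = intercept + slope * x by ordinary least squares,
# using only aggregates:
#   slope     = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
#   intercept = (Sy - slope*Sx) / n
# The fitted model is materialized as a one-row relational table.
conn.execute("""
CREATE TABLE model AS
SELECT slope, (sy - slope * sx) / n AS intercept
FROM (
  SELECT n, sx, sy, (n * sxy - sx * sy) / (n * sxx - sx * sx) AS slope
  FROM (
    SELECT COUNT(*) * 1.0 AS n, SUM(x) AS sx, SUM(y) AS sy,
           SUM(x * y) AS sxy, SUM(x * x) AS sxx
    FROM obs
  )
)
""")

# Model application: because the model is just a table, predicting for new
# inputs is an ordinary join between the data and the model.
conn.execute("CREATE TABLE new_x (x REAL)")
conn.executemany("INSERT INTO new_x VALUES (?)", [(5,), (10,)])
preds = conn.execute(
    "SELECT t.x, m.intercept + m.slope * t.x "
    "FROM new_x AS t CROSS JOIN model AS m ORDER BY t.x"
).fetchall()
print(preds)  # [(5.0, 11.0), (10.0, 21.0)]
```

Keeping the model in a table means it can be filtered, joined, or versioned with the same query language as the data. This is the kind of integration the abstract motivates, though Splash's actual data model and operators may differ substantially.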
Lujun Fang, Kristen LeFevre
Added 03 Sep 2010
Updated 03 Sep 2010
Type Conference
Year 2010
Where EDBT