POPL
2008
ACM

From dirt to shovels: fully automatic tool generation from ad hoc data

An ad hoc data source is any semistructured data source for which useful data analysis and transformation tools are not readily available. Such data must be queried, transformed, and displayed on a regular basis by systems administrators, computational biologists, financial analysts, and hosts of others. In this paper, we demonstrate that it is possible to generate a suite of useful data processing tools, including a semistructured query engine, several format converters, a statistical analyzer, and data visualization routines, directly from the ad hoc data itself, without any human intervention. The key technical contribution of the work is a multiphase algorithm that automatically infers the structure of an ad hoc data source and produces a format specification in the PADS data description language. Programmers wishing to implement custom data analysis tools can use such descriptions to generate printing and parsing libraries for the data. Alternatively, our software infrastructure wil...
Added: 03 Dec 2009
Updated: 03 Dec 2009
Type: Conference
Year: 2008
Where: POPL
Authors: Kathleen Fisher, David Walker, Kenny Qili Zhu, Peter White