
PADL
2009
Springer

Ad Hoc Data and the Token Ambiguity Problem

Abstract. PADS is a declarative language used to describe the syntax and semantic properties of ad hoc data sources such as financial transactions, server logs and scientific data sets. The PADS compiler reads these descriptions and generates a suite of useful data processing tools such as format translators, parsers, printers and even a query engine, all customized to the ad hoc data format in question. Recently, however, to further improve the productivity of programmers who manage ad hoc data sources, we have turned to using PADS as an intermediate language in a system that first infers a PADS description directly from example data and then passes that description to the original compiler for tool generation. A key subproblem in the inference engine is the token ambiguity problem: the problem of determining which substrings in the example data correspond to complex tokens such as dates, URLs, or comments. In order to solve the token ambiguity problem, the paper studies the relati...
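The token ambiguity problem can be made concrete with a small sketch. The Python below assumes a hypothetical set of regular-expression token classes (`date`, `time`, `int`, `punct`, `space`, `word`); these are illustrative only and are not the PADS base types or the token set used by the inference engine. It enumerates every way to cover a line with tokens, showing that a substring such as `2009-11-22` can be read as one date token or as integers separated by punctuation. For brevity it takes only the longest match of each class at each position, so it understates the ambiguity a real tokenizer faces, and it only exposes the competing candidates; it does not implement the paper's method for choosing among them.

```python
import re
from functools import lru_cache
from typing import List, Tuple

# Hypothetical token classes, chosen for illustration only. They are NOT the
# PADS base types or the token set used by the inference engine in the paper.
TOKEN_CLASSES = [
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("time", re.compile(r"\d{2}:\d{2}")),
    ("int", re.compile(r"\d+")),
    ("punct", re.compile(r"[-:/]")),
    ("space", re.compile(r" +")),
    ("word", re.compile(r"[A-Za-z]+")),
]

Token = Tuple[str, str]  # (token class, matched substring)


def tokenizations(text: str) -> List[List[Token]]:
    """Return every way to cover `text` with a sequence of tokens.

    At each position, every token class that matches there contributes one
    candidate (its longest match), so overlapping classes such as `date`
    and `int` yield competing tokenizations of the same substring.
    """

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> Tuple[Tuple[Token, ...], ...]:
        if i == len(text):
            return ((),)  # exactly one way to tokenize the empty suffix
        results = []
        for name, rx in TOKEN_CLASSES:
            m = rx.match(text, i)
            if m:  # every pattern consumes at least one character
                for rest in from_pos(m.end()):
                    results.append(((name, m.group()),) + rest)
        return tuple(results)

    return [list(t) for t in from_pos(0)]


if __name__ == "__main__":
    for toks in tokenizations("2009-11-22 10:15"):
        print(toks)
```

For the input `2009-11-22 10:15` the sketch yields four candidates: the date and the time can each be read either as a single complex token or as integers and punctuation, and an inference system must decide which reading the example data supports.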
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2009
Where PADL
Authors Qian Xi, Kathleen Fisher, David Walker, Kenny Qili Zhu