A Context-free Markup Language for Semi-structured Text

13 years 1 months ago
A Context-free Markup Language for Semi-structured Text
An ad hoc data format is any non-standard, semi-structured data format for which robust data processing tools are not available. In this paper, we present ANNE, a new kind of mark-up language designed to help users generate documentation and data processing tools for ad hoc text data. More specifically, given a new ad hoc data source, an ANNE programmer will edit the document to add a number of simple annotations, which serve to specify its syntactic structure. Annotations include elements that specify constants, optional data, alternatives, enumerations, sequences, tabular data, and recursive patterns. The ANNE system uses a combination of user annotations and the raw data itself to extract a context-free grammar from the document. This context-free grammar can then be used to parse the data and transform it into an XML parse tree, which may be viewed through a browser for analysis or debugging purposes. In addition, the ANNE system will generate a PADS/ML description [21], which may...
Qian Xi, David Walker
Added 01 Mar 2010
Updated 02 Mar 2010
Type Conference
Year 2010
Where PLDI
Authors Qian Xi, David Walker
Comments (0)