Efficient provenance storage

Scientific workflow systems are increasingly used to automate complex data analyses, largely due to their benefits over traditional approaches for workflow design, optimization, and provenance recording. Many workflow systems employ a simple dependency model to represent the provenance of data produced by workflow runs. Although commonly adopted, this model does not capture explicit data dependencies introduced by "provenance-aware" processes, and it can lead to inefficient storage when workflow data is complex or structured. We present a provenance model, extending the conventional approach, that supports (i) explicit data dependencies and (ii) nested data collections. Our model adopts techniques from reference-based XML versioning, adding annotations for process and data dependencies. We present strategies and reduction techniques to store immediate and transitive provenance information within our model, and examine trade-offs among update time, storage size, and query res...
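The trade-off the abstract mentions between storing immediate and transitive provenance can be illustrated with a minimal sketch. This is not the authors' model: the `ProvenanceStore` class, its method names, and the item labels are hypothetical, and the sketch ignores nested collections and XML versioning entirely. It only shows that recording immediate dependencies keeps storage small while pushing traversal cost to query time, whereas materializing the transitive closure does the opposite.

```python
from collections import defaultdict


class ProvenanceStore:
    """Hypothetical store that records only immediate dependencies."""

    def __init__(self):
        # Maps each data item to the items it was directly derived from.
        self.immediate = defaultdict(set)

    def record(self, output, inputs):
        """Record that `output` was directly derived from `inputs`."""
        self.immediate[output].update(inputs)

    def lineage(self, item):
        """Return all transitive dependencies of `item` by graph traversal."""
        seen = set()
        stack = [item]
        while stack:
            current = stack.pop()
            # .get avoids creating empty entries for items with no parents.
            for parent in self.immediate.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def materialize(self):
        """Precompute the transitive closure: faster lookups, more storage."""
        return {item: self.lineage(item) for item in list(self.immediate)}


# Illustrative data: b derives from a, c from b, d from both b and c.
store = ProvenanceStore()
store.record("b", {"a"})
store.record("c", {"b"})
store.record("d", {"b", "c"})

closure = store.materialize()
immediate_edges = sum(len(parents) for parents in store.immediate.values())
closure_edges = sum(len(ancestors) for ancestors in closure.values())
print(immediate_edges, closure_edges)
```

In this toy graph the materialized closure holds more dependency pairs than the immediate edges alone, which is the kind of storage growth that reduction techniques would aim to limit.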
Adriane Chapman, H. V. Jagadish, Prakash Ramanan
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2008