
FAST
2009

Provenance as Data Mining: Combining File System Metadata with Content Analysis

Provenance describes how an object came to be in its present state; that is, it describes the evolution of the object over time. Prior work on provenance has focused on databases and file systems: the database or file system is enhanced or augmented to capture additional information about the historical evolution of document collections, and this information is then used to answer the provenance question. We address provenance for unstructured information (i.e., document corpora from file systems) without any enhancements to the file system. To provide a solution in this setting, we model the provenance problem as a data mining problem. We show that data mining can provide provenance information, including chains of historical evolution, for repositories of unstructured information. Consequently, our approach requires no additions to the file system and can operate on legacy documents. Experimental results indicate that our approach performs strongly. 1 The Provenance p...
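The abstract does not spell out the mining procedure. As a rough illustration only, and not the authors' method, the sketch below shows one way file system metadata and content analysis could be combined to infer chains of historical evolution. It assumes each document's text has already been extracted, uses k-word shingles with Jaccard similarity as the content measure, uses the modification time (`mtime`) to orient each link from older to newer, and links each document to its most similar older document above a threshold (0.5 here, an arbitrary choice). The names `Doc`, `shingles`, `jaccard`, `infer_parents`, and `chain` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    path: str
    mtime: float  # file system metadata: last-modification time (seconds)
    text: str     # extracted document content (assumed already available)


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (contiguous word k-tuples) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def infer_parents(docs: List[Doc], threshold: float = 0.5) -> Dict[str, Optional[str]]:
    """Hypothetical heuristic: map each document to its most similar strictly
    older document, provided the similarity reaches `threshold`.

    Content analysis (shingle similarity) proposes candidate ancestors;
    file system metadata (mtime) orients each link from older to newer.
    """
    sigs = {d.path: shingles(d.text) for d in docs}
    parents: Dict[str, Optional[str]] = {}
    for d in docs:
        best, best_sim = None, threshold
        for c in docs:
            if c.mtime < d.mtime:  # only older documents can be ancestors
                sim = jaccard(sigs[d.path], sigs[c.path])
                if sim >= best_sim:
                    best, best_sim = c.path, sim
        parents[d.path] = best
    return parents


def chain(path: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow parent links back to a root; return the chain oldest-first."""
    out = [path]
    while parents.get(out[-1]) is not None:
        out.append(parents[out[-1]])
    return out[::-1]


if __name__ == "__main__":
    corpus = [
        Doc("report_v1.txt", 100.0, "provenance describes how an object came to be"),
        Doc("report_v2.txt", 200.0,
            "provenance describes how an object came to be in its present state"),
        Doc("notes.txt", 150.0, "meeting agenda for tuesday budget review"),
    ]
    links = infer_parents(corpus)
    print(chain("report_v2.txt", links))  # ['report_v1.txt', 'report_v2.txt']
```

Requiring a strictly older `mtime` for every parent means the links form a forest, so `chain` always terminates. A real corpus would need more care: copying files can reset timestamps, and this all-pairs comparison is quadratic in the number of documents.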
Vinay Deolalikar, Hernan Laffitte
Added 17 Feb 2011
Updated 17 Feb 2011
Type Journal