Integrating Data and Probabilistically Structured Text Documents

Commercial, non-profit and public organizations are accumulating huge amounts of electronically available text documents. Although composed of unstructured text, documents in archives such as annual reports to shareholders, medical patient records and public announcements often share an inherent, though undocumented, structure. To enable the integration of text collections with related structured data sources, this inherent structure should be made explicit in as much detail as possible. The goal of this study is to establish a methodology for integrating text documents with structured records into a hyper-archive of application-specific entities. The text documents have an implicit structure that has been made explicit by data mining techniques, as proposed in the DIAsDEM framework for semantic tagging of domain-specific text documents. The result is a probabilistic DTD that serves as a basis for the matching of schemata and for the matching of data in...
Karsten Winkler, Myra Spiliopoulou
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2001
Where FDBS
Authors Karsten Winkler, Myra Spiliopoulou