Sciweavers

SIGMOD
1998
ACM

Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity

Most databases contain “name constants” like course numbers, personal names, and place names that correspond to entities in the real world. Previous work in integration of heterogeneous databases has assumed that local name constants can be mapped into an appropriate global domain by normalization. However, in many cases, this assumption does not hold; determining if two name constants should be considered identical can require detailed knowledge of the world, the purpose of the user’s query, or both. In this paper, we reject the assumption that global domains can be easily constructed, and assume instead that the names are given in natural language text. We then propose a logic called WHIRL which reasons explicitly about the similarity of local names, as measured using the vector-space model commonly adopted in statistical information retrieval. We describe an efficient implementation of WHIRL and evaluate it experimentally on data extracted from the World Wide Web. We show tha...
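The abstract says name similarity is measured with the vector-space model from statistical information retrieval. As a rough illustration of that idea only, and not of WHIRL itself, the sketch below scores pairs of name constants by the cosine of their TF-IDF vectors and returns the best-matching pairs across two lists. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `similarity_join`), the sample names, and the specific weighting formula are assumptions made for this example; the paper's exact weighting and search strategy may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (a dict) per input string.

    Assumed weighting for this sketch: log(1 + tf) * log(1 + N / df).
    """
    token_lists = [tokenize(d) for d in docs]
    n = len(token_lists)
    df = Counter()
    for toks in token_lists:
        df.update(set(toks))  # document frequency counts each term once per doc
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vec = {t: math.log(1 + c) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def similarity_join(left, right, k=3):
    """Return the k highest-scoring (score, left_name, right_name) pairs."""
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    pairs = [
        (cosine(a, b), x, y)
        for x, a in zip(left, lv)
        for y, b in zip(right, rv)
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs[:k]


if __name__ == "__main__":
    # Hypothetical name constants from two sources with no shared key.
    source_a = ["William Cohen", "John Smith"]
    source_b = ["W. Cohen", "Smith, John", "Jane Doe"]
    for score, a, b in similarity_join(source_a, source_b):
        print(f"{score:.3f}  {a!r} ~ {b!r}")
```

This brute-force join compares every pair, which is fine for a handful of names but does not reflect the efficient implementation the abstract refers to.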
William W. Cohen
Added 05 Aug 2010
Updated 05 Aug 2010
Type Conference
Year 1998
Where SIGMOD
Authors William W. Cohen