SQL Queries Over Unstructured Text Databases

Text documents often embed data that is structured in nature. By processing a text database with information extraction systems, we can define a variety of structured "relations," over which we can then issue SQL queries. Processing SQL queries in this text-based scenario presents multiple challenges. One key challenge is efficiency: information extraction is a time-consuming process, so query processing strategies should pick efficient extraction systems whenever possible, and also minimize the number of documents that they process. Another key challenge is result quality: extraction systems might output erroneous information or miss information that they should capture; also, efficiency-related query processing decisions (e.g., to avoid processing large numbers of useless documents) may compromise result completeness. To address these challenges, we characterize SQL query processing strategies in terms of their efficiency and result quality, and discuss the (user-specific) ...
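To make the setting concrete, here is a minimal sketch of the general idea: run an extractor over text documents, load the resulting tuples into a relation, and query it with SQL. It is not the authors' system. The corpus, the `headquarters` relation, the regex "extractor," and the function names are all hypothetical, chosen only for illustration. The optional `keyword` filter mimics an efficiency-related decision: it skips documents unlikely to be useful, at the risk of missing tuples.

```python
import re
import sqlite3

# Hypothetical toy corpus (an assumption, not data from the paper).
DOCS = [
    "Acme Corp is headquartered in Boston.",
    "Quarterly earnings were flat this year.",
    "Globex Inc is headquartered in Springfield.",
]

# Hypothetical extractor: a regex stands in for a real information
# extraction system, which would be far slower and noisier.
PATTERN = re.compile(
    r"(?P<company>[A-Z]\w+(?: [A-Z]\w+)*) is headquartered in (?P<city>[A-Z]\w+)"
)


def extract_headquarters(docs, keyword=None):
    """Return (company, city) tuples extracted from docs.

    If keyword is given, only documents containing it are processed.
    This saves extraction work but may lower result completeness.
    """
    rows = []
    for doc in docs:
        if keyword is not None and keyword not in doc:
            continue  # skip documents judged unlikely to be useful
        for match in PATTERN.finditer(doc):
            rows.append((match.group("company"), match.group("city")))
    return rows


def query_headquarters(docs, city, keyword=None):
    """Build the extracted relation in SQLite and run a SQL query over it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE headquarters (company TEXT, city TEXT)")
        conn.executemany(
            "INSERT INTO headquarters VALUES (?, ?)",
            extract_headquarters(docs, keyword),
        )
        cursor = conn.execute(
            "SELECT company FROM headquarters WHERE city = ? ORDER BY company",
            (city,),
        )
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    print(query_headquarters(DOCS, "Boston", keyword="headquartered"))
```

In this toy case the keyword filter loses nothing, because every matching sentence contains the keyword. With a real extractor and a real corpus, a filter like this is exactly where efficiency and completeness can pull in opposite directions.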
Alpa Jain, AnHai Doan, Luis Gravano
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2007
Where ICDE
Authors Alpa Jain, AnHai Doan, Luis Gravano