
KDD
2004
ACM

Mining reference tables for automatic text segmentation

Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data warehouse for subsequent querying, analysis, mining, and integration. In this paper, we mine tables present in data warehouses and relational databases to develop an automatic segmentation system. Thus, we overcome limitations of existing supervised text segmentation approaches, which require comprehensive manually labeled training data. Our segmentation system is robust, accurate, and efficient, and requires no additional manual effort. Thorough evaluation on real datasets demonstrates the robustness and accuracy of our system, with segmentation accuracy exceeding state-of-the-art supervised approaches.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications--Data mining; I.2.6 [Artificial Intelligence]: Learning

General Terms: Algorithms, design, performance, experimentation

Keyw...
Eugene Agichtein, Venkatesh Ganti
Type Conference
Year 2004
Where KDD