Finding related tables

We consider the problem of finding related tables in a large corpus of heterogeneous tables. Detecting related tables gives users a powerful tool for enhancing their tables with additional data and enables effective reuse of available public data. Our first contribution is a framework that captures several types of relatedness, including tables that are candidates for joins and tables that are candidates for union. Our second contribution is a set of algorithms for detecting related tables that can be either unioned or joined. We describe a set of experiments demonstrating that our algorithms produce highly related tables. We also show that we can often improve the results of table search by promoting tables that were ranked much lower, based on their relatedness to the top-ranked tables. Finally, we describe how to scale up our algorithms and show the results of running them on a corpus of over a million tables extracted from Wikipedia. Categories and Subject Descriptors H.0 [Inform...
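The abstract does not spell out how the algorithms score relatedness. As a rough illustration only, the Python sketch below uses three simple heuristics that are assumptions, not the authors' method: Jaccard similarity of column headers to flag union candidates, value containment of a key column to flag join candidates, and a re-ranking step that promotes lower-ranked search results by their relatedness to the top-ranked ones. The `Table` class and the `union_score`, `join_score`, and `rerank` functions are hypothetical names introduced for this example, and the toy tables are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical minimal table: a name plus a mapping of column header -> cell values."""
    name: str
    columns: dict[str, list[str]]


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def union_score(t1: Table, t2: Table) -> float:
    """Assumed union heuristic: Jaccard similarity of case-folded, trimmed headers."""
    h1 = {c.strip().lower() for c in t1.columns}
    h2 = {c.strip().lower() for c in t2.columns}
    return jaccard(h1, h2)


def join_score(t1: Table, key: str, t2: Table) -> tuple[str | None, float]:
    """Assumed join heuristic: the column of t2 that best covers t1's key column.

    Coverage is the containment |K & C| / |K|, with K the distinct key values of t1
    and C the distinct values of a candidate column. Returns (None, 0.0) if no overlap.
    """
    keys = set(t1.columns[key])
    best_col: str | None = None
    best = 0.0
    if not keys:
        return best_col, best
    for col, values in t2.columns.items():
        coverage = len(keys & set(values)) / len(keys)
        if coverage > best:
            best_col, best = col, coverage
    return best_col, best


def rerank(results: list[Table], top_k: int = 1) -> list[Table]:
    """Assumed re-ranking: keep the first top_k results in place, then order the rest
    by their best union_score against any of them (ties keep their original order)."""
    head, tail = results[:top_k], results[top_k:]

    def relatedness(t: Table) -> float:
        return max((union_score(t, h) for h in head), default=0.0)

    # sorted() is stable even with reverse=True, so equally related tables keep their rank.
    return head + sorted(tail, key=relatedness, reverse=True)


# Illustrative toy data (made up for this example).
countries = Table("countries", {"Country": ["France", "Japan", "Peru"],
                                "Capital": ["Paris", "Tokyo", "Lima"]})
more_countries = Table("more_countries", {"country": ["Chile", "Kenya"],
                                          "capital": ["Santiago", "Nairobi"]})
population = Table("population", {"Nation": ["France", "Japan", "Brazil"],
                                  "Population (M)": ["68", "125", "203"]})

print(union_score(countries, more_countries))        # → 1.0
print(join_score(countries, "Country", population))  # → ('Nation', 0.6666666666666666)
print([t.name for t in rerank([countries, population, more_countries])])
# → ['countries', 'more_countries', 'population']
```

Header matching and value containment are deliberately naive here: they ignore synonyms, column types, and the scale issues the abstract mentions for a corpus of over a million tables, so a real system would need richer signals and some form of indexing rather than pairwise comparison.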
Added 27 Sep 2012
Updated 27 Sep 2012
Type Journal
Year 2012
Authors Anish Das Sarma, Lujun Fang, Nitin Gupta 0003, Alon Y. Halevy, Hongrae Lee, Fei Wu 0003, Reynold Xin, Cong Yu