We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and high...
We consider the problem of efficiently computing weighted proximity best-joins over multiple lists, with applications in information retrieval and extraction. We are given a multi-...
AnHai Doan, Haixun Wang, Hao He, Jun Yang 0001, Ri...
Echo videos are an important modality for cardiac decision support. In addition to describing the shape and motion of the heart, they capture important diagnostic measurements as ...
Tanveer Fathima Syeda-Mahmood, David Beymer, Arnon...
We describe an adaptive method for extracting records from web pages. Our algorithm combines a weighted tree matching metric with clustering for obtaining data extraction patterns...