We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
An increasing number of databases have become Web accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded in...
Yiyao Lu, Hai He, Hongkun Zhao, Weiyi Meng, Clemen...
This paper studies the problem of unified ranked retrieval of heterogeneous XML documents and Web data. We propose an effective search engine called Sailer to adaptively and versa...
Recent web search techniques augment traditional text matching with a global notion of "importance" based on the linkage structure of the web, such as in Google's P...
We introduce five methods for summarizing parts of Web pages on handheld devices, such as personal digital assistants (PDAs), or cellular phones. Each Web page is broken into text...
Orkut Buyukkokten, Hector Garcia-Molina, Andreas P...