Several Web sites deliver a large number of pages, each publishing data about one instance of some real world entity, such as an athlete, a stock quote, a book. Even though it is ...
Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, ...
The World-Wide Web consists not only of a huge number of unstructured texts, but also a vast amount of valuable structured data. Web tables [2] are a typical type of structured in...
Cindy Xide Lin, Bo Zhao, Tim Weninger, Jiawei Han,...
In this poster, we present an information extraction engine for web-based forums. The engine analyzes the HTML files crawled from web forums, deduces the wrapper (template) of the...
Hanny Yulius Limanto, Nguyen Ngoc Giang, Vo Tan Tr...
In this paper, we describe a methodology to estimate the geographic coverage of the web without the need for secondary knowledge or complex geo-tagging. This is achieved by random...
Robert Pasley, Paul Clough, Ross S. Purves, Floria...
This paper presents a method for generating indexable and browsable keyword metadata from ASR transcripts by leveraging the Web. Search engine queries are built from an ASR transc...
Kishan Thambiratnam, Gang Li, Sha Meng, Frank Seid...