Due to historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in Asian languages such as ...
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
In this paper we present ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTM...
A web site should be easy for visitors to browse. In reality, however, this is sometimes far from the case. Situations such as a single web page covering several unrelated topics may lead to confus...
The Web now offers an exceptional infrastructure for the development of distributed collaborative services and applications. However, most of the existing applications on...