In order for agents to act on behalf of users, they will have to retrieve and integrate vast amounts of textual data on the World Wide Web. However, much of the useful data on the...
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...
Information in today’s enterprises commonly resides in a variety of heterogeneous data sources, including relational databases, web services, files, packaged applications, and c...
In this paper, we consider the problem of extracting structured data from web pages taking into account both the content of individual attributes as well as the structure of pages...
The approach presented in this paper is intended for the semi-automatic construction of a learning object repository from HTML pages. The extraction method consists of applying the...