The main goal for the Information Space system for TREC9 was early precision. To facilitate this, an emphasis was placed on seeking matches from only the TITLE, H1, H2 and H3 tags...
We present a system to automatically generate RSS feeds from HTML documents that consist of time-series items with date expressions, e.g., archives of weblogs, BBSs, chats, mailin...
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...
The advance of the Web has significantly and rapidly changed the way of information organization, sharing and distribution. The next generation of the web, the semantic web, seeks...
This paper presents an approach for applying inductive logic programming to information extraction from HTML documents structured as unranked ordered trees. We consider information...