The Web is now a huge information repository with a rich semantic structure that, however, is primarily addressed to human understanding rather than automated processing by a compu...
Abstract. Recently it was shown that existing general-purpose inductive logic programming systems are useful for learning wrappers (known as L-wrappers) to extract data from HTML d...
This paper presents an approach for applying inductive logic programming to information extraction from HTML documents structured as unranked ordered trees. We consider information...
Existing methods of information extraction from HTML documents include manual approach, supervised learning and automatic techniques. The manual method has high precision and reca...
Mirel Cosulschi, Adrian Giurca, Bogdan Udrescu, Ni...
The main goal for the Information Space system for TREC9 was early precision. To facilitate this, an emphasis was placed on seeking matches from only the TITLE, H1, H2 and H3 tags...