Sciweavers

WEBI
2004
Springer

Semi-Structured Complex List Extraction

The semi-structured information available in HTML and similar documents provides valuable input for information extraction applications. This information, together with other technical information about how to retrieve pages, can be used to automatically extract pieces of information and various types of lists. The goal is to put as much intelligence as possible in the system so that as little knowledge and work as possible is required of the users, i.e. a user-driven extraction system. The advantage of a user-driven system is that the service it provides is available not only to experts but also to ordinary users, thereby making the service available to a wide audience. A problem with some lists in documents is that the structure differs between the elements of the list, which makes it more difficult to take advantage of the semi-structured information. The agent-oriented system described in this paper allows a user without expert skills to train an ex...
Anders Arpteg
Added 02 Jul 2010
Updated 02 Jul 2010
Type Conference
Year 2004
Where WEBI
Authors Anders Arpteg